The chip industry built its identity on a single promise: transistor counts would double roughly every two years, delivering faster and cheaper computing in a reliable cadence. That promise, known as Moore’s Law, is now buckling under physical and energy constraints at the exact moment artificial intelligence demands exponential growth in processing power. The collision between slowing hardware gains and accelerating AI ambition has turned the pursuit of the technological singularity, the hypothetical point where machine intelligence surpasses human cognition, into a desperate engineering scramble with no guaranteed finish line.
The frantic race to reach the singularity before Moore's Law dies now looks less like a pure software sprint and more like a fight against hard limits in chips, power, and manufacturing capacity.
Why General-Purpose Chips Hit a Wall
For decades, shrinking transistors delivered automatic speed and efficiency gains through a principle called Dennard scaling: as transistors got smaller, they used proportionally less power, so clock speeds could rise without overheating. That relationship broke down in the mid-2000s. The U.S. National Science Foundation has framed the problem bluntly, identifying the end of Dennard scaling, architectural inefficiencies, and shifting application demands as the three forces that stalled general-purpose CPU performance growth. Without Dennard scaling, adding more transistors no longer translates into proportional speed increases; chips simply run hotter.
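The arithmetic behind that claim is easy to see with the standard dynamic-power relation P = C·V²·f. The toy calculation below, which uses purely illustrative scaling factors rather than real process data, compares idealized Dennard scaling, where supply voltage shrinks along with feature size, against the post-2005 regime in which voltage has largely stopped scaling.

```python
# Minimal sketch of why the end of Dennard scaling matters.
# Dynamic power model: P = C * V^2 * f (capacitance, voltage, clock frequency).
# All numbers are illustrative, not measurements from any real process node.

def power_density(c, v, f, area):
    """Dynamic power per unit area for one transistor."""
    return (c * v**2 * f) / area

s = 0.7  # assumed linear shrink per process generation

# Baseline transistor (arbitrary units)
c, v, f, area = 1.0, 1.0, 1.0, 1.0

for gen in range(1, 5):
    # Ideal Dennard scaling: C and V shrink with s, f rises by 1/s, area shrinks by s^2
    dennard = power_density(c * s**gen, v * s**gen, f / s**gen, area * s**(2 * gen))
    # Post-2005 reality: voltage stops scaling, so only C, f, and area change
    post = power_density(c * s**gen, v, f / s**gen, area * s**(2 * gen))
    print(f"gen {gen}: Dennard power density ~{dennard:.2f}x, "
          f"post-Dennard ~{post:.2f}x baseline")
```

Under the idealized rules, power density stays flat from one generation to the next; once voltage stops shrinking, it roughly doubles each generation, which is why extra transistors now arrive as extra heat.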
That breakdown explains why the industry pivoted toward specialized accelerators, particularly GPUs and custom AI chips, rather than continuing to bet on faster general-purpose processors. The NSF's own research programs and public-access archive reflect the pivot, with growing investment in domain-specific architectures, and the agency's statistical arm tracks the broader innovation landscape through its science and engineering indicators, underscoring how advances in computing hardware now depend on tightly coupled ecosystems of specialized chips, software, and infrastructure. The practical consequence is stark: the old model, in which a new CPU generation lifted all software equally, is gone. AI workloads now require purpose-built silicon, and the companies that control that silicon hold disproportionate influence over how fast intelligence scales.
Energy Walls and the Limits of Efficiency
Even as chipmakers squeeze more transistors onto each die, the energy cost of frontier-scale computing is becoming a binding constraint. A peer-reviewed paper in Cluster Computing revisited Koomey's Law, which tracks how computing energy efficiency improves over time, using public supercomputing data from the TOP500 and Green500 lists. The study reports measurable doubling rates for energy efficiency, but its central finding ties the end of Moore's Law directly to energy constraints: AI scaling runs into power and thermal walls even as compute demand explodes. Efficiency gains, in other words, are not keeping pace with the raw appetite of large AI models.
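A back-of-envelope comparison makes the gap concrete. The sketch below pits an efficiency doubling period in the spirit of Koomey's Law against a much faster doubling of training compute demand; the specific periods are illustrative assumptions, not the paper's measured values, but the arithmetic shows why the energy bill keeps growing even while chips get more efficient.

```python
# Rough sketch: compare an efficiency doubling period (Koomey-style) with a
# compute-demand doubling period, to see when total energy use keeps rising.
# Both doubling periods below are placeholder assumptions, not reported values.

def growth(years, doubling_period_years):
    """Multiplicative growth over `years` given a doubling period."""
    return 2 ** (years / doubling_period_years)

years = 10
efficiency_doubling = 2.5   # assumed: operations per joule double every ~2.5 years
demand_doubling = 0.75      # assumed: frontier training compute doubles every ~9 months

efficiency_gain = growth(years, efficiency_doubling)   # ~16x
demand_gain = growth(years, demand_doubling)           # ~10,000x

energy_growth = demand_gain / efficiency_gain
print(f"Over {years} years: efficiency x{efficiency_gain:.0f}, "
      f"compute demand x{demand_gain:.0f}, energy required x{energy_growth:.0f}")
```

With those assumed rates, a decade of efficiency gains covers only a sliver of the growth in demand, which is exactly the wall the study describes.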
The real-world evidence is visible at the top of the supercomputing rankings. Oak Ridge National Laboratory’s Frontier system, a flagship exascale machine, published detailed power measurements from a 2023 benchmark run used for its TOP500 and Green500 submission, documenting the enormous electrical draw required to sustain exascale performance. Training a single large language model on comparable infrastructure can consume energy equivalent to thousands of households over weeks, locking AI progress to the availability of cheap, reliable electricity and advanced cooling. If efficiency gains continue to lag behind model growth, the path to superintelligent AI narrows to whoever can secure the most electrical capacity and data-center real estate, not just the best algorithms.
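The household comparison can be sanity-checked with simple arithmetic. Every figure below is an assumption chosen for illustration, not a measurement from Frontier or any disclosed training run, but it shows the order of magnitude involved.

```python
# Back-of-envelope check on the household comparison above.
# All inputs are illustrative assumptions, not published measurements.

cluster_power_mw = 20          # assumed sustained draw of a large training cluster
training_days = 60             # assumed length of the training run
household_kwh_per_day = 30     # rough average daily use for a U.S. household

training_energy_kwh = cluster_power_mw * 1_000 * 24 * training_days
household_energy_kwh = household_kwh_per_day * training_days

equivalent_households = training_energy_kwh / household_energy_kwh
print(f"~{training_energy_kwh / 1e6:.1f} GWh total, "
      f"roughly what {equivalent_households:,.0f} households use in {training_days} days")
```

Even with conservative inputs, the total lands in the tens of gigawatt-hours, which is why siting, grid capacity, and cooling have become gating questions for AI labs.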
In that sense, the race to the singularity before Moore’s Law dies is also a race to build enough compute and power infrastructure to keep scaling when transistor improvements slow.
ASML, NVIDIA, and the Bottleneck Gatekeepers
Two companies sit at critical chokepoints in the global AI hardware supply chain, and both face risks that could slow the race. ASML, the Dutch firm whose extreme ultraviolet lithography machines are the only tools capable of printing the most advanced chip patterns, effectively gatekeeps leading-edge lithography, a position detailed in its 2024 annual report filed with the U.S. Securities and Exchange Commission. No competitor offers an alternative at the same resolution. Export controls targeting China add geopolitical friction to an already constrained supply chain, meaning that access to cutting-edge chips is increasingly shaped by diplomatic decisions as much as by engineering ones.
On the accelerator side, NVIDIA’s latest Form 10-K, filed with the SEC for the fiscal year ended January 26, 2025, lays out the company’s risk factors in plain terms: supply constraints, geopolitics and export controls, customer concentration, and data-center growth drivers all appear as material concerns. The filing reflects a company riding enormous demand for AI GPUs while acknowledging that the supply chain supporting that demand is fragile. If either ASML’s lithography pipeline or NVIDIA’s GPU production stumbles, the downstream effect on AI training capacity would be immediate. The concentration of so much capability in so few hands is itself a systemic risk that most singularity forecasts tend to ignore, because it turns what looks like a smooth exponential curve into a staircase vulnerable to political shocks, factory accidents, or export bans.
Software Workarounds When Hardware Stalls
Researchers are not waiting passively for the next chip generation. A growing body of work focuses on squeezing more useful computation from existing hardware through algorithmic innovation and clever scheduling. One example is PipeInfer, a technique described in a paper on accelerating LLM inference using asynchronous pipelined speculation, which targets the bottleneck of running large language models at inference time by overlapping computation stages that traditionally run sequentially. This line of work sits alongside quantization, sparsity, and mixture-of-experts architectures, all of which aim to reduce the number of high-precision operations required per token without degrading model quality too severely.
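PipeInfer's asynchronous pipelining does not fit in a short snippet, but the speculative decoding idea this family of techniques builds on can be sketched simply: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, keeping the prefix it agrees with. The toy "models" below are stand-ins so the sketch runs on its own; this illustrates the general draft-and-verify loop, not the paper's algorithm.

```python
# Simplified sketch of speculative decoding, the idea underlying pipelined
# speculation schemes like PipeInfer (NOT the paper's algorithm, just the
# basic draft-then-verify loop it accelerates). Toy rules stand in for real
# draft and target LLMs so the example is self-contained.

def draft_model(prefix, k):
    """Cheap model: guess the next k token ids (toy rule: last token + 1)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 50      # arbitrary toy vocabulary of 50 token ids
        out.append(last)
    return out

def target_model(prefix):
    """Expensive model: the 'ground truth' next token (toy rule)."""
    return (prefix[-1] + 1) % 50 if prefix[-1] % 7 else (prefix[-1] + 2) % 50

def speculative_step(prefix, k=4):
    """Draft k tokens, verify against the target model, keep the agreed prefix."""
    draft = draft_model(prefix, k)
    accepted = []
    for tok in draft:
        expected = target_model(prefix + accepted)
        if tok != expected:
            accepted.append(expected)   # take the target's token and stop
            break
        accepted.append(tok)            # draft was right, keep going
    return accepted

tokens = [3]
for _ in range(5):
    tokens += speculative_step(tokens)
print(tokens)
```

In a real system, the verification of all drafted tokens happens in one batched forward pass of the large model, which is where the speedup comes from; pipelined approaches go further by overlapping the stages that this simple loop still runs one after another.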
These software-level optimizations matter because they change the economics of AI deployment. If inference costs drop significantly through better scheduling, speculative decoding, or model compression, the same fixed pool of GPUs can serve many more users and support more ambitious applications. That, in turn, can offset some of the slowdown in raw hardware improvement and buy time for new process technologies, packaging approaches, and energy innovations to mature. Yet there are limits: when models grow by one or two orders of magnitude, incremental software efficiencies cannot fully erase the underlying need for more transistors and more watts; they can only stretch the existing infrastructure a bit further.
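One way to see that limit is to multiply out some plausible gains; the figures below are assumptions for illustration, not benchmark results.

```python
# Illustrative arithmetic for the limit described above.
# All multipliers are assumptions, not measured speedups.

model_growth = 100          # compute per token if models grow ~2 orders of magnitude
quantization_gain = 4       # e.g., 16-bit -> 4-bit weights (assumed best case)
scheduling_gain = 2.5       # speculative decoding plus better batching (assumed)

software_gain = quantization_gain * scheduling_gain
remaining_hardware_gap = model_growth / software_gain
print(f"Software covers ~{software_gain:.0f}x; hardware and energy must still "
      f"supply ~{remaining_hardware_gap:.0f}x more.")
```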
The Singularity, Reframed as an Infrastructure Problem
All of these trends point to a reframing of the technological singularity from an abstract question about intelligence to a concrete question about infrastructure. Instead of asking when machine cognition will surpass human cognition in the abstract, it may be more realistic to ask when our fabrication plants, power grids, and data centers can support another several orders of magnitude in compute. The end of Dennard scaling, the energy ceilings highlighted in supercomputing efficiency studies, and the chokepoints around lithography and accelerators suggest that progress will be punctuated by bottlenecks and plateaus, not a smooth, inevitable curve toward omniscient AI.
That does not mean superhuman AI is impossible, only that its arrival is contingent on an interlocking set of engineering, economic, and geopolitical bets. Overcoming physical limits may require breakthroughs in areas such as advanced cooling, new transistor materials, or radically different computing paradigms, and even then, someone must finance and operate the massive facilities needed to deploy them at scale. In the meantime, the most realistic path forward is a hybrid one: incremental hardware advances, aggressive software optimization, and careful management of scarce manufacturing and energy resources. The race toward the singularity, in other words, is no longer just an algorithmic contest. It is a global competition to build and coordinate the physical substrate of intelligence itself.
*This article was researched with the help of AI, with human editors creating the final content.