Most computer chips are small enough to hide behind a postage stamp. The one Cerebras Systems builds would cover your dinner plate.
In 2019, the California-based startup unveiled the Wafer Scale Engine (WSE), a processor with 1.2 trillion transistors built from the largest square that fits on a 300-millimeter silicon wafer. Rather than slicing a wafer into hundreds of thumbnail-size dies the way Nvidia, AMD, and Intel do, Cerebras kept that square in one piece and wired its whole surface into a single, unified circuit purpose-built for training artificial intelligence models.
The company has since shipped two successors. The WSE-2, introduced in 2021, packed 2.6 trillion transistors on a 7-nanometer process. And in March 2024, Cerebras announced the WSE-3, fabricated on TSMC’s 5nm node with 4 trillion transistors, 900,000 AI-optimized cores, and 44 gigabytes of on-chip SRAM. Transistor counts have more than tripled across the three generations: the WSE-2 more than doubled the original’s count, and the WSE-3 added roughly 50 percent more.
Why one giant chip instead of thousands of small ones
Training a frontier large language model on a conventional cluster means splitting the model’s parameters across hundreds or thousands of GPUs, then constantly shuffling data between them over high-speed cables and switches. That inter-chip communication eats time and burns power. Cerebras’s pitch is simple: if every core and every byte of working memory lives on the same piece of silicon, the data never has to leave the chip. No network hops, no waiting on interconnects, no complex parallelism software to orchestrate the split.
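To put rough numbers on that overhead, the sketch below uses the textbook bandwidth-plus-latency estimate for a ring all-reduce, one common way GPUs synchronize gradients during data-parallel training. It is an illustration, not a model published by Cerebras or Nvidia: the function name, the 14 GB payload (about 7 billion parameters at 2 bytes each), the 100 GB/s link speed, and the 5-microsecond per-step latency are all hypothetical values chosen for the example.

```python
def ring_allreduce_seconds(payload_bytes: float, num_devices: int,
                           link_bytes_per_s: float,
                           per_step_latency_s: float = 0.0) -> float:
    """Estimate the time for one ring all-reduce (bandwidth + latency model).

    In a ring all-reduce, each device sends and receives 2*(N-1)/N of the
    payload, spread over 2*(N-1) sequential steps around the ring.
    """
    if num_devices < 2:
        return 0.0  # nothing to synchronize on a single device
    n = num_devices
    bandwidth_term = 2 * (n - 1) / n * payload_bytes / link_bytes_per_s
    latency_term = 2 * (n - 1) * per_step_latency_s
    return bandwidth_term + latency_term


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    payload = 14e9   # ~7B parameters x 2 bytes (bf16 gradients)
    link = 100e9     # 100 GB/s per device
    latency = 5e-6   # 5 microseconds per ring step
    for n in (8, 64, 1024):
        t = ring_allreduce_seconds(payload, n, link, latency)
        print(f"{n:5d} devices: {t:.3f} s per gradient sync")
```

In this simple model the bandwidth term levels off near twice the payload divided by link speed, while the latency term keeps growing with the number of devices, and the whole cost recurs on every training step.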
“The whole idea is that you eliminate the communication overhead that dominates large-scale training,” Cerebras co-founder and CEO Andrew Feldman told reporters at the WSE-3 launch event in March 2024. The CS-3, the server system that houses the WSE-3, is designed to slot into existing data-center racks, and Cerebras says it delivers twice the AI training performance of its predecessor, the CS-2.
The engineering problem most chipmakers avoid
There is a reason no one else ships a chip this large. Semiconductor manufacturing is inherently imperfect. Tiny defects, caused by dust particles, chemical irregularities, or lithographic errors, scatter randomly across every wafer a foundry produces. When a wafer is diced into small chips, a defect ruins one die and the rest ship fine. Yield losses are manageable.
On a wafer-scale chip, every defect hits the only product. Cerebras addressed this by designing redundancy into the fabric: spare cores and extra routing paths allow the system to detect dead spots and wire around them, so the surviving cores still function as a single processor. The approach works well enough to ship product, though Cerebras has not publicly disclosed what percentage of cores are typically disabled on a finished wafer or what its overall manufacturing yield looks like.
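A standard first-order way to see why that redundancy matters is the Poisson yield model, in which the chance that a region has zero defects is e^(−area × defect density). The sketch below compares a large GPU-class die with a wafer-scale chip of roughly 46,000 square millimeters, then asks how likely the wafer is to remain usable if each defect disables at most one core and a fixed pool of spares can replace it. The defect density of 0.1 per square centimeter, the 8-square-centimeter GPU die, and the 200-core spare pool are hypothetical; real fabs use more detailed yield models, and Cerebras’s actual figures are not public.

```python
import math


def zero_defect_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson model: probability that a region of this area has no defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def prob_defects_at_most(spares: int, expected_defects: float) -> float:
    """P(X <= spares) for X ~ Poisson(expected_defects).

    Under the simplifying (hypothetical) assumption that each defect kills
    exactly one core, this is the chance the spare cores cover every defect.
    """
    term = math.exp(-expected_defects)  # P(X = 0)
    total = term
    for k in range(1, spares + 1):
        term *= expected_defects / k    # P(X = k) derived from P(X = k - 1)
        total += term
    return total


if __name__ == "__main__":
    d0 = 0.1            # hypothetical defects per cm^2
    gpu_die = 8.0       # hypothetical ~800 mm^2 GPU die, in cm^2
    wafer_chip = 460.0  # roughly 46,000 mm^2, in cm^2
    spares = 200        # hypothetical spare-core pool

    print(f"GPU die, zero defects:    {zero_defect_yield(gpu_die, d0):.3f}")
    print(f"Wafer chip, zero defects: {zero_defect_yield(wafer_chip, d0):.1e}")
    lam = wafer_chip * d0
    print(f"Wafer chip, spares cover: {prob_defects_at_most(spares, lam):.6f}")
```

Under these assumptions a defect-free wafer is essentially impossible (about one chance in 10^20), yet a modest spare pool makes a working wafer nearly certain, which is the logic behind designing the fabric to route around dead cores.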
Thermal management is another challenge unique to this form factor. A slab of silicon roughly 46,000 square millimeters in area generates far more concentrated heat than a rack of discrete GPUs spread across multiple server nodes. Cerebras has engineered custom cold-plate cooling for the CS-3 system, but detailed thermal design power (TDP) ratings and measured energy-per-training-step figures have not been published in the company’s spec sheets as of mid-2026.
Where the chips are already running
Despite the startup’s lower profile compared to Nvidia, Cerebras hardware is not confined to a lab. Argonne National Laboratory, one of the U.S. Department of Energy’s flagship research centers, has deployed Cerebras systems for scientific AI workloads. Mayo Clinic has explored the technology for medical research applications. And several cloud and enterprise customers have accessed Cerebras compute through the company’s Condor Galaxy AI supercomputer clusters, built in partnership with Abu Dhabi-based technology group G42.
That G42 relationship also drew scrutiny. When Cerebras filed its S-1 registration statement with the Securities and Exchange Commission in September 2024 ahead of a planned initial public offering, the document revealed that a substantial portion of the company’s revenue was tied to G42-related orders. The Committee on Foreign Investment in the United States (CFIUS) reportedly reviewed G42’s investment in Cerebras, raising questions about the export of advanced AI hardware to the Middle East at a time when Washington has been tightening chip-export controls.
The S-1 filing also offered the first detailed look at Cerebras’s financials, showing rapid revenue growth but significant net losses, a profile typical of a hardware startup competing against an entrenched incumbent with Nvidia’s scale and margins.
What independent benchmarks show, and what they don’t
Cerebras has submitted results to MLPerf, the industry-standard AI benchmark suite run by the MLCommons consortium, demonstrating competitive training throughput on certain models. Those submissions provide a degree of third-party validation that the architecture works as advertised for specific tasks.
But a full apples-to-apples comparison against a modern Nvidia GPU cluster of equivalent cost remains elusive. Nvidia’s rack-scale Blackwell platform, the GB200 NVL72, links 72 Blackwell GPUs and 36 Grace CPUs over NVLink in a single liquid-cooled rack. The benchmark AI procurement teams actually need would run a single WSE-3 wafer and a full NVL72 rack on the same model, at the same precision, with total cost of ownership factored in. Neither company has published that head-to-head test.
Without it, the core economic question stays open: can one wafer-scale chip replace a multi-rack GPU cluster for a given training job at equal or lower total cost? Answering that requires three numbers Cerebras has not made fully public: the manufactured cost per wafer-scale chip, the typical usable core count after routing around defects, and real-world throughput on the largest frontier models. Until those figures surface, buyers are weighing architectural logic against Nvidia’s proven, if expensive, ecosystem.
A bet on physics against an empire of software
Cerebras’s wager is fundamentally a physics argument: moving data across a single piece of silicon is faster and costs less energy than moving it between chips over cables, so a bigger chip should train models faster. The logic is sound. Whether it translates into a business that can challenge Nvidia’s dominance depends on factors well beyond transistor counts: manufacturing costs, software maturity (Nvidia’s CUDA ecosystem has a decade-plus head start), customer willingness to adopt a single-vendor architecture, and the regulatory landscape around AI chip exports.
The confirmed specifications are genuinely remarkable. Four trillion transistors on a single working circuit, fabricated at 5nm, with nearly a million AI cores sharing on-chip memory, represents a feat of engineering that no other company has replicated. What remains unproven is whether that feat can scale from a technical achievement into a market force.
For now, Cerebras’s wafer-scale processors sit at an unusual intersection: validated enough to run real workloads at national laboratories and cloud data centers, but not yet benchmarked thoroughly enough for the broader AI industry to know exactly what they are worth. The next chapter depends less on cramming more transistors onto silicon and more on opening the books, publishing the numbers, and letting independent testers put the dinner-plate chip through its paces.
More from Morning Overview
*This article was researched with the help of AI, with human editors creating the final content.