The next wave of AI chips with 16-layer HBM4 memory enters volume production this quarter

The first generation of AI accelerators built around high bandwidth memory reshaped the data center. Now the next wave, built on 16-layer HBM4 stacks, is moving from engineering samples into volume production, promising another jump in compute density and power efficiency. As these parts ramp this quarter, they will test the capacity of global packaging lines and reset expectations for what a single server rack can do.

HBM has already become the defining component of modern AI hardware. With HBM4, memory vendors and chipmakers are simultaneously pushing capacity, bandwidth, and thermal limits, shifting the balance of power among countries and companies that can manufacture such complex devices at scale.

What happened

Leading memory manufacturers have begun volume production of 16-layer HBM4 stacks that pair directly with next-generation AI accelerators. These devices build on the HBM3E generation but increase the number of DRAM layers in a single stack to sixteen, raising per-package capacity and aggregate bandwidth while fitting into similar footprints on the interposer.

The shift reflects how AI workloads have turned memory from a supporting component into the main bottleneck. Industry data shows that demand for high bandwidth memory has grown far faster than for traditional DRAM, as training large language models and recommendation systems saturates existing memory channels. One analysis notes that the AI boom has made advanced memory chips the “star of the show” in data center builds, and that governments such as Singapore’s are investing in advanced packaging so local firms can assemble and test complex HBM-based modules for global customers.

HBM4 continues the trend of stacking more DRAM dies vertically and connecting them with through-silicon vias (TSVs). The 16-layer configuration increases the number of vertical channels available to the GPU or accelerator die, which in turn raises peak bandwidth without requiring a wider external bus. Packaging these stacks on a silicon interposer alongside very large compute dies has become a central focus for foundries and outsourced semiconductor assembly and test providers.

Industry reporting describes a tight race among memory makers to supply these advanced stacks. Capacity is constrained both by the difficulty of producing defect-free 16-layer stacks and by the limited number of facilities that can handle the thermal and mechanical stresses of such tall structures. As one detailed account of the race to supply these stacks explains, HBM production now hinges as much on packaging and test as on the DRAM fabrication itself.

The move to volume production this quarter means that major GPU vendors can finally lock in bill-of-materials for their next accelerator generations. Long-term supply agreements, often involving prepayments and capacity reservations, have been signed to secure HBM4 output for hyperscale cloud providers and leading AI labs.

Why it matters

The arrival of 16-layer HBM4 in volume is not just a specification bump. It reshapes the economics of AI training and inference and changes which regions can participate in the highest value parts of the semiconductor supply chain.

On the technical side, HBM4 increases both capacity and bandwidth per accelerator. A single package can now host significantly more gigabytes of memory than HBM3E while driving higher transfer rates. That combination allows model developers to train larger models on fewer GPUs, or to run more concurrent inference streams per accelerator without hitting memory limits. For operators of clusters that already run tens of thousands of accelerators, even modest percentage gains in utilization translate into large absolute cost savings.
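The "larger models on fewer GPUs" point comes down to simple division, which the sketch below illustrates. The function name and every number in it are hypothetical placeholders chosen for illustration; they are not published HBM4 or HBM3E specifications, and the model ignores activation memory, fragmentation, and parallelism overheads.

```python
import math


def accelerators_needed(footprint_gb: float, usable_gb_per_accelerator: float) -> int:
    """Smallest number of accelerators whose combined memory holds the footprint.

    Both arguments are in gigabytes. This is a capacity-only estimate and
    says nothing about compute or interconnect limits.
    """
    if footprint_gb < 0 or usable_gb_per_accelerator <= 0:
        raise ValueError("footprint must be >= 0 and capacity must be > 0")
    return math.ceil(footprint_gb / usable_gb_per_accelerator)


if __name__ == "__main__":
    # Hypothetical memory footprint of a training job (assumption, not a real model).
    footprint_gb = 4000.0
    # Hypothetical per-accelerator capacities (assumptions, not vendor figures).
    for label, capacity_gb in [("smaller-memory part", 128.0),
                               ("larger-memory part", 256.0)]:
        count = accelerators_needed(footprint_gb, capacity_gb)
        print(f"{label}: at least {count} accelerators")
```

Under these assumed figures, doubling per-accelerator capacity roughly halves the minimum accelerator count. Real deployments would likely need more devices than this floor suggests.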

HBM4 also helps address energy efficiency. AI data centers are increasingly constrained by power delivery and cooling rather than floor space. By stacking more memory vertically and keeping it close to the compute die, HBM reduces the energy per bit transferred compared with traditional DDR modules on a motherboard. Moving to 16-layer stacks further increases the amount of memory that can be served within a given power envelope, which is critical as cities and utilities scrutinize the power draw of new AI campuses.
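To make the energy-per-bit argument concrete, the sketch below converts a sustained bandwidth and an energy cost per bit into watts. The function name and the picojoule figures are illustrative assumptions, not measurements of any HBM or DDR product.

```python
def interface_power_watts(bandwidth_gb_per_s: float, picojoules_per_bit: float) -> float:
    """Power drawn by moving data at a sustained rate.

    bandwidth_gb_per_s: sustained transfer rate in gigabytes per second (10^9 bytes).
    picojoules_per_bit: energy spent per transferred bit, in picojoules.
    """
    if bandwidth_gb_per_s < 0 or picojoules_per_bit < 0:
        raise ValueError("inputs must be non-negative")
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * picojoules_per_bit * 1e-12


if __name__ == "__main__":
    # Hypothetical sustained bandwidth (assumption for illustration only).
    bandwidth = 1000.0
    # Hypothetical energy costs per bit (placeholders, not vendor data).
    for label, pj in [("off-package memory", 10.0), ("stacked on-package memory", 4.0)]:
        watts = interface_power_watts(bandwidth, pj)
        print(f"{label}: about {watts:.0f} W at {bandwidth:.0f} GB/s")
```

The relationship is linear, so any reduction in energy per bit translates directly into either lower power at the same bandwidth or more bandwidth within the same power envelope.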

Economically, the shift concentrates value in a small set of companies that can produce and package HBM4. The production of 16-layer stacks requires advanced lithography, high aspect ratio etching for through-silicon vias, and precise wafer thinning. On top of that, advanced packaging lines must handle 2.5D interposers, micro-bump bonding, and complex thermal management. Governments have responded with targeted incentives for such facilities. Singapore, for instance, has positioned itself as a hub for advanced packaging and test, with its economic development agency highlighting how local firms are moving into high value activities such as HBM-based AI modules and chiplet integration, as described in reporting on the country’s AI-related manufacturing push.

Control over HBM4 capacity also has strategic implications. Countries that host leading memory fabs and packaging plants gain leverage in the AI supply chain, similar to the way advanced logic foundries influence global technology flows. Export controls on high performance accelerators already hinge partly on their memory configurations, and future rules are likely to pay close attention to HBM4 availability and performance thresholds.

For investors, the ramp of 16-layer HBM4 is a clear signal of where margins are shifting. Financial analysis of leading memory vendors has highlighted how AI-related products carry higher average selling prices and more stable long-term contracts than commodity PC or smartphone memory. One briefing on a major supplier’s strategy points out that AI memory, including HBM, has become a central driver of its revenue outlook, with capital expenditure tilting toward facilities that can support stacked DRAM and advanced packaging, according to recent investor commentary.

At the same time, the complexity of 16-layer HBM4 raises supply risk. Yield losses in any part of the stack, from DRAM wafers to interposers to final module test, can ripple through to GPU shipments and data center buildouts. That fragility gives large cloud providers a strong incentive to diversify suppliers and to support new entrants that can meet the technical bar.
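One way to see why taller stacks raise supply risk is a simple compounding model, sketched below. It assumes each added layer (the die plus its bonding step) succeeds independently with the same probability, which is a deliberate simplification; the function name and the 99 percent figure are hypothetical and do not describe any vendor's actual process.

```python
def stack_yield(per_layer_yield: float, layers: int) -> float:
    """Probability that every layer in a stack is good.

    Assumes layers fail independently with identical probability, so the
    stack yield is the per-layer yield raised to the number of layers.
    """
    if not 0.0 <= per_layer_yield <= 1.0:
        raise ValueError("per_layer_yield must be between 0 and 1")
    if layers < 1:
        raise ValueError("layers must be at least 1")
    return per_layer_yield ** layers


if __name__ == "__main__":
    # Hypothetical per-layer success rate (assumption for illustration only).
    per_layer = 0.99
    for layers in (8, 12, 16):
        print(f"{layers} layers: stack yield {stack_yield(per_layer, layers):.3f}")
```

Even with a high assumed per-layer success rate, the stack-level figure falls as layers are added, which is consistent with the fragility described above. Real processes use known-good-die testing and repair schemes that this model does not capture.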

What to watch next

As HBM4-based accelerators ship in volume, several fault lines will determine how transformative this generation becomes.

First, watch whether memory supply keeps pace with GPU demand. Previous cycles saw accelerators constrained not by compute die production but by HBM availability. With 16-layer stacks, any shortfall will be magnified because each GPU consumes more HBM capacity. The industry is already expanding cleanroom space for both DRAM and packaging, yet the competition for this advanced capacity suggests that allocation decisions will remain contentious, especially between hyperscalers and smaller cloud providers.

Second, the geography of advanced packaging will evolve. Regions that have invested in 2.5D and 3D integration capacity are likely to capture a larger share of the value chain as HBM4 spreads. Singapore’s focus on becoming a site for AI module assembly and test is one example, and similar moves are underway in other Asian and European hubs that see HBM packaging as a way to climb up from commodity electronics into higher margin segments.

Third, the industry will test new architectural ideas that take advantage of HBM4’s characteristics. GPU makers are already exploring tighter coupling between compute and memory, including chiplet-based designs where multiple compute tiles share large pools of stacked memory on a single interposer. Software stacks will need to adapt as well, with frameworks like PyTorch and JAX exposing more control over memory placement to help developers exploit the higher bandwidth and capacity.

Fourth, thermal and reliability challenges will be in focus. Sixteen-layer stacks generate significant heat and place mechanical stress on interposers and substrates. Data center operators will scrutinize failure rates, especially for deployments in hotter climates or facilities that rely on air cooling. Any systemic reliability issues could slow the adoption curve or push vendors to introduce intermediate configurations with fewer layers.

Finally, policymakers will watch how HBM4 shapes the AI race among nations. Export controls that target specific accelerator models might need to be updated as new parts with 16-layer memory appear. Countries that lack domestic HBM capability may seek partnerships or incentives to attract at least the packaging stages of the supply chain, both to secure access and to capture some of the economic upside.

*This article was researched with the help of AI, with human editors creating the final content.

Author: Cassian Holt
