A coalition of researchers from Stanford, Carnegie Mellon, the University of Pennsylvania, and MIT has demonstrated monolithic three-dimensional chip stacking at a commercial U.S. foundry, integrating silicon CMOS logic, resistive memory, and carbon-nanotube transistors in a single sequential build. Separately, two peer-reviewed studies of two-dimensional materials have shown that such films can be laminated into ten stacked circuit tiers and, in a related stack, wired vertically at 62,500 input/output connections per square millimeter. Together, these results suggest that building active device layers directly on top of one another, rather than bonding finished chips face to face, could multiply the computing power available inside a fixed chip footprint at a time when AI hardware is pressing against power-delivery and floor-space limits.
Why vertical density gains matter for AI hardware right now
Modern AI accelerators already use chiplet-based packaging, where separately manufactured dies are connected through microbumps or hybrid bonding. Those techniques place chips side by side or stack them with relatively coarse vertical links. The pitch of hybrid-bonding pads has dropped below 10 micrometers in leading-edge production, but each bonded interface still occupies area that could hold transistors. Monolithic 3D, or M3D, sidesteps that tradeoff by fabricating new device layers sequentially on the same wafer, enabling far tighter vertical connections and freeing lateral area for additional logic or memory.
The practical question is whether M3D can stay within the thermal limits that protect lower layers from damage. The IEDM 2025 demonstration at SkyWater’s foundry kept its full heterogeneous stack, including Si CMOS, RRAM, and carbon-nanotube FET layers, under a thermal budget of roughly 415 degrees Celsius. That figure sits close to the threshold many process engineers consider safe for back-end-of-line processing on existing nodes. If future flows can hold temperatures at or below 400 degrees Celsius while using atomically thin channel materials, M3D stacks built on 2D-material tiers might deliver at least three times the effective logic density of hybrid-bonded alternatives at the 2 nm node within a handful of tape-out cycles. That projection remains a hypothesis: it depends on solving thermal-resistance, yield, and design-tool challenges that no group has yet addressed in full.
For AI accelerators, those density gains are not just about squeezing more arithmetic units onto a die. Training large models is increasingly limited by memory bandwidth and the energy cost of moving data between compute cores and off-chip DRAM. By placing memory cells directly above logic and connecting them with dense vertical vias, M3D offers a way to shorten data paths and reduce the number of off-chip transactions. That, in turn, could lower the total energy per inference or training step, even if the underlying transistors do not switch faster than their planar counterparts.
Ten tiers and 62,500 connections per square millimeter
Two Nature-family papers provide the strongest quantitative evidence for the density claim. A study in Nature demonstrated tier-by-tier lamination via van der Waals bonding, with cross-sectional imaging confirming ten stacked circuit tiers. Van der Waals lamination exploits the weak interlayer bonds of two-dimensional materials such as molybdenum disulfide to transfer pre-grown device films onto a target wafer without the high temperatures that conventional epitaxial growth demands. Because the layers are atomically thin, the total stack height remains modest even after ten tiers, leaving room for additional wiring and heat-spreading structures.
A companion paper in Nature Electronics reported an interconnect density of 62,500 I/O per square millimeter in an M3D stack that combined graphene-based chemical sensors with MoS2 memtransistor circuits across different tiers. That wiring density is orders of magnitude higher than what microbump packaging achieves and well above what current hybrid-bonding pitches allow. The heterogeneous mix of sensing and computing functions across tiers also illustrates a design freedom that conventional planar scaling cannot easily replicate: engineers can assign specialized materials to each layer, optimizing each tier for a distinct task.
In practical terms, 62,500 vertical connections per square millimeter means that even a small region of a chip can host thousands of high-bandwidth links between logic and memory or between different functional blocks. For AI workloads that shuffle activations and gradients across many cores, such dense vertical wiring could cut communication delays and reduce the need for wide, power-hungry on-chip buses. It also opens the door to embedding sensor arrays directly above processing elements, enabling local preprocessing of analog signals before they reach digital accelerators.
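To put the density figure in geometric terms, the short Python sketch below converts connections per square millimeter into a center-to-center pitch and back. It assumes the connections sit on a uniform square grid, which the papers do not necessarily state; the function names and the 0.2 mm example region are illustrative choices, not values from the studies.

```python
import math

def pitch_um(density_per_mm2: float) -> float:
    """Center-to-center pitch in micrometers, assuming a uniform square grid."""
    return 1000.0 / math.sqrt(density_per_mm2)

def density_from_pitch(pitch_um_value: float) -> float:
    """Connections per square millimeter for a square grid at the given pitch."""
    return (1000.0 / pitch_um_value) ** 2

def links_in_region(density_per_mm2: float, side_um: float) -> float:
    """Number of connections that fit in a square region of the given side length."""
    return density_per_mm2 * side_um ** 2 / 1e6

M3D_DENSITY = 62_500  # I/O per square millimeter, as reported

print(pitch_um(M3D_DENSITY))                # grid pitch implied by 62,500 per mm^2
print(density_from_pitch(10.0))             # density of a 10-micrometer-pitch grid
print(links_in_region(M3D_DENSITY, 200.0))  # links in a 0.2 mm x 0.2 mm region
```

Under the square-grid assumption, 62,500 connections per square millimeter corresponds to a 4-micrometer pitch, while a 10-micrometer grid would hold 10,000 connections in the same area. A region only 0.2 millimeters on a side would carry 2,500 links.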
A separate peer-reviewed comparison study in IEEE Transactions on VLSI Systems evaluated performance, power, and area tradeoffs across microbumping, hybrid bonding, and monolithic 3D using commercial-grade design assumptions. That analysis found M3D reduced latency and parasitic losses relative to the bonded alternatives, reinforcing the argument that sequential fabrication offers electrical advantages beyond raw density. Shorter interconnects between tiers translate into lower capacitance and resistance, which can improve both speed and energy efficiency for memory accesses and inter-core messaging.
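The link between wire length and delay can be illustrated with the textbook Elmore estimate for a distributed RC line, in which delay grows with the square of length. The sketch below is not the model used in the IEEE comparison; the per-micrometer resistance and capacitance values are hypothetical placeholders, and driver resistance and load capacitance are ignored.

```python
def distributed_rc_delay(r_ohm_per_um: float, c_farad_per_um: float,
                         length_um: float) -> float:
    """Elmore delay estimate (seconds) for a uniform distributed RC wire.

    Total resistance and total capacitance each scale with length,
    so the delay scales with length squared.
    """
    return 0.5 * r_ohm_per_um * c_farad_per_um * length_um ** 2

# Hypothetical wire parameters, chosen only to illustrate the scaling.
R_PER_UM = 1.0       # ohms per micrometer (assumed)
C_PER_UM = 0.2e-15   # farads per micrometer (assumed)

long_wire = distributed_rc_delay(R_PER_UM, C_PER_UM, 1000.0)  # 1 mm lateral route
short_wire = distributed_rc_delay(R_PER_UM, C_PER_UM, 100.0)  # 0.1 mm route
delay_ratio = long_wire / short_wire

print(delay_ratio)  # ratio depends only on the length ratio, not on R or C
```

In this simplified model, a tenfold reduction in wire length cuts the wire's own delay by a factor of about one hundred. Real gains would be smaller once driver and load effects are included.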
Gaps in reliability, cost, and thermal data
The published record leaves several questions open. No long-term reliability data, such as electromigration or bias-temperature instability results, have appeared for the SkyWater monolithic stack beyond initial IEDM measurements. Chip designers evaluating M3D for production will need those numbers before committing to a process that places active transistors directly above other active transistors, where localized heating can accelerate wear-out mechanisms. Without multi-year stress tests, it is difficult to know whether the stacked devices will meet the lifetime requirements of data-center deployments.
Thermal-resistance measurements and hotspot maps for the ten-tier 2D-material stack under sustained AI-class workloads have not been published either. A ten-layer structure with atomically thin channels may conduct heat very differently from bulk silicon, and the absence of thermal characterization under realistic power densities is a significant unknown. Vertical temperature gradients could cause performance variation between tiers or force derating of upper layers to protect the base logic. Advanced cooling techniques, such as microfluidic channels or integrated heat spreaders between tiers, have been proposed in other contexts but have not yet been demonstrated in conjunction with these specific M3D flows.
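A first-order way to see why vertical gradients arise is a one-dimensional series thermal-resistance model, sketched below in Python. It assumes steady state, a heat sink on one side of the stack only, and no lateral heat spreading; the power and resistance values are hypothetical, since no measured figures for these stacks have been published.

```python
def tier_temperatures(powers_w, resistances_k_per_w, sink_temp_c):
    """Steady-state temperature of each tier in a 1D series stack.

    powers_w[i]            heat generated in tier i (tier 0 is nearest the sink)
    resistances_k_per_w[i] thermal resistance between tier i and the tier
                           (or the sink) on its sink-facing side
    """
    temps = []
    temp = sink_temp_c
    for i, resistance in enumerate(resistances_k_per_w):
        # All heat generated in tier i and in tiers farther from the sink
        # must cross this layer on its way out.
        heat_through = sum(powers_w[i:])
        temp += resistance * heat_through
        temps.append(temp)
    return temps

# Hypothetical three-tier example: 1 W per tier, 2 K/W per interface.
example = tier_temperatures([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], 50.0)
print(example)
```

Even in this idealized case, the tier farthest from the sink runs hottest, and the layer nearest the sink must carry the heat of every tier stacked beyond it.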
Cost data is equally scarce: no public wafer-cost or yield figures from the foundry run have surfaced, leaving economic viability estimates to secondary analysis. Sequential stacking multiplies the number of processing steps per wafer, and any defect in an upper tier can potentially spoil the value of all the layers beneath it. That risk may be offset by the ability to fabricate smaller base dies and compensate with vertical scaling, but the balance will depend on defect densities and process control that are not yet documented. For AI chip vendors operating on tight cost-per-operation targets, these unknowns make it hard to compare M3D against more mature options like advanced chiplets and 3D packaging.
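The compounding effect of sequential processing on yield can be sketched with a simple model. The Python below assumes that defects in each tier are independent, that any bad tier scraps the whole stack, and that per-tier yield follows the common Poisson approximation; the defect density, die area, and 95 percent figure are hypothetical, not foundry data.

```python
import math

def tier_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability that a tier of this area has no killer defect."""
    return math.exp(-area_cm2 * defects_per_cm2)

def stack_yield(tier_yields) -> float:
    """Yield of a sequential stack with no repair: every tier must be good."""
    return math.prod(tier_yields)

# Hypothetical case: ten tiers, each yielding 95 percent on its own.
ten_tier = stack_yield([0.95] * 10)
print(round(ten_tier, 3))

# Hypothetical case: halving die area at a fixed defect density.
print(tier_yield(1.0, 0.1), tier_yield(0.5, 0.1))
```

Under these assumptions, ten tiers that each yield 95 percent combine to a stack yield of roughly 60 percent, which illustrates why smaller base dies and tight defect control matter so much for the economics.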
A direct comparison of interconnect yield at 62,500 I/O per square millimeter versus state-of-the-art hybrid bonding has not been reported in public literature. At such fine pitches, even minor alignment errors or particle contamination could significantly reduce usable connections. Researchers have shown that van der Waals interfaces can be remarkably clean at the laboratory scale, but scaling those techniques to high-volume manufacturing will require robust metrology and repair strategies. Redundant vias, error-correcting codes for inter-tier links, and design-for-test features tailored to stacked devices are all likely to play a role, yet detailed methodologies remain to be published.
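The value of redundant vias can be estimated with basic probability. The sketch below assumes that via failures are independent and that a logical link works if any one of its vias works; the 1 percent per-via failure rate is a hypothetical number, since no interconnect yield data has been published.

```python
def link_failure_prob(via_failure_prob: float, vias_per_link: int) -> float:
    """Probability that every via in a redundant group fails (independent failures)."""
    return via_failure_prob ** vias_per_link

def expected_failed_links(n_links: int, via_failure_prob: float,
                          vias_per_link: int) -> float:
    """Expected number of dead logical links among n_links."""
    return n_links * link_failure_prob(via_failure_prob, vias_per_link)

def logical_link_density(via_density_per_mm2: float, vias_per_link: int) -> float:
    """Usable link density once each link consumes several physical vias."""
    return via_density_per_mm2 / vias_per_link

VIA_DENSITY = 62_500     # physical vias per square millimeter, as reported
P_VIA_FAIL = 0.01        # hypothetical per-via failure probability

for k in (1, 2, 3):
    print(k,
          logical_link_density(VIA_DENSITY, k),
          expected_failed_links(int(logical_link_density(VIA_DENSITY, k)),
                                P_VIA_FAIL, k))
```

The tradeoff is explicit in this model: doubling the vias per link halves the usable link density but reduces the per-link failure probability from 1 percent to 0.01 percent under the assumed failure rate.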
From lab demonstrations to deployable AI stacks
Despite the gaps, the trajectory of the recent demonstrations is clear. The SkyWater work shows that a commercial foundry can integrate heterogeneous device types in a monolithic stack without exceeding back-end thermal budgets, while the 2D-material experiments push tier counts and interconnect densities to levels that conventional packaging cannot match. For AI hardware teams, the near-term path may involve hybrid approaches: using M3D to stack memory or specialized accelerators above a conventional logic base die, then combining those vertical modules with chiplets and advanced packaging at the system level.
Realizing that vision will require coordinated progress in process technology, electronic design automation, and system architecture. Design tools must learn to treat the vertical dimension as a first-class resource, co-optimizing floorplans, power delivery, and thermal distribution across tiers. Architects will need to rethink how they partition neural networks and data structures when memory and compute can be colocated in three dimensions. And foundries will have to provide standardized process design kits that capture the quirks of stacked devices, from inter-tier via limitations to layer-specific reliability rules.
The recent monolithic 3D milestones do not guarantee an easy path to mass-produced AI chips, but they do expand the feasible design space at a moment when traditional scaling is slowing. If researchers can close the reliability, thermal, and cost gaps, vertically integrated logic and memory could become a central tool for pushing AI performance within the power and footprint constraints of future data centers and edge devices.
*This article was researched with the help of AI, with human editors creating the final content.