Amazon has placed its custom silicon strategy at the center of a broader effort to build and train AI models at lower cost, tapping senior vice president Peter DeSantis to lead the charge as the company’s AI chief. The bet on in-house chips is designed to reduce dependence on outside GPU suppliers and give AWS customers a cheaper path to training large generative AI models. With Trainium hardware now spanning two production generations and a third entering service, the initiative represents Amazon’s most direct challenge yet to the GPU-dominated economics of cloud AI.
DeSantis Takes the Reins on Cost-Driven AI
Amazon’s decision to name Peter DeSantis as its new AI czar signals that the company views chip economics, not just model quality, as the primary battleground in generative AI. Reporting from the Wall Street Journal describes how DeSantis, a longtime AWS infrastructure leader, is steering efforts to use Amazon’s proprietary Trainium processors to develop AI models more cheaply than rivals that depend heavily on third-party GPUs. That framing sets Amazon apart from competitors whose AI strategies revolve around securing ever-larger allocations of Nvidia hardware and passing those infrastructure costs through to customers.
The strategic logic is straightforward: if Amazon can train competitive models on chips it designs and manufactures through partners, it controls both the cost curve and the supply chain. That matters because, as the company disclosed in its 2025 Form 10-K, its business remains constrained by AI infrastructure supply, including GPUs, and it relies on a limited group of semiconductor suppliers. The filing warns that shortages or allocation changes could limit AWS capacity and increase costs, directly affecting customer workloads. Building an alternative silicon pipeline is therefore not just an engineering project; it is a hedge against the bottleneck that has slowed cloud AI expansion across the industry and a way to keep Amazon from being purely a price taker in the GPU market.
From Trainium2 to Trainium3: A Rapid Hardware Cadence
Amazon’s chip roadmap has moved quickly. AWS made Trainium2-based EC2 Trn2 instances generally available in late 2024, publishing concrete specifications for FP8 compute throughput, high-bandwidth HBM memory, and Elastic Fabric Adapter networking. The Trn2 offering was aimed squarely at large-scale training and inference, with instance sizes scaled up for multi-node clusters and tight integration into the Neuron software stack so customers could access the hardware through popular machine learning frameworks. In parallel, AWS introduced Trn2 UltraServers as a higher-density configuration, positioning both UltraServers and UltraClusters to support multi-trillion-parameter models that would previously have required large GPU fleets.
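As a rough illustration of that framework path, the sketch below uses the PyTorch/XLA integration that the Neuron SDK builds on to place a toy training loop on a Trainium device. It assumes a Trn2 instance with the torch-neuronx and torch-xla packages installed; the model, data, and hyperparameters are placeholders, not an AWS-published recipe.

```python
# Minimal sketch: a toy training loop placed on a Trainium device via
# the PyTorch/XLA integration the Neuron SDK builds on. Assumes a Trn2
# instance with torch-neuronx and torch-xla installed; the model, data,
# and hyperparameters below are placeholders.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to a NeuronCore on a Trainium host

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(10):
    # Synthetic batch; a real pipeline would stream data from a loader.
    x = torch.randn(32, 1024).to(device)
    y = torch.randn(32, 1024).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)  # steps the optimizer and syncs the XLA graph
```

The point of the sketch is how little changes relative to ordinary PyTorch: the accelerator shows up as a device target, which is exactly the low-friction path AWS is selling.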
The same announcement cycle also introduced Trainium3, with a separate AWS press release detailing how the Neuron software stack ties the hardware together, including framework integration and the Neuron Kernel Interface (NKI) for low-level kernel access. The pace accelerated further when AWS announced general availability of Trainium3-based Trn3 UltraServers. These systems scale to 144 chips connected through NeuronSwitch fabric, and AWS published detailed per-chip and system-level specifications covering FP8 petaflops, HBM3e memory capacity, and memory bandwidth. AWS also made explicit performance and efficiency claims for Trn3 UltraServers relative to Trainium2, framing the newer generation as a significant step forward in price-performance for large-scale model training. That kind of rapid generational improvement is the core of the cost argument DeSantis is making: each chip revision closes the performance gap with top-tier GPUs while aiming to keep training bills lower for AWS customers.
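To give a sense of what that NKI-level access looks like, the following is a minimal sketch modeled on the elementwise-add pattern in AWS’s NKI documentation. Exact module paths and buffer names may vary across Neuron SDK releases, so treat it as illustrative rather than canonical.

```python
# Minimal sketch of an NKI kernel, modeled on the elementwise-add
# pattern in AWS's NKI documentation. Module paths and buffer names
# may differ across Neuron SDK releases; treat this as illustrative.
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the result tensor in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    # Load inputs from HBM into on-chip memory, add, and store back.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```

NKI is Amazon’s rough analogue to writing custom CUDA kernels: most customers will stay at the framework layer, but the interface exists for teams that need to hand-tune hot paths on the chip.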
Supply Chain Risk and the Anthropic Factor
Amazon’s SEC filing paints a candid picture of the supply pressures driving the in-house chip strategy. In the 2025 10-K, the company notes that its cloud operations are exposed to constraints in AI infrastructure supply and that it depends on a small number of semiconductor vendors for critical components. The risk language extends beyond generic component shortages and specifically highlights the potential impact of limited availability of advanced chips on AWS services. For a cloud provider whose largest customers increasingly treat AI capacity as a core utility, any inability to procure enough GPUs can translate directly into lost revenue, delayed projects, or the need to prioritize certain customers over others.
The same filing also describes Amazon’s strategic relationship with Anthropic, the AI safety company behind the Claude family of models, as part of a broader set of investments intended to deepen AWS’s role in generative AI. While the 10-K focuses on the structure and accounting treatment of the investment, the operational connection between Anthropic and Trainium is significant. Anthropic runs its workloads on AWS, and Amazon has a clear incentive to make its proprietary chips the most cost-effective platform for that traffic. If AWS can offer Anthropic, and by extension AWS customers who rely on Anthropic’s models, a cheaper training substrate than Nvidia GPUs, it strengthens both the commercial relationship and the broader AWS value proposition. The chip strategy and the model partnership reinforce each other: proprietary hardware lowers costs and improves capacity planning, while a flagship model partner validates the hardware in demanding production environments.
What This Means for Cloud AI Buyers
For enterprises evaluating where to run AI workloads, Amazon’s chip push changes the calculus in concrete ways. The Trn3 UltraServer architecture, with its 144-chip scaling through NeuronSwitch fabric and HBM3e memory, is designed to handle the largest training jobs that previously required expensive GPU clusters. If the performance and efficiency gains AWS claims over Trainium2 hold up under independent testing, organizations training custom models could see meaningful reductions in their cloud compute bills without sacrificing throughput. For customers already committed to AWS, the ability to access Trainium-based instances through familiar EC2 primitives and managed services also reduces the operational friction of experimenting with a new accelerator family.
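Because Trainium capacity surfaces through the ordinary EC2 API, provisioning an instance looks like provisioning any other instance family. The boto3 sketch below is hypothetical: the AMI ID and key-pair name are placeholders, and trn2.48xlarge is used as the Trainium2 instance size AWS announced at general availability.

```python
# Hypothetical provisioning sketch: Trainium capacity is requested
# through the ordinary EC2 API. The AMI ID and key-pair name are
# placeholders; trn2.48xlarge is the Trainium2 size AWS announced at GA.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a Neuron-enabled AMI
    InstanceType="trn2.48xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",            # placeholder key pair
)
print(response["Instances"][0]["InstanceId"])
```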
There is a catch, though. Amazon’s Neuron software stack, while offering framework integration and low-level kernel access through NKI, is not as mature or widely adopted as Nvidia’s CUDA ecosystem. Developers who have spent years optimizing code for CUDA face real switching costs, from retooling training pipelines to validating numerical behavior on a new architecture. The lack of independent, third-party benchmarks for Trainium3 makes it difficult to verify AWS’s own performance claims, leaving early adopters to rely heavily on vendor-provided numbers and limited case studies. Amazon is effectively asking customers to trust its metrics and invest engineering time in a newer toolchain, a bet that will appeal most to cost-sensitive organizations with greenfield projects rather than those deeply entrenched in GPU-specific workflows.
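One concrete piece of that validation work is simply diffing outputs against a trusted reference. The sketch below, which assumes the same PyTorch/XLA setup as the earlier training example, compares a single layer’s output on the accelerator against a CPU float32 baseline; real-world tolerances would depend on the datatype and workload.

```python
# Minimal sketch of one porting check: compare a layer's output on the
# new backend against a CPU float32 reference within a tolerance.
# Assumes the same PyTorch/XLA setup as the earlier training sketch.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
layer = torch.nn.Linear(512, 512)

x = torch.randn(8, 512)
cpu_out = layer(x)                              # float32 reference on CPU
dev_out = layer.to(device)(x.to(device)).cpu()  # same op on the accelerator

# Tolerances are workload-dependent; reduced-precision formats such as
# BF16 or FP8 generally call for looser bounds than these.
assert torch.allclose(cpu_out, dev_out, rtol=1e-3, atol=1e-3)
```

Checks like this are cheap individually, but repeating them across every layer, optimizer, and precision mode in a production pipeline is where the real switching cost accumulates.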
Can Custom Silicon Reshape the AI Cost Curve?
Whether Amazon’s custom silicon strategy can materially reshape the AI cost curve will depend on execution across hardware, software, and ecosystem development. On the hardware side, the rapid cadence from Trainium2 to Trainium3 shows that AWS can iterate quickly and push performance per watt and per dollar in the right direction. If future Trainium generations continue to deliver sizable efficiency gains, AWS will be able to offer increasingly attractive price-performance tiers for training and inference, especially for customers willing to standardize on its accelerators. The ability to scale UltraServers to large clusters and integrate them into existing AWS networking and storage primitives also gives Amazon a path to support frontier-scale models without relying solely on third-party GPUs.
On the software and ecosystem side, the challenge is steeper. To fully capitalize on the hardware, AWS must make Neuron feel as seamless as CUDA for mainstream use cases, with robust framework support, debugging tools, and performance profiling. It also needs a critical mass of reference workloads, from open models to partner deployments like Anthropic, to demonstrate that Trainium can handle diverse architectures efficiently at scale. Over time, if customers see that Trainium-based instances consistently deliver lower training and inference costs for comparable performance, the economics could outweigh the inertia of existing GPU investments. In that scenario, DeSantis’s cost-first approach could give Amazon a durable competitive edge in cloud AI, turning custom silicon from a defensive hedge against supply risk into a primary driver of growth.
*This article was researched with the help of AI, with human editors creating the final content.*