Most AI data centers today run inference on a single type of chip, typically Nvidia GPUs. Intel and SambaNova Systems are betting that approach is about to hit a wall.
On April 8, the two companies unveiled a joint reference architecture that splits AI inference across three processor families, each assigned to the task it handles best. GPUs take the prefill stage, where a model digests an entire input prompt at once. SambaNova’s custom Reconfigurable Dataflow Units (RDUs) handle decode, the sequential process of generating output one token at a time. And Intel’s Xeon 6 server CPUs manage the orchestration layer, routing the tool calls, database queries, and API requests that power AI agents.
The announcement also included a formal agreement: SambaNova will standardize on Xeon 6 as the host CPU shipped alongside its RDU accelerators, a commitment that locks out competing server processors from AMD and Arm-based vendors in SambaNova’s future deployments.
Why split inference across three chips?
The logic starts with how large language models actually work when they respond to a query. Prefill is computationally dense, requiring massive parallel matrix math across the full context window. GPUs were built for exactly that kind of workload. Decode is different: the model produces tokens one at a time, a sequential, memory-bandwidth-bound task during which a GPU’s parallel compute units often sit partially idle. SambaNova argues its RDUs are purpose-built for this phase, promising better hardware utilization and lower latency per token.
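The asymmetry between the two phases comes down to their control flow. A minimal toy sketch (plain Python, no real model or vendor API involved) makes it visible: prefill can touch the whole prompt in one batched pass, while each decode step depends on the output of the step before it and therefore cannot be parallelized across time.

```python
# Toy sketch of the two inference phases; no real model or vendor API involved.

def prefill(prompt_tokens):
    """One batched pass over the whole prompt: dense, parallel-friendly work.
    Returns a toy 'context' standing in for the KV cache."""
    # In a real model this is large matrix math across the full context window.
    return list(prompt_tokens)

def decode(context, steps):
    """Sequential generation: each token depends on everything before it,
    so the loop iterations cannot run in parallel."""
    out = []
    for _ in range(steps):
        next_token = max(context)   # stand-in for 'pick the next token'
        context.append(next_token + 1)
        out.append(next_token)
    return out

cache = prefill([3, 1, 4, 1, 5])
generated = decode(cache, steps=3)
print(generated)  # → [5, 6, 7]
```

The `decode` loop is the shape of the workload SambaNova is targeting: a chain of small, dependent steps rather than one large parallel operation.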
Then there is the agentic layer. AI agents do not just generate text. They call functions, look up records, chain multiple model calls together, and interact with external services. That orchestration work is classic CPU territory, and Intel is positioning Xeon 6 as the control plane that ties the system together.
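The pattern described above, a model interleaving text generation with external actions, can be sketched in a few lines. Everything here is a stub invented for illustration (there is no real LLM or tool API behind it), but the control flow is the classic agent loop:

```python
# Toy agent loop: interleave model calls with tool calls.
# The 'model' and the tool registry are stubs invented for illustration.

def fake_model(prompt):
    """Stub LLM: requests a lookup first, then answers once it has the result."""
    if "result:" in prompt:
        return {"type": "answer", "text": "done"}
    return {"type": "tool_call", "tool": "lookup", "arg": "record-42"}

TOOLS = {"lookup": lambda arg: f"result: value for {arg}"}

def run_agent(prompt, max_steps=5):
    """Orchestration loop: each tool result feeds the next model call.
    This branching control flow, not matrix math, is what a host CPU runs."""
    for _ in range(max_steps):
        step = fake_model(prompt)
        if step["type"] == "answer":
            return step["text"]
        prompt += " " + TOOLS[step["tool"]](step["arg"])
    raise RuntimeError("agent did not terminate")

print(run_agent("find record-42"))  # → done
```

Each pass through the loop is a database hit, an API round-trip, or another model call, which is why a single user request can fan out into many inference calls.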
The underlying pressure driving this design is cost. Training a large language model is a one-time expense, but serving it to millions of users racks up ongoing compute bills that can dwarf the original training budget. Agentic AI systems multiply the problem because a single user request can trigger dozens of inference calls, tool lookups, and API round-trips. Even modest per-request savings, compounded across millions of queries, could translate into significant reductions in operating costs.
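The scale effect is simple arithmetic. A back-of-envelope sketch, with every figure invented for illustration rather than taken from either vendor:

```python
# Back-of-envelope: how modest per-request savings compound at scale.
# All figures below are invented for illustration, not vendor claims.

inference_calls_per_request = 20   # one agentic request fans out into many calls
cost_per_call_usd = 0.002          # assumed blended cost per inference call
requests_per_day = 1_000_000
savings_fraction = 0.10            # a modest 10% efficiency gain

daily_cost = inference_calls_per_request * cost_per_call_usd * requests_per_day
annual_savings = daily_cost * savings_fraction * 365
print(f"daily spend: ${daily_cost:,.0f}; "
      f"annual savings at 10%: ${annual_savings:,.0f}")
# → daily spend: $40,000; annual savings at 10%: $1,460,000
```

Under these invented numbers, a 10 percent efficiency gain on a $40,000-a-day inference bill is worth about $1.5 million a year, which is the kind of arithmetic driving interest in hardware specialization.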
The financial ties behind the blueprint
This is not a loose co-marketing arrangement. The technical collaboration sits on top of a deeper financial relationship. In February 2026, SambaNova closed a Series E funding round exceeding $350 million, with Intel Capital participating as a named investor. At the same time, SambaNova introduced its SN50 chip, which the company calls the fastest processor for agentic AI workloads.
SoftBank Corp. signed on as the first SN50 customer, with initial deployments in Japan targeting advanced AI workloads built on large language models and multi-step reasoning.
The heterogeneous inference blueprint arrived roughly six weeks later, suggesting the two companies developed the joint architecture in parallel with the funding and product announcements rather than bolting it on afterward. Intel’s capital investment and its technical collaboration appear to be parts of a single coordinated strategy aimed at the agentic AI market.
What the announcement does not include
For all the architectural ambition, several critical details are missing.
No published benchmarks. SambaNova and Intel describe the system as delivering the “most efficient inference solution,” but neither company has released latency measurements, throughput numbers, or energy efficiency comparisons against GPU-only inference clusters. Without reproducible data, enterprise buyers cannot evaluate the actual cost-benefit tradeoff.
No orchestration details. Splitting inference across three processor families requires sophisticated scheduling software that decides, in real time, which chip handles which part of a request. Neither company has published technical documentation, whitepapers, or API specifications for that layer. Poorly tuned scheduling can erase hardware-level gains through data movement overhead and idle resources, making the orchestration stack arguably the hardest engineering challenge in the entire design.
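To make concrete what is missing: since neither company has published its scheduler, the following is a purely hypothetical sketch of the simplest possible stage-level router, with every name and rule invented here.

```python
# Hypothetical stage router. Neither company has published its scheduler,
# so every name and mapping below is invented for illustration.

STAGE_TO_DEVICE = {
    "prefill": "gpu",     # dense, parallel prompt processing
    "decode": "rdu",      # sequential token generation
    "tool_call": "cpu",   # orchestration: APIs, databases, function calls
}

def route(stage):
    """Map a pipeline stage to a device class, failing loudly on unknowns."""
    try:
        return STAGE_TO_DEVICE[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}")

plan = [route(s) for s in ("prefill", "decode", "decode", "tool_call")]
print(plan)  # → ['gpu', 'rdu', 'rdu', 'cpu']
```

Everything hard about the real problem is exactly what this toy omits: moving KV-cache state between devices, batching requests across chips, and keeping all three processor families busy at once.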
No pricing guidance. List prices, total cost of ownership estimates, and reference configurations are all absent. For procurement teams comparing this approach against existing GPU clusters, the economic case remains opaque.
Limited deployment evidence. SoftBank’s role as the first SN50 customer comes from SambaNova’s press release alone. No public statement from SoftBank describes production results, model types, or deployment scale. Whether SoftBank is running the full three-chip heterogeneous blueprint or standalone SN50 hardware is not specified in any available source.
The competitive question Nvidia will have to answer
The biggest variable hanging over this blueprint is how Nvidia responds. Nvidia dominates the inference accelerator market and has been expanding its own software stack to handle both prefill and decode efficiently on a single GPU architecture. If Nvidia can close the decode efficiency gap through software updates or next-generation hardware, the rationale for a multi-vendor heterogeneous approach weakens considerably.
Other players are also working the problem from different angles. AMD pairs its Instinct accelerators with EPYC server CPUs. Google runs inference on its own TPU hardware. Amazon Web Services has built custom Inferentia and Trainium chips specifically to reduce inference costs. SambaNova and Intel are entering a crowded field of companies trying to break Nvidia’s grip on AI compute, each with a different theory about the right hardware mix.
What sets the SambaNova-Intel approach apart is the explicit division of labor: three chip families, three pipeline stages, one integrated system. If the performance and cost claims hold up under independent testing, the architecture could offer a compelling alternative for enterprises running large-scale agentic AI. If they do not, it risks being remembered as an elegant diagram that never translated into a production advantage.
Where the evidence stands as of May 2026
Every confirmed fact in this story traces back to two SambaNova press releases distributed through Business Wire. These are primary sources from a party with a direct commercial interest in the narrative. The structural facts are solid: the processor role assignments, the Xeon 6 standardization deal, the $350 million funding round, Intel Capital’s investor status, and SoftBank’s early-customer role are all verifiable commitments. The performance and efficiency claims are not, and should be treated as hypotheses awaiting independent validation rather than established results.
For enterprise technology leaders watching this space, the blueprint is worth tracking but not yet worth building procurement plans around. The next milestones to watch for: published benchmarks from either company, independent testing from research groups or analyst firms, and public statements from SoftBank or other customers describing real-world results. Until those arrive, the SambaNova-Intel heterogeneous inference architecture remains a promising but unproven bet on the future shape of AI infrastructure.
*This article was researched with the help of AI, with human editors creating the final content.