When DeepSeek released its R1 reasoning model in January 2025, the Chinese AI startup triggered a $1 trillion sell-off in U.S. tech stocks and forced Silicon Valley to confront an uncomfortable question: Could a lean, low-cost Chinese lab match America’s best? Fifteen months later, the answer appears to be no. DeepSeek’s long-awaited V4 preview, rolled out in April 2026 on what the Associated Press reported were Huawei-manufactured chips rather than the Nvidia hardware that powers most leading American labs, has, by multiple measures, failed to close the gap with top U.S. systems, according to federal evaluators, independent researchers, and financial analysts.
What federal testing revealed
The most concrete evidence comes from the U.S. Commerce Department’s National Institute of Standards and Technology. In September 2025, NIST’s Center for AI Standards and Innovation (CAISI) published a formal evaluation of DeepSeek’s models, testing them against U.S.-developed counterparts on software engineering, cybersecurity, and resistance to adversarial attacks.
The results were stark. DeepSeek’s systems underperformed on complex coding tasks, produced higher error rates in security-critical scenarios, and proved substantially more vulnerable to prompt-based jailbreaks than leading American models. The full CAISI report details the benchmark design, scoring rules, and test suites behind those conclusions, giving outside experts enough information to scrutinize the methodology. It also quantifies the performance gap across software tasks, cost comparisons, and jailbreak frequency, and includes a statement from the U.S. Commerce Secretary.
“The results were not close on the security side,” said one researcher involved in federal AI testing who spoke on condition of anonymity because they were not authorized to discuss the findings publicly. “The jailbreak vulnerability rates alone should give any enterprise buyer pause.”
That evaluation predates the V4 preview by several months, meaning it tested earlier DeepSeek models rather than the newest release. But a Bloomberg analysis published April 24, 2026, reached a similar conclusion about V4 specifically: the new model did not narrow the American advantage, particularly in enterprise-critical tasks like secure code generation and automated vulnerability discovery. The Associated Press confirmed that DeepSeek positioned V4 as a direct challenge to U.S. competitors, emphasizing its lower price and reliance on a domestic Chinese supply chain.
The broader scoreboard
Stanford’s Human-Centered Artificial Intelligence institute adds a wider lens. Its AI Index Report, which tracks global AI capabilities, investment, and infrastructure on an annual basis, supports the case for overall U.S. leadership while identifying specific areas where China is gaining ground. Chinese researchers are approaching parity in the volume of AI-related publications and patents, and Beijing has expanded AI deployment in government services. But the most influential foundation models and the largest-scale training runs remain concentrated in American institutions, including labs like OpenAI, Anthropic, Google DeepMind, and Meta AI.
Comparing the 2025 and 2026 editions of the Index makes the gap measurable rather than anecdotal. Year over year, the data shows that while China has expanded its AI research footprint significantly, the frontier models shaping advanced commercial and defense applications still skew heavily toward U.S. labs. Metrics on compute concentration, venture funding, and cross-border talent flows help explain why: American companies continue to attract disproportionate investment and engineering talent, even as rivals like DeepSeek draw global attention.
On the policy side, the White House released its AI Action Plan in July 2025, explicitly framing U.S. AI dominance as a strategic objective. The plan outlined pillars centered on innovation, infrastructure, and security, and backed them with investments in domestic computing capacity, tightened export controls on high-end chips, and incentives for safety research. The message was direct: maintaining a technology edge over China is a matter of national and economic security.
What we still don’t know
Several important questions remain unanswered. DeepSeek has not published detailed benchmark methodologies for V4, and no primary technical disclosures from the company or Chinese regulators have surfaced. The performance claims against U.S. models rely on NIST’s evaluation framework and Bloomberg’s reporting rather than on head-to-head testing that DeepSeek has endorsed or contested. Without a response from DeepSeek’s researchers explaining how they tuned V4 and what trade-offs they prioritized, the picture is necessarily incomplete.
Adoption data is another blind spot. Neither the CAISI report nor the AI Index provides granular metrics on how many organizations have integrated DeepSeek into production systems, what sectors they operate in, or how those deployments perform over time. That gap matters because benchmark scores and real-world impact are different things: a weaker model can still gain market share if it costs less, integrates more easily, or faces fewer regulatory hurdles.
The Huawei chip question looms over everything. According to AP reporting, DeepSeek built V4 on domestically produced Huawei processors rather than Nvidia GPUs, partly in response to U.S. export controls that restrict China’s access to the most advanced semiconductors. DeepSeek itself has not confirmed the hardware specifications. Whether those export controls are effectively constraining DeepSeek’s next generation of models, or whether Huawei’s chip production can compensate, is a live debate among analysts. Without transparent performance data on those chips, it is difficult to determine whether hardware is the primary bottleneck or whether algorithmic and data advantages still favor American labs regardless.
And the biggest question of all: Does V4’s underwhelming debut reflect a temporary stumble in one product cycle, or a structural ceiling imposed by hardware limitations, regulatory constraints, and ecosystem differences? Current evidence cannot definitively answer that.
How the evidence stacks up for V4 skeptics and believers
Not all of the evidence behind this story carries equal weight, and readers should know the difference.
The NIST CAISI evaluation is primary-source material. A federal agency applied its own testing standards and published quantified results. NIST sets measurement benchmarks across U.S. industry and government, and its findings on jailbreak rates and cybersecurity task performance are the most concrete data points available, even though they cover only a slice of DeepSeek’s potential applications and predate V4.
Bloomberg and the Associated Press sit one step removed. They confirm the V4 rollout timeline, the reported Huawei chip connection, and the broad competitive framing, but their assessments of whether DeepSeek “failed” to close the gap rely partly on benchmark comparisons and expert commentary that readers cannot independently verify without access to the underlying test data. Bloomberg’s conclusion aligns with NIST’s findings but adds an analytical layer shaped by market context, investor expectations, and sourcing typical of financial journalism. Valuable, but analysis rather than experimental results.
Stanford’s AI Index occupies a middle ground: an institutional research product with named methodology and annual tracking, more transparent than a single news report but less granular than a federal evaluation focused on specific models. It is most useful for establishing trend lines across years rather than rendering a verdict on a single product launch.
Taken together, the evidence supports a clear but nuanced reading. The United States still leads in cutting-edge AI capability and safety performance. China is advancing rapidly but has yet to produce a model that matches or surpasses the top American systems in high-stakes domains. Until DeepSeek publishes its own technical disclosures, or independent labs outside the U.S. replicate NIST-style evaluations of V4, claims of a sudden shift in AI power should be treated with healthy skepticism.
This article was researched with the help of AI, with human editors creating the final content.