In January 2025, a Hangzhou-based AI lab called DeepSeek dropped a reasoning model that, by its own benchmarks, went toe-to-toe with OpenAI’s o1 on math, coding, and logic tasks. The kicker: DeepSeek-R1’s API pricing started at roughly $0.55 per million input tokens, while OpenAI’s o1 was charging $15 for the same volume. Within weeks, OpenAI, Google, and other major providers began trimming their own API rates. As of mid-2025, the ripple effects are still reshaping how companies budget for AI inference.
The release did not just undercut prices. It challenged a core assumption in the industry: that only the best-funded Western labs could produce top-tier reasoning systems. DeepSeek did it with open weights, meaning anyone with the hardware can download and run the model without a licensing fee.
What the technical papers actually show
Two preprints, both hosted on Cornell’s arXiv preprint service, form the backbone of DeepSeek’s claims. The first, “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,” describes how the team used reinforcement learning to sharpen chain-of-thought reasoning. Put simply, the model learns to “think out loud” through multi-step problems rather than jumping to an answer. The R1 paper reports scores on widely used benchmarks: 79.8% on AIME 2024 (a competitive math exam), 97.3% on MATH-500, and performance on Codeforces coding challenges that the authors say is comparable to OpenAI’s o1.
The second document, the DeepSeek-V3 technical report, explains the architecture behind the efficiency claims; V3 is the base model on which R1 was built. V3 uses a mixture-of-experts (MoE) design, which activates only a fraction of the model’s total parameters for any given query. Think of it as a building where only the relevant offices turn their lights on when you walk in, rather than powering every floor at once. The report states the model was trained on 2,048 NVIDIA H800 GPUs, the export-compliant variant of the H100 chip, at a reported cost of approximately $5.6 million, a figure that by the report’s own account covers the final training run and excludes prior research and ablation experiments. For context, training runs for frontier models at U.S. labs have been estimated in the tens or hundreds of millions of dollars, though direct comparisons are difficult because compute accounting methods vary.
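To make the "only the relevant offices turn their lights on" idea concrete, here is a toy sketch of top-k expert routing in Python. It is an illustration of the general mixture-of-experts technique, not DeepSeek-V3's actual architecture: the expert count, vector size, random gate weights, and the `moe_forward` and `make_expert` helpers are all assumptions invented for this example, and each "expert" is a trivial scaling function standing in for a real feed-forward network.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Toy expert: scales its input. Real experts are small neural networks."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router score per expert: dot product of the input with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts run; every other expert is skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the selected experts
        out = [o + weight * v for o, v in zip(out, experts[i](x))]
    return out, chosen


random.seed(0)
NUM_EXPERTS, DIM = 8, 4  # hypothetical sizes, chosen only for illustration
experts = [make_expert(scale=i + 1) for i in range(NUM_EXPERTS)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

output, active = moe_forward([0.5, -1.0, 2.0, 0.1], gate_weights, experts, top_k=2)
print(f"active experts: {active} ({len(active)} of {NUM_EXPERTS})")
```

In this toy version, six of the eight experts do no work for the query, which is the source of the compute savings. Production systems add load balancing and far larger expert networks, none of which is modeled here.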
DeepSeek also released distilled variants of R1 built on existing open models from Alibaba’s Qwen and Meta’s Llama families, and those weights are likewise freely downloadable. That openness is the engine behind the cost disruption: when third-party cloud providers can host the model without paying a license, competition drives per-token prices down fast.
Where the claims hold up, and where they get shaky
The benchmark numbers are self-reported, which always warrants caution. But they have not gone untested. Community-run evaluations on platforms like LMSYS Chatbot Arena, where users blind-test models head-to-head, showed DeepSeek-R1 performing competitively with top proprietary systems in reasoning-heavy conversations through early 2025. Multiple independent developers have reproduced strong results on math and code tasks using the public weights. That is more corroboration than most open-weight releases receive in their first months.
The cost comparison is murkier. DeepSeek’s API pricing is verifiably low, but any single headline ratio, such as “a third of the cost,” oversimplifies the picture. On raw per-token rates, DeepSeek-R1 is not just cheaper than OpenAI’s o1; it is dramatically cheaper, by a factor of 20 or more for input tokens at the launch prices cited above. But inference cost in production depends on hardware setup, batch sizes, quantization choices, and traffic patterns. A startup running occasional queries will see different economics than an enterprise processing millions of requests per day. No audited cost ledger from OpenAI, Anthropic, or Google exists that would allow a perfectly controlled comparison.
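The input-token arithmetic can be checked directly. The short Python sketch below uses only the two list prices quoted at the top of this article ($0.55 and $15 per million input tokens); the monthly token volume is a made-up figure for illustration, and the calculation deliberately ignores output tokens, caching discounts, and self-hosting costs, all of which would change a real bill.

```python
# List prices per million input tokens, as quoted earlier in this article.
DEEPSEEK_R1_INPUT_USD_PER_M = 0.55
OPENAI_O1_INPUT_USD_PER_M = 15.00


def input_cost_usd(tokens, usd_per_million):
    """Cost of a given number of input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


price_ratio = OPENAI_O1_INPUT_USD_PER_M / DEEPSEEK_R1_INPUT_USD_PER_M

# Hypothetical workload: 2 billion input tokens per month (an assumption).
monthly_input_tokens = 2_000_000_000
r1_monthly = input_cost_usd(monthly_input_tokens, DEEPSEEK_R1_INPUT_USD_PER_M)
o1_monthly = input_cost_usd(monthly_input_tokens, OPENAI_O1_INPUT_USD_PER_M)

print(f"input price ratio: {price_ratio:.1f}x")
print(f"monthly input cost, R1: ${r1_monthly:,.0f}")
print(f"monthly input cost, o1: ${o1_monthly:,.0f}")
```

At these list prices the input-token ratio works out to about 27x, consistent with the "factor of 20 or more" figure, and the hypothetical workload costs roughly $1,100 versus $30,000 a month on input tokens alone.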
The price-cut response from rivals, however, is a matter of public record. OpenAI reduced pricing on several API tiers in the weeks following DeepSeek’s release and accelerated the rollout of more affordable reasoning model options. Google adjusted Gemini API rates. Smaller providers moved even faster. Attributing every price cut solely to DeepSeek would be an overreach, since efficiency gains across the industry had been building pressure for months, but the timing was hard to ignore.
The hardware question and the geopolitical backdrop
DeepSeek trained its models under U.S. chip-export restrictions that bar Chinese entities from purchasing NVIDIA’s most advanced GPUs, including the H100 and its successors. The V3 report specifies training on H800 chips, which were designed to comply with earlier export rules but have since faced tighter controls. Whether DeepSeek can sustain its efficiency edge as models scale further, without access to next-generation hardware, is one of the biggest unresolved questions in the industry.
There is also a content moderation dimension that Western coverage has sometimes glossed over. Independent testers have found that DeepSeek’s models decline to engage with certain politically sensitive topics, particularly those related to Chinese government policy, Taiwan, and Tiananmen Square. For organizations that need unrestricted output across all subject areas, this is a practical limitation that sits outside any benchmark score.
What this means for developers and companies right now
For teams already running inference workloads on proprietary APIs, the practical next step is straightforward: download the open weights, run them against a representative sample of production queries, and compare output quality, latency, and cost to the current provider. Because the weights are free, the only expense is compute time for the test.
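A side-by-side test of this kind can be small. The Python sketch below shows one possible shape for it, under stated assumptions: `evaluate`, the two stub model functions, the exact-match scoring rule, and the per-token prices are all hypothetical placeholders, not any provider's real API or rates. In practice each stub would be replaced by a call to the current provider or to a locally hosted copy of the open weights, and exact match would give way to a quality metric suited to the workload.

```python
import time
from dataclasses import dataclass
from statistics import mean


@dataclass
class Result:
    name: str
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float


def evaluate(name, call_model, samples, usd_per_m_input, usd_per_m_output):
    """Run every (prompt, expected) pair through call_model and tally the results.

    call_model is assumed to return (text, input_tokens, output_tokens).
    """
    correct, latencies, cost = 0, [], 0.0
    for prompt, expected in samples:
        start = time.perf_counter()
        text, in_tok, out_tok = call_model(prompt)
        latencies.append(time.perf_counter() - start)
        correct += int(text.strip() == expected)  # crude exact-match scoring
        cost += in_tok / 1e6 * usd_per_m_input + out_tok / 1e6 * usd_per_m_output
    return Result(name, correct / len(samples), mean(latencies), cost)


# Stub models standing in for real API or local-inference calls (hypothetical).
def stub_current_provider(prompt):
    answers = {"2+2": "4", "3*3": "9"}
    return answers.get(prompt, ""), len(prompt), 1


def stub_candidate(prompt):
    answers = {"2+2": "4", "3*3": "6"}  # deliberately wrong on one sample
    return answers.get(prompt, ""), len(prompt), 1


samples = [("2+2", "4"), ("3*3", "9")]  # replace with real production queries
results = [
    # Prices below are placeholder values, not real list prices.
    evaluate("current", stub_current_provider, samples, 10.0, 30.0),
    evaluate("candidate", stub_candidate, samples, 1.0, 3.0),
]
for r in results:
    print(f"{r.name}: accuracy={r.accuracy:.0%} "
          f"latency={r.mean_latency_s:.6f}s cost=${r.total_cost_usd:.8f}")
```

For self-hosted weights there is no per-token bill, so the cost line would need to be swapped for GPU-hour accounting on whatever hardware runs the test.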
Organizations with strict compliance or safety requirements should also audit the model’s alignment behavior. Open-weight models ship without the guardrails, red-teaming, and monitoring layers that closed providers build on top. That is not a reason to avoid them, but it is a reason to budget for your own safety testing.
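One simple starting point for that kind of audit is to measure how often a model declines to answer, broken down by prompt category. The Python sketch below is a rough illustration only: the `REFUSAL_MARKERS` phrase list, the `refusal_rates` helper, the stub model, and the sample prompts are all assumptions made up for this example. Keyword matching is a crude heuristic that will miss polite deflections and flag some legitimate answers, so a real audit would pair it with human review.

```python
# Phrases treated as signs of a refusal. This list is an assumed heuristic,
# not a standard; tune it against real model output.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable", "i won't")


def looks_like_refusal(text):
    """Return True if the response contains any known refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rates(call_model, prompts_by_category):
    """Fraction of prompts in each category that the model appears to refuse."""
    rates = {}
    for category, prompts in prompts_by_category.items():
        refused = sum(looks_like_refusal(call_model(p)) for p in prompts)
        rates[category] = refused / len(prompts)
    return rates


# Stub standing in for a real model call (hypothetical behavior).
def stub_model(prompt):
    if "restricted" in prompt:
        return "I cannot help with that."
    return "Here is an answer."


prompts = {
    "general": ["explain recursion", "summarize this memo"],
    "sensitive": ["restricted topic A", "benign question"],
}
rates = refusal_rates(stub_model, prompts)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%} refused")
```

Running the same prompt set against both the current provider and the open-weight candidate makes differences in refusal behavior visible before they surface in production.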
The structural signal matters more than any single model release. Each time a capable system ships with downloadable weights, the ceiling on what proprietary labs can charge for API access drops. Meta’s Llama series started this pattern. Mistral’s open releases accelerated it. DeepSeek-R1 pushed it into the highest-value segment of the market: complex reasoning tasks that enterprises will pay a premium for. As of mid-2025, that premium is shrinking.
Replication will settle the debate
The durability of DeepSeek’s impact depends on what happens when more independent teams stress-test the public weights on realistic workloads, not just curated benchmarks. If those results hold, the case strengthens that open-weight models can genuinely compete with the best proprietary systems in production, not just on leaderboards. Cloud providers and startups would have strong incentive to standardize on open models for reasoning-heavy applications, and the pricing pressure on closed labs would become permanent.
If replication efforts consistently fall short, or if large-scale deployments expose reliability and safety gaps, the narrative shifts. DeepSeek’s release would still matter as a market signal, but closed providers could argue that their higher prices reflect the cost of hardening systems for real-world use: red-teaming, monitoring, rapid patching, and the kind of reliability guarantees that a set of downloadable weights cannot offer on its own.
Regulators are watching, too. Open weights combined with strong reasoning capability and low serving cost raise familiar dual-use questions. The same reinforcement-learning techniques that improve math performance could, in principle, be applied to more sensitive domains. The tension between open science and misuse prevention is not new, but it gets sharper every time a more capable model becomes freely available.
For now, the most honest read is this: DeepSeek’s technical documents are detailed enough to take seriously, and early independent results are encouraging. But the benchmarks are not yet settled fact, and the cost story is more nuanced than any single ratio can capture. The organizations that will benefit most are the ones running their own experiments right now, gathering firsthand data instead of waiting for someone else’s verdict. Across thousands of those experiments, the real answer will emerge.
This article was researched with the help of AI, with human editors creating the final content.