Robin Li, the co-founder and CEO of Baidu, used his keynote at the company’s Create 2026 conference to make a blunt declaration: the race to build bigger AI models is over, and the race to build smarter AI agents has begun.
Li told developers that the industry should stop measuring progress by token consumption, the volume of text a large language model processes, and start tracking what he called Daily Active Agents (DAA), a count of how many autonomous software agents are completing real tasks for real users every day. In Li’s framing, tokens represent cost; agents represent value.
“The AI industry has entered the agent era,” Baidu said in a formal statement distributed through PR Newswire alongside the keynote. The company described an “AI evolution theory” in which agents graduate from passive responders, systems that only answer when prompted, to active participants that initiate tasks, cross-check their own results, and refine their performance without waiting for a human to intervene.
What Baidu actually announced
The keynote was not just philosophical. Baidu rolled out an expanded portfolio of agent products and pointed to a concrete benchmark result: its Famou-Agent 2.0 system topped the MLE-Bench leaderboard at the time of the May 2026 Create announcement, though rankings can change as new systems are submitted. MLE-Bench is a third-party evaluation platform that tests autonomous systems on realistic machine learning engineering tasks such as model building and evaluation. It publishes system names, run dates, and scores publicly, giving it more transparency than most internal company claims.
A separate arXiv preprint examining a different agent system, AIBuildAI, confirms that MLE-Bench is actively used in the research community to evaluate autonomous agents on practical ML engineering work. That lends credibility to the benchmark as a meaningful yardstick rather than a vanity metric.
Reporting from Caixin Global described Li’s keynote as positioning agents as the next frontier in AI execution, while Chosun Biz characterized the rollout as a deliberate attempt to shift the competitive frame from “who can build a smarter model” to “who can build agents that actually get things done.” Both accounts align with Baidu’s own language about embracing the agent era, suggesting the DAA concept is guiding internal product planning, not just external marketing.
Why this matters beyond Baidu
For anyone unfamiliar with the term, an AI agent is software that does not just answer questions but takes actions: booking flights, writing and debugging code, managing supply chains, or filing regulatory paperwork. Unlike a chatbot that waits for a prompt, an agent can break a goal into steps, use external tools, and check whether its output is correct before delivering it.
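The plan-act-verify loop described above can be sketched in a few lines of Python. This is purely illustrative: the function names, the hard-coded plan, and the toy arithmetic tools are assumptions for the sake of the example, not a description of how Baidu or any vendor actually builds agents.

```python
# Minimal sketch of an agent loop: break a goal into steps, execute each
# step with a tool, and verify the result before delivering it.
# All names and logic here are illustrative assumptions.

def plan(goal):
    # A real agent would ask an LLM to decompose the goal; here we
    # hard-code two steps for a toy arithmetic goal.
    return [("add", 2, 3), ("multiply", 5, 4)]

TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def verify(step, result):
    # Self-check before delivering: recompute independently and compare.
    op, a, b = step
    expected = {"add": a + b, "multiply": a * b}[op]
    return result == expected

def run_agent(goal):
    results = []
    for step in plan(goal):
        op, a, b = step
        result = TOOLS[op](a, b)
        if not verify(step, result):
            raise RuntimeError(f"verification failed for {step}")
        results.append(result)
    return results

print(run_agent("do some arithmetic"))  # [5, 20]
```

The key structural difference from a chatbot is visible even in this toy: the agent decides on steps, calls tools, and checks its own work, rather than returning a single generated answer.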
Li is not the only executive making this argument. Google, Microsoft, and OpenAI have all invested heavily in agent capabilities throughout 2025 and into 2026. OpenAI’s Operator and Google’s Project Mariner represent direct bets on the same thesis: that the next layer of AI value sits in execution, not raw intelligence. What distinguishes Li’s pitch is the explicit proposal of a new industry metric and the willingness to frame Baidu’s entire strategy around it.
The DAA framing also carries specific business logic. Token consumption has become the default billing and benchmarking unit across cloud AI providers. By redefining success around agents completing tasks, Li is signaling that Baidu sees its competitive advantage in the execution layer, the software between a foundation model and a user’s actual workflow, rather than in the model itself. For Baidu, which faces intensifying competition in China from ByteDance, Alibaba, and upstarts like DeepSeek, repositioning the scoreboard is both a strategic move and a survival play.
What is still missing
Li’s vision is ambitious, but several critical details remain unresolved.
No full transcript or video of the Create 2026 keynote has been made publicly available, so the precise scope of his “AI evolution theory” rests on Baidu’s press release and secondary accounts. The headline claim that agents will “learn, verify, and optimize on their own” reflects Li’s stated vision as conveyed through Baidu’s corporate materials, not a finding from any independent technical evaluation. The exact mechanism by which agents would achieve this has not been detailed in any published technical paper. Baidu has not released a peer-reviewed study describing how Famou-Agent 2.0 achieves its MLE-Bench results or how its self-improvement loop works in practice.
The DAA metric itself is undefined in public documentation. Baidu has not disclosed how it counts a Daily Active Agent: whether that means a unique agent instance completing at least one task per day, a unique user session involving an agent, or something else. No independent auditor has verified Baidu’s internal DAA numbers, and no rival company has adopted the metric, making cross-company comparisons impossible for now.
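To make the ambiguity concrete, here is one hypothetical way the metric could be counted: a unique agent instance completing at least one task per calendar day. Baidu has published no counting rules, so the log schema and the deduplication-by-day logic below are assumptions, not Baidu's definition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical DAA definition: unique agent instances completing at
# least one task on a given day. Purely illustrative; Baidu has not
# disclosed how it actually counts a Daily Active Agent.

def daily_active_agents(task_log):
    """task_log: iterable of (day, agent_id, task_completed) tuples."""
    agents_by_day = defaultdict(set)
    for day, agent_id, completed in task_log:
        if completed:
            agents_by_day[day].add(agent_id)  # set dedupes repeat tasks
    return {day: len(agents) for day, agents in agents_by_day.items()}

log = [
    (date(2026, 5, 1), "agent-a", True),
    (date(2026, 5, 1), "agent-a", True),   # same agent, counted once
    (date(2026, 5, 1), "agent-b", False),  # started but did not complete
    (date(2026, 5, 2), "agent-b", True),
]
print(daily_active_agents(log))  # one active agent on each day
```

Swap the definition (count user sessions instead of agent instances, or drop the completion requirement) and the same log yields different numbers, which is exactly why cross-company comparisons are impossible until someone publishes the counting rules.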
Topping MLE-Bench is a concrete achievement, but benchmark leadership does not automatically translate into reliable real-world performance. Benchmark evaluations test specific task categories under controlled conditions. Systems can be tuned to excel on benchmark distributions without exhibiting the same robustness in production environments, where data is noisier, objectives are less clearly specified, and users behave unpredictably.
There is also the question of safety and oversight. Baidu’s public materials emphasize that agents will verify their own outputs but do not spell out what happens when verification fails, who approves high-stakes actions, or how often human review is required. In regulated sectors such as finance and healthcare, those details will likely determine whether customers can deploy autonomous agents at all.
What Baidu’s bet actually proves and what it does not
The strongest evidence here falls into two categories: Baidu’s on-the-record corporate statement, which carries the weight of a public company’s disclosure obligations, and the MLE-Bench leaderboard, which is independently hosted and publicly accessible. Li said what Baidu says he said, and Famou-Agent 2.0 scored what the leaderboard says it scored.
Secondary reporting from Caixin Global and Chosun Biz adds editorial accountability by confirming that Li delivered these remarks at a specific event and that the company launched corresponding products. The overlap between Baidu’s claims and the journalists’ accounts reduces the risk that the press release overstates what Li actually said on stage.
What is missing is the technical middle layer. Baidu has made a strategic claim (agents are the future), a metric claim (DAA should replace token consumption), and a benchmark claim (Famou-Agent 2.0 leads MLE-Bench). Each is verifiable at the level of words and scores, but none comes with the implementation detail that would allow outside experts to replicate, stress-test, or meaningfully compare Baidu’s systems to competitors’.
For businesses and developers watching this space, the prudent read is straightforward: Baidu is making a credible, well-resourced bet on agents, backed by at least one strong benchmark result. That is a meaningful signal of where the industry is heading. It is not yet proof that autonomous agents are ready to safely run complex workflows without close human supervision, or that DAA will become the standard the rest of the industry rallies around. The vision is clear. The receipts are still coming in.
This article was researched with the help of AI, with human editors creating the final content.