Xiaomi has put a humanoid robot to work on an actual electric vehicle assembly line, where it ran autonomously for three consecutive hours installing fasteners in a die-casting workshop. The deployment marks one of the first confirmed cases of a humanoid robot completing sustained, production-speed tasks inside a real automotive factory, and it arrives as competition between Chinese tech firms and Tesla intensifies over who can bring AI-driven manufacturing labor to scale first.
Three Hours on the Factory Floor
The robot operated at a self-tapping nut installation workstation inside a Xiaomi EV factory, where it grasped components from an automated feeder and placed them onto workpieces on a moving line. According to CnEVPost, the machine achieved a 90.2% success rate for simultaneous dual-side installation, a metric that matters because the task requires coordinating both arms to seat fasteners on opposite sides of a die-cast part in a single motion. That success rate, while short of the near-perfect reliability human workers typically deliver, is notable for a humanoid form factor performing real production work rather than running a controlled lab demonstration.
The robot also kept pace with the line’s fastest 76-second takt time, the interval between completed units that dictates overall factory throughput, according to Chinese tech coverage. Missing that beat would stall the entire production sequence. Sustaining it for three hours straight suggests the system can handle the thermal, mechanical, and computational demands of continuous operation without degrading, though Xiaomi has not disclosed how many units the robot processed during the trial or whether any human intervention occurred during the run.
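Xiaomi has not disclosed how many units the robot handled, but the reported figures allow a back-of-envelope estimate. Assuming the robot took every cycle at the fastest 76-second takt for the full three hours (an idealization, since intervention rates and slower cycles were not reported), the implied throughput looks like this:

```python
# Back-of-envelope throughput implied by the reported figures.
# Assumption: the robot handled every cycle at the fastest takt for
# the full three hours; Xiaomi has not confirmed unit counts.
takt_seconds = 76          # fastest reported takt time
run_seconds = 3 * 60 * 60  # three-hour autonomous run
success_rate = 0.902       # reported dual-side installation success rate

cycles = run_seconds // takt_seconds
expected_failures = cycles * (1 - success_rate)

print(f"cycles in 3 hours: {cycles}")                        # 142
print(f"expected failed installs: {expected_failures:.0f}")  # 14
```

Roughly 140 cycles with about 14 expected failures over the run, which is why the downstream inspection question raised later matters.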
The AI Model Behind the Robot
Powering the robot is Xiaomi-Robotics-0, a 4.7 billion-parameter vision-language-action model built by Xiaomi’s robotics team. The architecture pairs a vision-language module, which interprets what the robot sees and receives as instructions, with a Diffusion Transformer that generates sequences of physical movements called action chunks. The model was trained on a mix of robot trajectory data and vision-language samples, giving it both the perceptual understanding to identify parts on a moving line and the motor planning to manipulate them precisely.
A detailed description of the system appears in a technical preprint hosted on arXiv, which outlines how Xiaomi-Robotics-0 uses asynchronous execution to support real-time task completion. That paper reports that the model underwent both simulation benchmarks and real-robot task evaluations before the factory deployment, emphasizing the gap between virtual performance and live industrial conditions. Simulation can approximate physics and tolerances, but real production lines introduce vibration, temperature swings, and part variability that are difficult to capture fully, so demonstrating that the robot can meet takt time in a working die-casting workshop is materially more significant than clearing a laboratory benchmark.
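The asynchronous execution the paper mentions is a common pattern in vision-language-action systems: the policy computes the next block of actions while the controller is still executing the current one, so model inference latency does not stall motion. The sketch below is a generic, hypothetical illustration of that pattern, not Xiaomi's code; every name in it is invented.

```python
import queue
import threading
import time

# Hypothetical sketch of asynchronous action chunking: a policy
# thread computes the next "action chunk" while a control thread
# streams the current chunk to the robot, hiding inference latency
# behind motion execution. Names and structure are illustrative only.

def infer_chunk(observation):
    """Stand-in for the vision-language-action model: returns a
    short sequence of joint targets (an 'action chunk')."""
    time.sleep(0.05)  # simulated model inference latency
    return [observation + i * 0.1 for i in range(8)]

def policy_loop(get_observation, chunk_queue, n_chunks):
    # Produces chunks ahead of time, independent of execution pace.
    for _ in range(n_chunks):
        chunk_queue.put(infer_chunk(get_observation()))

def control_loop(chunk_queue, n_chunks, executed):
    # Consumes pre-computed chunks and streams individual actions.
    for _ in range(n_chunks):
        chunk = chunk_queue.get()
        for action in chunk:
            executed.append(action)

executed = []
q = queue.Queue(maxsize=2)  # small buffer decouples the two loops
producer = threading.Thread(target=policy_loop, args=(lambda: 0.0, q, 3))
consumer = threading.Thread(target=control_loop, args=(q, 3, executed))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(len(executed))  # 24 actions from 3 chunks of 8
```

The bounded queue is the key design choice: it lets inference run ahead of execution without ever letting the arm starve for commands mid-cycle, which is what meeting a hard takt time requires.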
Open-Source Strategy and the Role of arXiv
One unresolved question is whether Xiaomi-Robotics-0 is already fully open-sourced or merely slated for a more complete release later. The preprint characterizes it as an open-sourced vision-language-action model, and code and model artifacts are visible on both GitHub and Hugging Face. Yet the same document also references Xiaomi’s intention to open-source the model, leaving ambiguity about whether the current repositories contain the production-grade weights used on the factory floor or a research-oriented subset. For outside developers and rival robotics teams, that distinction is crucial: a complete 4.7 billion-parameter action model trained on real industrial data would be a powerful foundation for transfer learning and benchmarking, whereas a partial or heavily pruned release would mainly serve as a proof of concept.
The dissemination path Xiaomi chose also highlights how central arXiv has become to robotics and AI research. The repository, operated with support from member institutions such as Cornell and sustained in part by community donations, provides rapid, open dissemination of preprints long before formal journal publication, a model that lets cutting-edge industrial work like Xiaomi’s be scrutinized and replicated by academics and competitors alike. The same ecosystem hosts a growing body of humanoid robotics research, giving observers broader context for comparing Xiaomi-Robotics-0 to alternative control architectures.
What a 90% Success Rate Actually Means
A 90.2% success rate sounds impressive in a research context but demands scrutiny on a production line. In automotive manufacturing, defect rates are typically measured in parts per million, not percentages, because even small deviations can propagate into costly recalls or safety issues. A roughly one-in-ten failure rate on dual-side nut installation would require human workers or secondary systems to catch and correct errors downstream, adding inspection steps, buffer inventory, and potential rework that could erode the labor savings the robot is meant to deliver. Xiaomi has not disclosed what constitutes a “failure” in this metric (whether it includes misalignments that can be corrected in place, complete misses that require manual intervention, or any deviation from torque and position tolerances), and it has not said whether the line automatically flags suspect parts for additional checks.
This gap between demonstration capability and full production readiness is where many humanoid robotics programs stall. Companies such as Boston Dynamics, Agility Robotics, and Figure AI have all shown bipedal machines performing warehouse or factory tasks in controlled settings, but none has publicly documented a humanoid sustaining autonomous work at full line speed for hours under real takt constraints with quantified success rates. Xiaomi’s three-hour run, even with its imperfect performance, represents a rare, data-backed example of a humanoid integrated into a live automotive process. The open question is how quickly that 90.2% can be pushed toward the “four nines” quality thresholds common in auto manufacturing, and what additional sensing, redundancy, or process redesign will be required to get there. Without published roadmaps, defect breakdowns, or plans for multi-shift operation, it remains a promising pilot rather than a proven replacement for human labor.
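Translating the reported rate into the parts-per-million terms automotive quality engineers use makes the gap concrete. The arithmetic below is illustrative only; as noted above, Xiaomi has not defined what counts as a failure in its 90.2% figure.

```python
# Converting the reported success rate into automotive-style defect
# metrics. Illustrative arithmetic; the definition of "failure" in
# Xiaomi's 90.2% figure has not been disclosed.
success_rate = 0.902
failure_ppm = (1 - success_rate) * 1_000_000
print(f"{failure_ppm:,.0f} defects per million operations")   # 98,000

# A "four nines" (99.99%) process, a common automotive target:
four_nines_ppm = (1 - 0.9999) * 1_000_000
print(f"{four_nines_ppm:,.0f} defects per million at 99.99%")  # 100
```

Closing that gap means cutting the defect rate by roughly three orders of magnitude, which is why the questions about sensing, redundancy, and process redesign are not incidental.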
Competitive Pressure in AI-Driven Manufacturing
The factory trial lands at a moment when Chinese EV makers are aggressively automating to maintain cost advantages against both domestic rivals and Western competitors. Tesla has repeatedly signaled its ambition to use the Optimus humanoid robot in its own plants, but outside of staged demonstrations it has released little verifiable data on cycle times, uptime, or quality metrics. Xiaomi’s decision to document a real factory deployment, pair it with a formal preprint, and hint at open-sourcing its control model positions the company as both a manufacturing innovator and a participant in the broader research community. That dual posture could help it recruit robotics talent and attract ecosystem partners, especially if third parties can adapt Xiaomi-Robotics-0 to other industrial tasks or mobile platforms.
At the same time, the move underscores how intertwined advanced robotics and AI research have become with university-linked infrastructure. The preprint pipeline that brought Xiaomi-Robotics-0 into public view relies on institutions like Cornell University, which plays a central role in operating arXiv, to maintain open access and long-term archiving. As more corporations race to deploy humanoids on shop floors, the ones that share models, datasets, and rigorous evaluations through these channels may shape de facto standards for safety, benchmarking, and interoperability. Xiaomi’s three-hour humanoid trial is still a small slice of a single EV factory, but the way it has been documented (and potentially opened to others) suggests a competitive landscape where publishing and production are increasingly inseparable.
*This article was researched with the help of AI, with human editors creating the final content.