Morning Overview

Boston Dynamics and DeepMind explore adding reasoning skills to Spot robot

Boston Dynamics’ Spot robot can already trot across rubble, open doors, and inspect industrial sites. Now, a research effort involving Google DeepMind is exploring whether the quadruped can learn to think through problems before it moves a single leg.

In a technical paper published on arXiv, DeepMind researchers describe Gemini Robotics, a framework that gives robots what the authors call “embodied reasoning.” The “1.5” in the paper’s full title refers to the version of the underlying Gemini foundation model that powers the system, not to a separate product release. Rather than executing a fixed sequence of motions after receiving a command, a robot running this system continuously evaluates its surroundings, updates its plan, and adjusts mid-task. DeepMind has demonstrated the approach on Spot, showing the robot interpreting open-ended instructions and manipulating objects it had never encountered during training.

From web knowledge to robot legs

The work builds on RT-2, a 2023 DeepMind model that pioneered the vision-language-action (VLA) approach. RT-2 demonstrated that knowledge absorbed from billions of web pages and images could be translated into specific motor commands a robot arm could execute. A robot trained with RT-2 could, for example, pick up an object it had never seen, so long as the object matched a verbal description.

Gemini Robotics extends that idea in a critical way. Where RT-2 mapped language to action in a single pass, the newer system interleaves reasoning steps with physical movements. Think of it as the difference between following a recipe card and cooking while tasting, adjusting seasoning, and checking the oven in real time. For Spot, which regularly operates on construction sites, in power plants, and across other unpredictable terrain, that adaptive loop could sharply reduce errors when conditions change mid-task.
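
As a rough illustration of that difference, the sketch below contrasts a single-pass vision-language-action call with an interleaved reasoning loop. It is a minimal, hypothetical sketch, not code from the paper: every name in it (vla_policy, reasoner.update_plan, robot.observe, robot.execute) is a placeholder rather than a Boston Dynamics or DeepMind API.

```python
# Hypothetical sketch of the two control styles described above.
# None of these names correspond to real Boston Dynamics or DeepMind APIs.

def run_single_pass(robot, vla_policy, instruction):
    """RT-2-style control: map one observation plus the instruction
    to an action sequence in a single pass, then execute it open-loop."""
    observation = robot.observe()                   # one camera frame
    actions = vla_policy(observation, instruction)  # single forward pass
    for action in actions:
        robot.execute(action)                       # no re-planning mid-task

def run_interleaved(robot, reasoner, instruction, max_steps=100):
    """Interleaved control in the spirit the article describes:
    re-observe, re-reason, and re-plan between every action."""
    plan = None
    for _ in range(max_steps):
        observation = robot.observe()               # fresh view each step
        # The model revisits its plan in light of the new observation.
        plan = reasoner.update_plan(instruction, observation, plan)
        if plan.task_complete:
            break
        robot.execute(plan.next_action)             # act, then loop again
```

The structural point is where the model call sits: in the first function it happens once, before any motion; in the second it lives inside the control loop, which is what lets the robot adjust when conditions change mid-task.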

What the research collaboration involves

Boston Dynamics, owned by Hyundai Motor Group since 2021, has been expanding Spot’s software ecosystem to accommodate third-party AI models. The company has signaled interest in integrating foundation models that could allow Spot to handle more open-ended tasks without requiring engineers to script every behavior in advance.

In demonstrations shared by DeepMind, Spot received natural-language instructions, such as a request to find a specific object in a room and bring it back. The robot used its onboard cameras to scan the space, identified candidate objects, reasoned about which one matched the request, and then planned a path to retrieve it. Each of those steps involved the embodied reasoning loop described in the paper, with the robot re-evaluating its choices as new visual information came in.
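
One way that fetch task could decompose under such a loop, purely as an illustration, is sketched below. As before, the helper names (detect_objects, best_match, step_toward, and the rest) are invented for this sketch, not drawn from the paper or from Spot’s SDK.

```python
def fetch(robot, reasoner, request, max_steps=50):
    """Hypothetical decomposition of the fetch demo: scan the room,
    shortlist candidates, pick the best match, and keep re-checking
    as new frames arrive."""
    target = None
    for _ in range(max_steps):
        frame = robot.observe()
        candidates = reasoner.detect_objects(frame)       # scan the space
        # Re-rank candidates against the request on every frame, so a
        # better match spotted mid-task can replace the current target.
        target = reasoner.best_match(request, candidates, current=target)
        if target is None:
            robot.execute(robot.turn_in_place())          # keep scanning
        elif robot.holding(target):
            robot.execute(robot.return_to_start())        # bring it back
            return target
        else:
            robot.execute(robot.step_toward(target))      # approach and grasp
```

The notable design choice is that object selection is not frozen after the first scan: because matching re-runs inside the loop, an object that only becomes visible later can still win, mirroring the re-evaluation behavior the demonstrations highlight.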

That capability matters because most commercial robots today operate on tightly scripted routines. If a warehouse robot encounters a box in an unexpected position, it typically stops and waits for human intervention. A reasoning-enabled Spot, at least in the lab setting, showed it could adapt without pausing.

What has not been proven yet

The gap between a research demo and a shipping product remains wide. Several important questions are still unanswered as of May 2026:

  • Field performance: The published experiments took place under controlled conditions. Neither Boston Dynamics nor DeepMind has released data showing how the reasoning models perform during extended shifts in live industrial environments, where lighting, obstacles, and network connectivity vary constantly.
  • Compute and battery trade-offs: Running a large language model’s reasoning loop on a mobile robot demands significant processing power. Whether Spot’s onboard hardware can sustain that workload without cutting patrol times or requiring more frequent charging has not been publicly quantified.
  • Independent benchmarks: No outside research group has yet published a head-to-head comparison of Gemini Robotics against competing approaches from labs at Stanford, Meta, or startups like Physical Intelligence. Until those benchmarks appear, the claimed advantages of interleaved reasoning remain self-reported.

Boston Dynamics has not announced a specific timeline for making reasoning capabilities available to Spot’s commercial customers, nor has it named any pilot sites. No official press release or direct executive statement has confirmed the scope of any integration or the operational metrics of a reasoning-enabled version of the robot.

Where this fits in the broader robotics race

The DeepMind research involving Boston Dynamics hardware arrives during an intense period of investment in AI-powered robots. Tesla continues developing its Optimus humanoid. Figure, backed by Microsoft and Nvidia, has been testing its Figure 02 robot in BMW factories. Chinese firms including Unitree are shipping low-cost quadrupeds with increasingly capable autonomy stacks. Each of these efforts is converging on the same core idea: robots that understand language, perceive their environment through vision, and translate both into fluid physical action.

What distinguishes the Gemini Robotics approach is the explicit emphasis on mid-task reasoning. Most competing systems still separate the “thinking” phase from the “doing” phase. DeepMind’s architecture weaves them together, which could prove decisive in tasks that require real-time judgment, such as sorting mixed debris after a natural disaster or navigating a half-built floor of a skyscraper.

Signals to watch before any commercial rollout

For companies evaluating robotic automation, the markers worth tracking in the months ahead are straightforward: official deployment announcements from Boston Dynamics naming customer sites, published latency and battery-life data from on-robot tests, and independent benchmarks that let the research community verify whether embodied reasoning delivers on its promise outside the lab. Until those signals arrive, the research represents a credible technical direction rather than a confirmed product upgrade.

*This article was researched with the help of AI, with human editors creating the final content.