Morning Overview

Mistral drops a 128B flagship model with agentic “Work mode” and async cloud-based coding sessions

Mistral AI has launched Mistral Medium 3.5, a 128-billion-parameter dense model with a 256,000-token context window, alongside two features designed to turn its Le Chat interface into a full developer workbench. The first, called Work mode, lets the model break complex requests into multi-step plans and execute them with built-in tool use. The second, Vibe remote agents, pushes coding tasks into a persistent cloud sandbox that keeps running even after a developer closes their laptop.

The release, announced by Mistral in late May 2026, marks the French AI company’s most direct challenge yet to the wave of agentic coding tools from American rivals, including GitHub Copilot Workspace, OpenAI’s Codex, and Anthropic’s Claude Code.

A bigger model with a longer memory

Mistral Medium 3.5 is one of the largest openly discussed dense architectures from a European AI lab. At 128 billion parameters, it sits in the same weight class as frontier models from OpenAI and Google, though Mistral has not published head-to-head benchmark results against GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro on standard evaluations like MMLU or HumanEval. The 256,000-token context window is large enough to hold roughly 500 pages of text in a single session, which matters for agentic workflows where the model needs to reason across an entire codebase or a long chain of tool calls.
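The "roughly 500 pages" figure follows from common rules of thumb rather than anything Mistral has published. A quick back-of-envelope check, where the tokens-per-word and words-per-page values are generic heuristics:

```python
# Rough conversion from context-window tokens to manuscript pages.
# Both constants are generic heuristics, not figures published by Mistral.
CONTEXT_TOKENS = 256_000
TOKENS_PER_WORD = 1.3    # typical for English prose with modern tokenizers
WORDS_PER_PAGE = 400     # a densely set manuscript page

pages = CONTEXT_TOKENS / TOKENS_PER_WORD / WORDS_PER_PAGE
print(round(pages))      # prints 492, i.e. "roughly 500 pages"
```

With looser assumptions (shorter pages, chattier tokenization) the estimate shifts, but it stays in the same few-hundred-page ballpark.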

Parameter counts and context lengths are useful specifications, but they do not guarantee real-world performance on complex reasoning or code generation. Until independent benchmarks surface, the model’s competitive standing remains an open question.

Work mode: controlled agentic execution inside Le Chat

Work mode, documented in Mistral’s developer guide, wraps Mistral Medium 3.5 in what the company calls an “agent harness.” When a user submits a request, the system can decompose it into multiple steps, select which tools and connectors to invoke, fire parallel tool calls, and stream progress back in real time.

The most notable design choice is an approval gate for sensitive actions. Rather than letting the agent run autonomously through every step, Work mode pauses and asks the user to confirm before executing anything the system flags as high-risk. This positions the feature as a supervised agentic layer: the model proposes, the human disposes.
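Mistral has not published the harness's internals, but the "model proposes, human disposes" pattern can be sketched generically. The tool names, risk policy, and `run_plan` function below are all hypothetical, invented for illustration:

```python
# Hypothetical sketch of an approval-gated agent step, not Mistral's API.
HIGH_RISK_TOOLS = {"shell_exec", "file_delete", "send_email"}  # invented policy

def run_plan(steps, approve):
    """Execute (tool, action) steps, pausing for confirmation on risky ones.

    `approve` is a callback standing in for the user-facing approval prompt.
    """
    results = []
    for tool, action in steps:
        if tool in HIGH_RISK_TOOLS and not approve(tool, action):
            results.append((tool, "skipped"))    # human declined; agent moves on
            continue
        results.append((tool, f"ran {action}"))  # low-risk or approved
    return results

# A declined high-risk step is skipped while safe steps still proceed:
plan = [("web_search", "find docs"), ("shell_exec", "rm -rf build")]
print(run_plan(plan, approve=lambda tool, action: False))
# prints [('web_search', 'ran find docs'), ('shell_exec', 'skipped')]
```

The open questions from Mistral's documentation map directly onto this sketch: what belongs in the high-risk set, who defines it, and whether the `approve` step can be subverted by prompt injection.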

That said, Mistral’s documentation describes the approval system at a functional level without publishing a detailed threat model or explaining exactly how “sensitive” is defined. Organizations evaluating Work mode for production workflows that touch proprietary code will want to know how granular the permission prompts are, whether policies can be customized to match internal security standards, and how resilient the gating is to prompt injection. No independent security audit has been published alongside the launch.

Vibe remote agents: coding that keeps running after you walk away

Vibe remote agents target a specific frustration familiar to any developer who has watched a long refactor or test suite churn through a local machine: the inability to do anything else while the process runs. Vibe sessions execute inside a cloud sandbox connected to a GitHub repository. When the agent finishes, it produces a draft pull request with its changes.

Sessions can be kicked off from Le Chat, from the command line, or by typing a “/teleport” command that migrates an active local coding session into the cloud. Once running remotely, the session persists across devices. A developer can start a task on a workstation, close the lid, and check progress from a phone or tablet hours later.

The async model is the key differentiator here. Most competing agentic coding tools still require an active session or browser tab. Vibe’s design is closer to a CI/CD job that happens to be powered by a language model: fire it off, go do something else, review the output when it lands. For long-running tasks like codebase-wide migrations, documentation passes, or large-scale test generation, that could meaningfully change how developers allocate their time.
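That CI/CD-style lifecycle (fire the job, detach, poll later) can be illustrated with a toy in-memory sketch. The session store and function names below are invented for illustration, not Vibe's actual interface:

```python
# Toy sketch of a fire-and-forget coding session; the session store and
# function names are invented, not Vibe's actual interface.
import threading
import time

SESSIONS = {}  # stands in for the persistent cloud sandbox

def launch(session_id, task, duration=0.2):
    """Start a task that keeps running with no client attached."""
    def work():
        time.sleep(duration)  # stand-in for the long-running agent job
        SESSIONS[session_id] = {"status": "done",
                                "result": f"draft PR for: {task}"}
    SESSIONS[session_id] = {"status": "running"}
    threading.Thread(target=work, daemon=True).start()

def poll(session_id):
    """Check progress later, possibly from a different device."""
    return SESSIONS[session_id]["status"]

launch("refactor-1", "migrate codebase to the new logging API")
print(poll("refactor-1"))  # "running" immediately after launch
time.sleep(0.5)
print(poll("refactor-1"))  # "done" once the background job finishes
```

In the real product the "thread" is a cloud sandbox and the "result" is a draft pull request on GitHub, but the detach-and-poll shape is the same.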

Mistral has not disclosed adoption numbers, usage limits, or whether Vibe is available on free tiers or restricted to paid plans. The company also has not published data on how often Vibe-generated pull requests compile cleanly, pass tests, or require significant human revision.

Speculative decoding with EAGLE

Alongside the product launch, Mistral released a companion implementation referencing EAGLE, a speculative decoding method described in a 2024 arXiv paper. Speculative decoding works by letting a smaller, faster “draft” process propose tokens that the main model then verifies or corrects, reducing latency without retraining the core model.
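The draft-and-verify loop at the heart of speculative decoding can be sketched in a few lines. The toy draft and verifier functions below stand in for real models, and EAGLE's actual drafting mechanism (extrapolating from the target model's internal features) is more sophisticated than this generic version:

```python
# Toy sketch of speculative decoding's draft-and-verify loop.
# The draft and verifier functions stand in for real models; EAGLE's
# feature-level drafting is more sophisticated than this generic version.
import random

def draft_tokens(prefix, k=4):
    """Cheap draft stage: propose k candidate tokens quickly."""
    return [f"tok{len(prefix) + i}" for i in range(k)]

def target_accepts(prefix, token):
    """Verifier stand-in: in practice the large model scores all k drafts
    in one batched forward pass instead of k sequential ones."""
    return random.random() < 0.8

def speculative_step(prefix, k=4):
    accepted = list(prefix)
    for token in draft_tokens(prefix, k):
        if target_accepts(accepted, token):
            accepted.append(token)        # draft confirmed at no extra cost
        else:
            accepted.append("corrected")  # target model overrides and stops
            break
    return accepted

random.seed(0)
out = speculative_step([], k=4)
print(out)  # between 1 and 4 tokens advance per verification pass
```

The guarantee that makes the technique attractive is visible even in the toy version: every step emits at least one token (the target model's own correction in the worst case), so quality never degrades, and latency only improves when drafts are accepted.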

The technique could be especially valuable for agentic workloads, where the model generates many sequential tool calls and every millisecond of latency compounds. However, Mistral has not published documentation confirming exactly how EAGLE is integrated into the Vibe or Work mode pipelines, nor has it released latency benchmarks showing the speedup in practice. The arXiv paper provides the theoretical grounding; the production details remain undisclosed.

Where Mistral fits in a crowded field

The launch arrives at a moment when nearly every major AI company is racing to ship agentic coding tools. GitHub’s Copilot Workspace lets developers plan and implement changes inside a cloud environment. OpenAI’s Codex operates as an async coding agent with GitHub integration. Anthropic’s Claude Code offers terminal-based agentic coding with extended thinking. Cursor and Windsurf have built IDE-native agents that can run background tasks.

Mistral’s entry stands out on two fronts. First, the “/teleport” command that moves a live local session into the cloud is a workflow detail none of the major competitors have replicated. Second, Mistral is the only European company shipping a product in this category at this scale, which may matter for organizations with data residency requirements or preferences for non-U.S. AI providers.

But Mistral also faces a steeper climb. The company’s developer ecosystem is smaller than those of OpenAI, Google, or Anthropic, and Le Chat has less market penetration than ChatGPT or the Copilot family. Whether a strong model and clever workflow features can overcome that distribution gap is the central strategic question for this launch.

What developers should watch for next

For teams considering these tools, the practical first step is low-risk experimentation. Pointing Vibe remote agents at a non-critical GitHub repository and asking them to implement small features, refactor a module, or add tests can reveal how often the agent produces compilable code and how much human review is still required. Similarly, running Work mode on bounded tasks such as data transformation scripts or documentation updates can help teams gauge how comfortable they are with the agent's decision-making and the approval prompts that gate sensitive actions.

The bigger picture will come into focus as independent benchmarks, security audits, and real-world usage data emerge over the coming weeks. Mistral Medium 3.5’s scale, the structured harness for multi-step workflows, and the ability to offload coding to persistent cloud sessions are all promising building blocks. Whether they translate into measurable productivity gains will depend on how the system performs under scrutiny from developers, security teams, and researchers who are not on Mistral’s payroll.


*This article was researched with the help of AI, with human editors creating the final content.*