Microsoft and Google are rolling out new coding models to challenge OpenAI and Anthropic

Developers who rely on GitHub Copilot now face a real choice among coding models built by Microsoft, Google, OpenAI, and Anthropic, all competing inside the same editor. At Build 2026, Microsoft’s AI Superintelligence Team released seven in-house models, including MAI-Code-1, which the company describes as an “inference efficient coding model tuned for GitHub,” available in Copilot and VS Code. A smaller variant, MAI-Code-1-Flash, started rolling out to Copilot users on the same day, while Google’s Gemini 3.5 Flash appeared on the platform’s supported-models list. The result is a four-way contest for the tool that millions of programmers use every day.

Why new coding models inside Copilot change the competitive math

Until now, OpenAI’s GPT family and Anthropic’s Claude models dominated the Copilot experience. Microsoft’s decision to ship its own purpose-built coding model directly into the same interface shifts the dynamic. MAI-Code-1 is not a general-purpose assistant repurposed for code. According to the Build 2026 announcement, it was specifically tuned for GitHub workflows, which means completions, chat, and agent-driven tasks inside the repository environment where most professional software gets written.

Google’s entry adds a second new challenger, this one from outside the Microsoft and GitHub fold. Gemini 3.5 Flash now sits alongside the Microsoft, OpenAI, and Anthropic options in the Copilot model picker, giving developers a low-latency alternative from a company that controls Android, Chrome, and its own cloud platform. For teams already embedded in Google Cloud, the ability to use a Google model inside Copilot without switching tools removes a friction point that previously kept some organizations tied to OpenAI defaults.

The practical question is whether these new models will change how developers actually write software. A testable expectation is that within six months of the MAI-Code-1 and Gemini 3.5 Flash integrations, repositories using Copilot should show a measurable rise in multi-file refactors and external-tool calls compared with repositories relying only on older models. Both Microsoft and Google have optimized their entries for the kind of extended, tool-heavy tasks that single-file autocomplete models handle poorly. If adoption data eventually confirms that pattern, it would validate the bet that coding-specific models outperform general-purpose ones on real engineering work.
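One way to make that expectation concrete is to compare the share of commits that touch more than one file across two groups of repositories. The sketch below is only an illustration under stated assumptions: the function name `multi_file_share` and the per-commit file counts are hypothetical, and none of the numbers come from Microsoft, Google, or GitHub.

```python
def multi_file_share(files_changed_per_commit):
    """Return the fraction of commits that touch more than one file."""
    if not files_changed_per_commit:
        return 0.0
    multi = sum(1 for n in files_changed_per_commit if n > 1)
    return multi / len(files_changed_per_commit)


# Hypothetical data: number of files changed in each commit.
older_models = [1, 1, 2, 1, 3, 1, 1, 1]   # repositories using only older models
newer_models = [1, 4, 2, 1, 3, 2, 1, 5]   # repositories using the new models

baseline = multi_file_share(older_models)
treated = multi_file_share(newer_models)
difference = treated - baseline

print(f"baseline={baseline:.3f} treated={treated:.3f} difference={difference:.3f}")
```

A raw difference like this would not establish cause on its own, since teams that adopt new models early may already work differently from teams that do not.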

For individual developers, the shift may feel incremental at first. Copilot still appears as the same sidebar and inline suggestion surface. The change is that behind those suggestions sits a configurable roster of models with different strengths and trade-offs. Teams that care about latency might default to the Flash variants, while those focused on complex refactors or codebase-wide reasoning may favor the larger options from Microsoft, OpenAI, or Anthropic. The presence of Google’s model inside the same picker further encourages experimentation, because trying a different vendor no longer requires leaving the familiar GitHub and VS Code environment.

Benchmarks, model tiers, and what the evidence actually shows

Two new benchmarks offer a frame for evaluating these models. SWE-Bench Pro, defined in a paper on arXiv, introduces longer-horizon software engineering tasks designed to reduce data contamination and better reflect the work professional developers do across multiple files and dependencies. Traditional coding benchmarks often test isolated function generation; SWE-Bench Pro pushes models to resolve bugs and implement features that span entire codebases.

A second benchmark, MCP-Atlas, also defined on arXiv, measures tool-use competency against real MCP servers. This matters because the next generation of coding agents does not just write code. It calls external tools, queries databases, and interacts with APIs. MCP-Atlas tests whether a model can reliably operate in that agentic mode rather than simply generating text that looks like a tool call.

Microsoft’s model lineup now includes two tiers for coding. MAI-Code-1 serves as the full-size model, while MAI-Code-1-Flash occupies the small-tier slot. The Flash variant began rolling out in VS Code first, according to the GitHub Changelog, with phased expansion to other surfaces. Google’s Gemini 3.5 Flash appears on GitHub’s supported-models documentation alongside the Microsoft entries. Both the Microsoft and Google models are listed as available options that Copilot users can select, though availability may vary by plan tier and region.

What the evidence does not yet include is published performance numbers for MAI-Code-1 on either SWE-Bench Pro or MCP-Atlas. The benchmark definitions exist as research papers, but no institutional source has released model-card results tying Microsoft’s or Google’s specific models to scores on these tests. That gap matters. Without head-to-head numbers, developers are choosing models based on brand trust and informal experience rather than verified performance data.

In practice, engineering leaders will likely run their own lightweight evaluations. That might mean replaying recent production incidents through different Copilot configurations, or measuring how often each model produces compilable patches on a curated internal benchmark. The absence of public scores on SWE-Bench Pro and MCP-Atlas does not prevent comparison; it simply shifts the burden of proof from vendors to customers, at least in the short term.
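A lightweight internal check of that kind could be as simple as counting how often each model's output parses. The sketch below assumes the generated code is Python and uses a successful `ast.parse` as a cheap stand-in for "compilable"; the labels `model-a` and `model-b` and the sample snippets are hypothetical, not real Copilot output.

```python
import ast
from collections import defaultdict


def compiles(source: str) -> bool:
    """Return True if the text parses as Python source."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def pass_rates(results):
    """Compute the parse-success rate per model.

    results: iterable of (model_label, generated_source) pairs.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for model, source in results:
        totals[model] += 1
        if compiles(source):
            passed[model] += 1
    return {model: passed[model] / totals[model] for model in totals}


# Hypothetical sample: two generated snippets per model.
sample = [
    ("model-a", "def add(a, b):\n    return a + b\n"),
    ("model-a", "def broken(:\n    pass\n"),
    ("model-b", "x = [i * 2 for i in range(3)]\n"),
    ("model-b", "print('ok')\n"),
]

rates = pass_rates(sample)
print(rates)
```

Parsing is a weak bar. A team running this for real would likely also apply each patch to a checkout and run the project's build and tests.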

Open questions for developers choosing between four model families

Anthropic’s position in this race is the least visible. Claude models remain available inside Copilot, but Anthropic has not publicly addressed the competitive implications of Microsoft and Google placing their own models on the same platform. Whether Anthropic responds with a coding-specific model variant or doubles down on its general-purpose strengths will shape how the four-way competition plays out over the next two quarters.

Rollout telemetry is another blind spot. Microsoft and GitHub have confirmed that MAI-Code-1-Flash is rolling out, but neither has published adoption rates, latency comparisons, or user-satisfaction metrics against the OpenAI and Anthropic alternatives already in the system. Google has similarly not released methodology details or scores tying Gemini 3.5 Flash to specific coding workloads inside Copilot. For now, developers see the new names in the model picker but do not have official data that would justify switching at scale.

That uncertainty leads to several practical questions. Enterprise teams must decide whether to standardize on a single model across their organization or allow per-developer choice. Security and compliance groups will want clarity on where prompts and code snippets are processed, especially when mixing models from different vendors inside the same tool. Procurement teams, meanwhile, may see the new Microsoft and Google options as leverage in negotiations with existing AI providers, even before technical differences are fully understood.

There is also a cultural dimension. Many developers have built habits around specific models, learning how GPT or Claude tends to respond to certain prompts. Introducing MAI-Code-1 and Gemini 3.5 Flash into that workflow means those habits may need to evolve. Some teams may discover that different phases of the development cycle benefit from different models: one for exploratory design discussions, another for implementation, and a third for test generation and refactoring.

Looking ahead, the most important unknown is how quickly Copilot will expose richer configuration and observability around these choices. If GitHub surfaces per-repository analytics on which model produced which suggestion, along with accept and edit rates, teams could make evidence-based decisions about their defaults. If not, the four-way competition could remain a largely invisible contest occurring behind a single Copilot brand, with developers only dimly aware of which model is helping them ship code.
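If such analytics were available, the arithmetic behind accept and edit rates would be straightforward. The sketch below assumes a hypothetical event record, `SuggestionEvent`, with made-up fields and sample data; it does not describe any schema or API that GitHub has published.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestionEvent:
    """Hypothetical record of one suggestion shown to a developer."""
    repo: str
    model: str
    accepted: bool
    edited: bool  # only meaningful when accepted is True


def summarize(events):
    """Return accept and edit rates keyed by (repo, model)."""
    shown = defaultdict(int)
    accepted = defaultdict(int)
    edited = defaultdict(int)
    for event in events:
        key = (event.repo, event.model)
        shown[key] += 1
        if event.accepted:
            accepted[key] += 1
            if event.edited:
                edited[key] += 1
    summary = {}
    for key in shown:
        summary[key] = {
            # Share of shown suggestions that were accepted.
            "accept_rate": accepted[key] / shown[key],
            # Share of accepted suggestions that were later edited.
            "edit_rate": edited[key] / accepted[key] if accepted[key] else 0.0,
        }
    return summary


# Hypothetical events for a single repository.
events = [
    SuggestionEvent("svc", "model-a", accepted=True, edited=True),
    SuggestionEvent("svc", "model-a", accepted=True, edited=False),
    SuggestionEvent("svc", "model-a", accepted=False, edited=False),
    SuggestionEvent("svc", "model-a", accepted=False, edited=False),
    SuggestionEvent("svc", "model-b", accepted=False, edited=False),
    SuggestionEvent("svc", "model-b", accepted=False, edited=False),
]

summary = summarize(events)
print(summary)
```

Keying the summary by repository and model mirrors the per-repository view described above, so a team could compare models within one codebase rather than across unrelated projects.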

For now, the arrival of MAI-Code-1, MAI-Code-1-Flash, and Gemini 3.5 Flash inside Copilot marks a clear transition. Coding assistance is no longer synonymous with a single vendor’s model family. Instead, Copilot is becoming a marketplace where Microsoft, Google, OpenAI, and Anthropic compete for developer attention on every keystroke. How quickly those developers experiment, measure, and adapt will determine whether the new options amount to a subtle tuning change or a genuine reshaping of how modern software gets written.

*This article was researched with the help of AI, with human editors creating the final content.

Dorian Maddox

Author
