Morning Overview

Google, Microsoft, and xAI will now let the government test their AI models before the public sees them

As of May 2026, three of the biggest names in artificial intelligence have opened their doors to federal safety testers. Google DeepMind, Microsoft, and xAI each signed agreements with the U.S. government’s AI safety body, giving federal evaluators access to their most powerful models before those systems reach the public. The deals mark the largest expansion yet of a voluntary testing framework that began with just two companies in 2024, but they also spotlight a persistent gap: the government can look under the hood, but it cannot take away the keys.

What the agreements actually do

The agreements were signed with the Center for AI Standards and Innovation, known as CAISI, which operates within the National Institute of Standards and Technology under the Department of Commerce. According to NIST’s announcement, the deals grant CAISI access to frontier AI models for pre-deployment evaluations focused on national security risks. They also allow post-deployment evaluations, meaning the government can continue probing models after they go live.

The framework is not new. In August 2024, the then-named U.S. AI Safety Institute signed similar agreements with Anthropic and OpenAI, securing access to major new models before and after public release. Those earlier deals grew out of a 2023 executive order signed by President Biden that directed federal agencies to develop safety standards for advanced AI. The institute was later rebranded as CAISI, but the core mission carried over: test frontier models for dangerous capabilities before they spread.

With Google DeepMind, Microsoft, and xAI now in the fold, five of the world’s leading AI developers are participating. Their models power products used by hundreds of millions of people, from Google’s Gemini and Microsoft’s Copilot to xAI’s Grok.

The government is also vetting its own AI purchases

CAISI’s ambitions extend beyond testing models for public release. Earlier in 2026, the institute signed a memorandum of understanding with the General Services Administration to embed evaluation methods into federal procurement through the USAi program. That means the government is not only scrutinizing models headed for consumers but also vetting the AI tools its own agencies buy and deploy.

Internationally, Microsoft is now subject to government-led testing on both sides of the Atlantic. The UK AI Security Institute partnered with Microsoft on frontier safety work covering high-risk capability evaluation, safeguard effectiveness, and societal resilience research. That cross-border coordination hints at the early stages of a networked evaluation regime, even if it remains informal.

Access without authority

The most important caveat is one the official announcements do not emphasize. As The Washington Post reported, these agreements are voluntary and contain no specific standards that companies must meet. CAISI can evaluate a model and flag serious risks, but nothing in the current framework requires a company to delay a launch, alter a product, or even respond publicly to the findings.

That distinction is critical. A pre-deployment evaluation without enforcement power is fundamentally different from a regulatory gate. If CAISI discovers that a model can be easily manipulated to assist with cyberattacks or generate step-by-step instructions for dangerous materials, the company still makes the final call on whether and when to ship it.

The Post’s reporting also notes that the broader political environment around AI regulation has shifted, with policy rollbacks complicating the picture. Biden’s original executive order was rescinded by the Trump administration in January 2025, raising questions about the legal foundation underpinning CAISI’s work. The institute continues to operate, but its long-term authority depends on whether Congress passes legislation giving it binding power over AI deployments.

What we still don’t know

Several significant details remain undisclosed. NIST’s announcement references national security evaluations but does not specify the exact criteria CAISI uses, the timelines for completing reviews, or how deeply evaluators can probe a model’s architecture and training data. No public statements from xAI leadership about the company’s participation have surfaced as of late May 2026.

Information-sharing rules are also unclear. It has not been disclosed whether CAISI’s findings will be shared across federal agencies, with state and local governments, or with international partners beyond the existing UK arrangement. Equally uncertain is how much the public will ever learn. If evaluation results stay behind closed doors, independent researchers and civil society groups will have no way to scrutinize the risks CAISI identifies.

There is also the question of who is missing from the table. Meta, whose widely used Llama models are released with open weights, has not signed a comparable agreement. Neither has Amazon, whose Bedrock platform distributes frontier models to enterprise customers. Whether the framework eventually covers open-weight models and cloud distributors will shape how comprehensive it becomes.

What this means for businesses and consumers

For companies that build products on top of Google, Microsoft, or xAI models, the agreements offer partial reassurance: the most advanced systems are at least being probed for extreme national security risks before broad release. That could reduce the chance of a model with easily exploitable dangerous capabilities shipping without anyone in government noticing.

But federal testing is not a substitute for internal due diligence. CAISI’s mandate, as described so far, focuses on frontier and security-relevant capabilities. It does not cover the everyday harms that surface in hiring tools, customer service bots, or productivity assistants. Enterprise buyers still need their own evaluation processes for bias, robustness, privacy, and compliance with sector-specific regulations.

For consumers, the takeaway is similarly mixed. These agreements show that major AI companies and the federal government recognize the stakes of frontier models and are building mechanisms to manage the risks. But a voluntary framework with no enforcement teeth is closer to a safety net with large holes than to a locked gate.

A wider circle, but still no fence

The new agreements with Google DeepMind, Microsoft, and xAI represent a real expansion of government involvement in AI safety. Five major developers now grant federal evaluators pre-release access, CAISI is embedding evaluation into procurement, and international coordination with the UK is underway. None of that existed three years ago.

Yet the system still runs on corporate willingness. No law compels participation. No regulator can block a deployment. Whether this voluntary circle of cooperation hardens into genuine oversight will depend on how rigorously CAISI uses its access, how transparently it shares what it finds, and whether lawmakers eventually decide that testing frontier AI models should not be optional.


*This article was researched with the help of AI, with human editors creating the final content.