Google, Microsoft, and xAI just agreed to hand the U.S. government every new AI model for classified testing before the public ever sees them

Before Google DeepMind, Microsoft, or Elon Musk’s xAI can ship their next frontier AI model to the public, the U.S. government will get to examine it first. All three companies signed agreements on May 5, 2026, granting the federal government pre-release access to their most advanced AI systems for national security testing. The evaluations will be run by the Center for AI Standards and Innovation (CAISI), a Commerce Department body housed within NIST. According to Bloomberg, CAISI already holds similar arrangements with OpenAI and Anthropic, though CAISI’s own public materials reference “existing agreements” without naming those companies individually.

If Bloomberg’s account is accurate, that means five of the world’s most prominent AI developers have now voluntarily agreed to let government reviewers probe their models for dangerous capabilities before anyone outside the lab can use them.

A note on the headline

The headline of this article uses the phrase “classified testing.” No official NIST or Commerce Department source describes the evaluations in those terms. The government’s own language is “national security testing.” Whether any portion of the review process involves classified procedures, facilities, or outputs is not addressed in any public document. Readers should treat “classified” as editorial shorthand for the national-security focus of the program, not as a confirmed description of how results are handled or stored.

What the agreements actually require

CAISI was born from a reorganization. After Commerce Secretary Howard Lutnick directed that the former U.S. AI Safety Institute be restructured into a “pro-innovation, pro-science” standards body, CAISI took over as the federal government’s primary contact point for AI developers. Its stated focus is narrow but consequential: evaluating whether frontier models cross risk thresholds in three domains, namely cybersecurity, biosecurity, and chemical weapons.

Under the new agreements, each company submits frontier models to CAISI before public deployment. CAISI’s team then runs evaluations targeting those three categories. Think of it as a national security screening layer that sits between a lab’s internal “ready to ship” decision and the moment a model becomes available through a cloud API or consumer product.

The arrangements are voluntary. No statute compels participation, and the public documents do not specify penalties for noncompliance or withdrawal. Any of the five companies could, in theory, walk away. The system runs on a calculation that cooperating with federal reviewers costs less than the political and reputational fallout of refusing.

What the public record confirms

CAISI’s own program page confirms the existence, date, and parties of the new agreements, along with its mission to lead evaluations of capabilities that may pose national security risks. The page also references “existing agreements” with unnamed AI developers but does not identify them. NIST’s communications hub lists the May 5, 2026, announcement and identifies Google DeepMind, Microsoft, and xAI as the newest signatories. Bloomberg’s reporting fills the gap by placing those three alongside earlier CAISI arrangements with OpenAI and Anthropic, bringing the total to five. That attribution rests on Bloomberg’s sourcing, not on a NIST primary document naming OpenAI or Anthropic directly.

The institutional chain of authority is straightforward: CAISI sits within NIST, which operates under the Department of Commerce. Secretary Lutnick’s earlier statement established the center’s mission and framing. NIST also maintains related technical infrastructure, including the Computer Security Resource Center and the National Vulnerability Database, that could supply evaluation frameworks, though neither currently publishes AI-specific testing protocols tied to these agreements.

What we still do not know

Several critical details are missing from every available public document. No primary source lists the specific models covered, the capability thresholds that trigger a submission, or the timelines companies must follow. The phrase “national security testing” suggests restricted handling of results, but official releases do not define data-handling rules for company submissions or clarify whether evaluation findings are shared back with developers, kept within government, or both.

None of the three new signatories have issued public statements describing the scope of their obligations. Without those, it is unclear whether the agreements cover every new model or only those above a certain capability level, and whether pre-release access means days, weeks, or months of government review time.

The enforcement question is equally open. If a company decided to release a model without completing the CAISI review, the consequences would be political and reputational, not legal. No public document describes what happens if CAISI identifies a risk: whether the government can block a release, request modifications, or only issue an advisory. That ambiguity leaves the real power of the arrangement undefined.

The Biden-era backstory

This program did not appear from nowhere. President Biden’s October 2023 executive order on AI safety led to the creation of the original U.S. AI Safety Institute within NIST, which was tasked with developing testing standards for advanced AI systems. When the Trump administration took office, Secretary Lutnick rebranded and restructured the institute into CAISI, shifting the emphasis from broad safety research to a narrower focus on national security risks and voluntary industry cooperation. The agreements announced in May 2026 are the most visible product of that pivot.

The approach stands in contrast to the European Union’s AI Act, which imposes legally binding obligations on developers of high-risk AI systems, including mandatory conformity assessments before deployment. Where the EU chose regulation with teeth, the U.S. is betting on voluntary partnerships. Whether that lighter touch can keep pace with the speed of AI development is the central question hanging over the entire program.

What this means for release schedules and product roadmaps

For startups and enterprises building applications on top of frontier models, the most immediate concern is unpredictability. A model that appears complete in a lab demo may still be sitting in a CAISI review queue before it becomes available through an API or product update. That uncertainty complicates planning for any team that builds features around rumored or previewed capabilities.

The existence of a government evaluation step could also reshape how labs design their systems. Knowing that external reviewers will probe for cyber, bio, and chemical misuse potential may push developers to invest more heavily in safety mitigations and red-teaming before they ever submit a model. Even without legal enforcement power, a negative CAISI assessment could be politically damaging enough that companies choose to delay or revise releases on their own.

A voluntary checkpoint with real but untested power

For now, the public record supports only a narrow set of firm conclusions: five major frontier AI developers (two of them named only in Bloomberg’s reporting, not in NIST’s own documents) have agreed to submit certain models to CAISI for pre-release evaluation; the testing focuses on three national security risk domains; and participation remains entirely voluntary. The depth of the testing, the speed of reviews, and the consequences of adverse findings are all still undefined.

What is clear is that a new, largely unseen checkpoint now sits between the lab and the wider world. Whether it functions as a meaningful filter or as political theater will only become apparent when a specific model launch runs through it and the results, or the silence around them, speak for themselves.

*This article was researched with the help of AI, with human editors creating the final content.