Google, Microsoft, and xAI are now letting the Commerce Department test their AI models in classified conditions before the public ever sees them

Before Google DeepMind, Microsoft, or xAI release their next frontier AI models to the public, federal evaluators will get to examine them behind closed doors, inside classified government facilities.

All three companies signed agreements in May 2026 with the Commerce Department’s Center for AI Standards and Innovation, known as CAISI, granting the government pre-deployment access to their most advanced AI systems. The testing will focus on national security threats, specifically whether these models could be exploited to aid cyberattacks, create biosecurity hazards, or help develop chemical weapons, according to an announcement from NIST, the standards agency that houses CAISI.

The agreements create a quiet but significant checkpoint between the lab and the launch button. They also raise a question the government has not yet answered publicly: what happens if testers find something dangerous and the company wants to ship the model anyway?

A pattern that started in 2024

This is not the first time the Commerce Department has brokered these arrangements. In August 2024, the predecessor body, then called the U.S. AI Safety Institute, signed similar agreements with Anthropic and OpenAI covering AI safety research, testing, and evaluation. Those earlier deals gave NIST access to major new models before and after public release.

The new CAISI agreements extend that approach to three more companies, but with a notable addition: an explicit classified-environment component that did not appear in the 2024 announcements. That means some portion of the evaluation will occur at security clearance levels that put the process beyond public view.

The institutional shift from the AI Safety Institute to CAISI was formalized in 2025, when Commerce Secretary Howard Lutnick described CAISI as the government’s point of contact for testing and collaborative research on commercial AI systems, including identifying vulnerabilities and threats. The rebranding signaled a pivot from broad safety research toward standards-setting and national security, a shift now visible in the classified-testing provisions of the new deals.

Voluntary, not mandatory

The agreements carry no binding force. According to reporting from The Washington Post, participation is voluntary and does not establish mandatory standards. That distinction matters more than it might seem: a company could, in theory, release a model even if federal testers flagged serious concerns. No public enforcement mechanism exists to prevent that, and no executive order or statute currently grants CAISI the authority to block a product launch.

The published versions of the agreements do not describe pass/fail criteria, specific test protocols, or classification levels. No data has been released on whether any models have already undergone classified review or what, if anything, those reviews uncovered. No timeline has been disclosed for how quickly results are shared back with the companies.

NIST does bring established technical infrastructure to the process. Its National Vulnerability Database and cybersecurity standards catalog provide evaluation frameworks that federal testers can apply to frontier AI systems. But the agency’s role as a standards body, not a regulator, shapes the entire arrangement: this is cooperation, not oversight with teeth.

Five companies in, many more out

Five companies now have agreements in place: Anthropic, OpenAI, Google DeepMind, Microsoft, and xAI. That covers several of the most prominent U.S.-based AI developers, but it leaves out others building increasingly capable systems. Meta has released powerful open-weight models. Mistral, based in France, is competing at the frontier. And a growing number of Chinese AI labs are advancing rapidly with no connection to this framework whatsoever.

Whether the CAISI process creates genuine safety assurance or simply offers a selective preview for a handful of willing participants is a question the current structure cannot answer on its own. The voluntary nature of the program means its reach is limited to companies that see value in participating.

The competitive edge hiding in plain sight

For the five companies inside the process, there may be a practical payoff beyond good citizenship. A model that has passed through a federal security review, even a voluntary one, carries an implicit credibility advantage that competitors without such agreements cannot match. In defense contracting, intelligence procurement, and federal IT, security vetting is often a prerequisite for deployment. A CAISI review could function as an unofficial seal of approval in those markets.

That dynamic could quietly create a two-tier landscape: companies inside the CAISI process with a path to government contracts, and companies outside it facing additional scrutiny. How federal agencies and private-sector buyers treat these reviews in procurement decisions will determine whether that split becomes real.

When the framework faces its first real test

The program’s credibility ultimately hinges on a scenario that has not yet played out publicly. If a classified evaluation surfaces a serious vulnerability, one that could genuinely threaten national security, the voluntary framework will face its hardest question: can the government do anything about it beyond asking nicely?

Right now, the answer appears to be no. There is no disclosed mechanism for CAISI to delay or block a release. The classified setting means the public, independent researchers, and Congress have no direct window into what is being tested or what is being found. And the voluntary structure means companies set the terms of their own participation.

That does not make the program meaningless. Early government visibility into frontier AI systems is a tangible benefit, and the alternative, finding out about dangerous capabilities only after a model is already in the wild, is clearly worse. But the gap between “we looked at it” and “we can do something about what we found” remains wide open. Closing it would require either new legislation or a willingness by companies to accept constraints that, so far, no one has agreed to.

This article was researched with the help of AI, with human editors creating the final content.
