Sanket Mishra/Pexels

OpenAI has drawn a rare bright line around its own technology, warning that the next wave of its artificial intelligence systems is likely to create a “high” cybersecurity risk even as it races to ship more capable tools. The company is effectively telling governments, enterprises, and attackers alike that future models will be powerful enough to help breach digital defenses at scale, not just automate office work or write code faster.

That admission marks a turning point in how one of the most influential AI developers talks about its products, shifting from generic safety language to a concrete forecast that its own Large Language Models could materially change the balance between offense and defense in cyberspace. It also lands at a moment when real-world incidents, from state-backed espionage to automated vulnerability discovery, are already showing how quickly AI is being woven into the fabric of hacking campaigns.

OpenAI’s “high risk” warning, in its own words

OpenAI is no longer couching its security concerns in vague hypotheticals. In internal assessments that have now been described publicly, the company says that Future OpenAI Large Language Models could reach a level of capability that meaningfully boosts both cyber offense and cyber defense. Those systems are being slotted into the second-highest tier of the company’s own Preparedness Framework, a category explicitly labeled “high” cybersecurity risk, which signals that OpenAI expects them to help identify vulnerabilities, craft exploits, and orchestrate complex operations that many attackers currently struggle to execute.

The company is also telling the outside world that this is not a distant scenario. OpenAI has said it expects upcoming models to continue on the same trajectory of rapidly improving performance, even if it has not specified when the first systems rated at this “high” level will be released, and it has warned that its next generation of powerful models could pose cybersecurity risks as they become more advanced. In parallel, OpenAI has acknowledged that these tools will be embedded in products that control sensitive data and infrastructure, which is why it is already flagging that its own future LLMs are likely to be treated as potential hazards inside its monitoring systems rather than just neutral software components.

Why OpenAI thinks its own models could breach defenses

When OpenAI says its next models could help breach defenses, it is not talking about abstract science fiction. The company has warned that its future LLMs could be capable of walking less skilled operators through the full lifecycle of an intrusion, from reconnaissance and phishing to lateral movement and data exfiltration, in a way that compresses the learning curve for cybercrime. Internal risk assessments describe scenarios in which a model can translate high level intent into step by step instructions, generate and debug exploit code, and adapt to defensive responses in near real time, all of which would have been out of reach for many attackers even a few years ago.

OpenAI has also acknowledged that these systems will be tightly integrated with tools that can act on the physical and digital world, such as code execution environments and function calling interfaces, which raises the stakes if a model is misused or compromised. In its own technical write up on external testing, the company notes that it has already used red teams to probe how models behave when given access to code execution and function calling, and it has found that external red teaming is particularly valuable for uncovering subtle failure modes in these more capable systems. That experience is one reason OpenAI now expects that its next generation of models could, if left unchecked, help attackers breach defenses that were designed for a pre generative AI era.

From generic safety talk to a formal “high” risk label

What stands out in OpenAI’s latest posture is not just the content of its warning but the form. Instead of relying on broad assurances about “safety by design,” the company has formally labeled its own upcoming models as a “high” cybersecurity risk within a structured Preparedness Framework that ranks threats by severity. That framework is used internally to decide how much testing, monitoring, and mitigation a given capability requires before it can be deployed, so placing future LLMs in the second highest tier is a signal that OpenAI expects them to materially change the threat landscape rather than simply add incremental automation.

OpenAI has also started to describe specific mitigations that will be required for these high risk systems, including segmented permissions, stricter access controls, and more granular monitoring of how models are used in sensitive environments. The company has warned that its next generation of powerful models could pose cybersecurity risks as they become more advanced, and it has reportedly released an internal incident report that explains why its monitoring systems are already labeling some AI activity as an “incident or hazard” when it involves potentially harmful code or instructions. That shift from generic safety language to a formal risk label is what makes the current admission so significant.

How OpenAI is trying to contain the threat

OpenAI is not just sounding the alarm, it is also trying to build a containment strategy around its own technology. The company has announced that it is establishing a new advisory structure to focus specifically on cyber risks posed by its models, and it has said that this group will help shape policies for how the systems can be used in both offensive and defensive contexts. In parallel, OpenAI has emphasized that it wants its models to support defensive cybersecurity tasks, such as helping security teams triage alerts, analyze malware, and harden configurations, even as it acknowledges that the same capabilities could be misused by attackers.

To stress test its assumptions, OpenAI has leaned heavily on external experts. In a detailed description of its testing process, the company explains that it has found external red teaming to be particularly effective at uncovering novel failure modes, especially when models are connected to tools like code execution and function calling. It has also highlighted the role of end to end red teaming, in which Expert red teaming organizations evaluate and improve safety mitigations across the full system, from the model itself to the surrounding infrastructure and user interfaces. That kind of holistic testing is now being treated as a prerequisite for deploying any model that falls into the “high” cybersecurity risk category.

OWASP’s LLM Top 10 shows the attack surface is already here

OpenAI’s warning lands in an ecosystem that is already grappling with concrete vulnerabilities in Large Language Model deployments. Security practitioners have begun to catalog the most common weaknesses in LLM applications, and the emerging consensus is that these systems introduce a distinct attack surface rather than just extending traditional web security concerns. The fact that OpenAI expects its own models to be high risk simply amplifies the urgency of addressing those known weak points before more powerful systems are widely embedded in products and workflows.

The Open Worldwide Application Security Project has already inspired detailed checklists for LLM security, including guidance that highlights Prompt Injection, Insecure Output Handling, Training Data Poisoning, and Model Denial of Service as core risks that developers must address. A separate breakdown of the OWASP LLM Top 10 describes the list as the Key Security Risks for GenAI and LLM Apps, and it underscores that these issues are not theoretical but are being observed in real deployments today. When those vulnerabilities are combined with OpenAI’s own forecast that future models will be significantly more capable, the result is a clear picture of an attack surface that is both expanding and becoming more dangerous.

Real world proof: Anthropic’s AI powered espionage case

OpenAI’s forecast of high risk is not happening in a vacuum. Another major AI developer, Anthropic, has already disclosed that its own tools were abused in a sophisticated espionage campaign, providing one of the clearest public examples of how generative AI can be woven into real world cyber operations. In that case, Anthropic said a suspected state linked actor used its AI system to support an espionage effort, which involved tasks like drafting phishing emails, analyzing stolen data, and planning lateral movement inside target networks.

Subsequent analysis of the incident described it as the first reported AI orchestrated cyber espionage campaign, and security researchers deconstructed how the attackers used Anthropic’s model to assist with reconnaissance, exploit development, lateral movement, and data triage. Another detailed account framed the episode as Anthropic Uncovers Landmark AI Led Cyberattack Campaign and reported that Anthropic has revealed that Chinese state sponsored hackers were behind the operation, which it disrupted before it could scale into an even larger threat. Taken together, those reports show that the kind of AI enabled intrusion OpenAI is now warning about is not hypothetical, it has already happened with a different vendor’s system.

Why OpenAI’s warning matters for governments and enterprises

For governments and large enterprises, OpenAI’s admission that its own future models are likely to pose a high cybersecurity risk should be treated as a strategic signal, not just a technical footnote. When the developer of some of the most widely deployed LLMs says that its next generation of systems could help breach defenses, it effectively tells defenders that their current security architectures, training programs, and incident response playbooks are not calibrated for what is coming. That is particularly true for sectors like finance, healthcare, and critical infrastructure, where AI tools are already being integrated into sensitive workflows.

OpenAI has warned that its next generation of powerful models could pose cybersecurity risks as they become more advanced, and it has reportedly released an internal assessment that explains why its monitoring systems are already flagging certain AI generated activity as an incident or hazard. At the same time, the company has argued that those same models could deliver meaningful benefits for cyberdefense if they are deployed with the right guardrails, for example by helping security teams detect anomalies faster or automate routine hardening tasks. The tension between those two realities is what makes the current moment so fraught for policymakers and CISOs.

Defensive potential versus offensive misuse

OpenAI is trying to walk a narrow line between warning about the offensive potential of its models and promoting their defensive value. On one hand, the company has said that its future LLMs could significantly enhance the capabilities of attackers, particularly those who lack deep technical expertise but are willing to experiment with AI guided intrusion techniques. On the other hand, OpenAI has also argued that the same systems could be used to strengthen cyberdefense, for example by helping analysts sift through large volumes of logs, generate detection rules, or simulate attacker behavior in controlled environments.

That dual use tension is already visible in how OpenAI describes its own roadmap. The company has warned that its upcoming models could be powerful enough to breach defenses, and it has placed them in a high risk category within its Preparedness Framework, yet it continues to invest in features that make those models more useful for security teams. OpenAI has also pointed to its work with external red teams and Expert red teaming organizations as evidence that it is taking the risks seriously, but the underlying reality remains that any capability that helps defenders understand and exploit vulnerabilities can, in principle, be turned around and used by attackers who gain access to the same tools.

What needs to change before the next generation ships

If OpenAI’s own assessment is correct, the window for preparing for high risk AI models is closing fast. Security teams will need to treat LLMs as both assets and potential adversaries, building controls that assume the models themselves can be manipulated, subverted, or used as force multipliers by attackers. That means hardening prompt interfaces against Prompt Injection, validating and sanitizing outputs to avoid Insecure Output Handling, monitoring training pipelines to detect Training Data Poisoning, and designing systems that can withstand Model Denial of Service attacks without cascading failures.

Vendors and regulators will also need to align on clearer expectations for testing and oversight. OpenAI’s own experience suggests that external red teaming, including end to end exercises that involve Expert red teaming organizations, is essential for understanding how models behave in realistic attack scenarios, especially when they are connected to tools like code execution and function calling. At the same time, the Anthropic espionage case shows that even well intentioned safeguards can be circumvented by determined actors, including Chinese state sponsored hackers, which is why any deployment of high capability models should be paired with continuous monitoring, incident response plans, and a willingness to suspend or modify access when abuse is detected.

The uncomfortable honesty of a “high risk” label

OpenAI’s decision to publicly acknowledge that its own future models are likely to pose a high cybersecurity risk is an uncomfortable but necessary step in an industry that has often preferred to emphasize innovation over downside. By labeling its upcoming LLMs as high risk within a formal Preparedness Framework, the company is effectively conceding that it is building tools that could help both defenders and attackers reshape the digital battlefield, and that no amount of internal optimism can erase that dual use reality.

Whether that honesty leads to better outcomes will depend on what happens next. If governments, enterprises, and other AI vendors treat OpenAI’s warning as a call to invest in robust mitigations, from OWASP aligned controls to rigorous external red teaming, then the next generation of models could still tilt the balance toward stronger defense. If, instead, the industry treats the “high risk” label as just another line in a marketing document, the world may discover that the first clear warning about AI enabled cyber threats came not from a regulator or a watchdog, but from the very company building the tools that attackers will use.

More from MorningOverview