Anthropic warns new Claude model could speed hacks, urges defenses

Anthropic has flagged its own Claude AI model as a potential accelerant for cyberattacks, warning that the system can discover and exploit software vulnerabilities in real-world open-source projects at a pace that could outstrip traditional human-led methods. The company’s research, published as an academic paper on arXiv, tested AI agents against actual codebases and found they could identify previously unknown flaws through workflows that closely mirror advanced hacking techniques. The disclosure puts fresh pressure on software developers and infrastructure operators to adopt stronger, AI-aware defenses before attackers do.

What is verified so far

The central piece of evidence is a research paper titled “CyberGym: Evaluating AI Agents’ Cybersecurity Capabilities with Real-World Vulnerabilities at Scale,” bearing the arXiv identifier 2506.02548. The study evaluates how AI agents perform when tasked with finding and exploiting security flaws in genuine open-source software, not contrived lab exercises. Its scope includes previously undiscovered vulnerabilities, meaning the agents surfaced bugs that human reviewers had not yet caught. The research focuses on discovery and exploitation-adjacent workflows, a term that describes the chain of steps an attacker would follow to move from spotting a weakness to taking advantage of it.

Anthropic’s involvement in the paper connects to a broader internal initiative. The available evidence references Anthropic’s “Project Glasswing,” which appears oriented toward securing critical software as AI capabilities grow. While the full details of Project Glasswing are not laid out in the available sources, its citation trail leads to a CISA retrospective on the 2021 Colonial Pipeline ransomware attack, a case that forced the U.S. government to rethink how it protects essential services from digital threats. That federal analysis cataloged lessons learned in the two years following the incident, emphasizing rapid detection and coordinated response as the primary tools for limiting damage from large-scale breaches.

The link between these two documents is telling. Anthropic appears to be framing its own AI’s offensive potential against the backdrop of real infrastructure failures, acknowledging that the same speed advantage Claude offers defenders could just as easily benefit attackers. That framing is unusual for a company promoting a commercial product, and it signals a degree of self-imposed accountability that the broader AI industry has not uniformly adopted. By grounding its work in a high-profile incident that disrupted fuel supplies across a major region, the company is effectively arguing that AI-driven vulnerability discovery is not an abstract research problem but a live operational concern.

What remains uncertain

Several important gaps exist in the public record. The CyberGym paper establishes that AI agents can find real vulnerabilities at scale, but the available reporting does not include specific benchmark numbers, such as how many flaws were found, how quickly, or how those results compare to human security researchers working the same codebases. Without those figures, the claim that Claude “speeds hacks” rests on the general finding rather than a precise measurement of acceleration. Readers should be cautious about extrapolating from capability demonstrations to blanket statements about superiority over human experts.

Anthropic executives have not, based on available sources, issued public statements or press releases expanding on the paper’s findings. The absence of on-the-record commentary from company leadership means the warning is currently conveyed through academic channels rather than a direct corporate advisory. Whether Anthropic plans to issue formal guidance to customers, restrict certain Claude capabilities, or integrate new safeguards remains unclear from the documents at hand. That silence leaves open questions about how the company will balance openness in research with controls on potentially dangerous applications.

Project Glasswing itself is only partially visible. The name surfaces in the citation trail connecting the CyberGym paper to the CISA Colonial Pipeline retrospective, but no standalone document or press release describes the initiative’s full scope, funding, or timeline. It is not yet possible to determine whether Glasswing is an internal research program, a product feature set, or a public-private partnership effort. Readers should treat references to Project Glasswing as preliminary until Anthropic provides additional detail. For now, it functions more as a signal of intent than as a clearly defined security offering.

There is also no primary data on how often Claude-like models have been used, or misused, for vulnerability discovery in production environments outside the controlled conditions of the CyberGym study. The paper tests AI agents in a research setting where access, inputs, and evaluation criteria are carefully managed. Whether those results translate directly to real-world attack scenarios, where network defenses, access controls, and monitoring tools add friction, is an open question that the study alone cannot answer. The gap between lab performance and field behavior is especially important in cybersecurity, where small environmental differences can dramatically change outcomes.

How to read the evidence

The strongest evidence here is the CyberGym paper itself, an academic study posted to arXiv that describes a concrete experimental setup: AI agents tested against real open-source code, producing results that include the discovery of previously unknown vulnerabilities. That makes it primary evidence of capability, not speculation. When Anthropic says its model can find flaws humans missed, the paper provides the experimental basis for that claim. It also demonstrates that AI can follow multi-step workflows (code inspection, hypothesis generation, exploit construction) that resemble the process human attackers use.
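
To make that chain concrete, here is a minimal, hypothetical sketch of such a three-stage loop in Python. None of this code comes from the CyberGym paper or from Anthropic; the stage names, the toy `strcpy` heuristic, and the stubbed confirmation step are illustrative assumptions about what discovery-to-confirmation workflows generally look like.

```python
# Hypothetical sketch of a multi-step vulnerability-discovery loop:
# inspect code, form a hypothesis, then try to confirm it. All names
# here are illustrative; this is not the CyberGym harness.

from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    line: int
    hypothesis: str          # e.g. "possible buffer overflow"
    confirmed: bool = False


def inspect(source: str) -> list[Finding]:
    """Stage 1: scan source text for suspicious patterns (toy heuristic)."""
    findings = []
    for i, line in enumerate(source.splitlines(), start=1):
        if "strcpy(" in line:  # a classic unsafe C call, used as a stand-in
            findings.append(Finding("demo.c", i, "possible buffer overflow"))
    return findings


def hypothesize(finding: Finding) -> str:
    """Stage 2: turn a raw pattern match into a testable claim.
    In an agentic setup, a model would reason about reachability here."""
    return f"{finding.hypothesis} at {finding.file}:{finding.line}"


def confirm(finding: Finding) -> bool:
    """Stage 3: attempt confirmation, e.g. by generating a crashing input
    and running it under a sanitizer. Stubbed out in this sketch."""
    return False  # a real harness would execute a proof-of-concept test


if __name__ == "__main__":
    demo_source = "void f(char *in) { char buf[8]; strcpy(buf, in); }"
    for f in inspect(demo_source):
        print(hypothesize(f), "->", "confirmed" if confirm(f) else "unconfirmed")
```

In a real agentic setup, a language model would drive each stage, and the confirmation step would execute a candidate proof-of-concept under instrumentation rather than returning a stub value.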

The CISA Colonial Pipeline report serves a different function. It is not evidence of Claude’s capabilities but rather contextual grounding for why those capabilities matter. The Colonial Pipeline attack shut down fuel delivery across much of the U.S. East Coast, and the federal government’s post-incident analysis stressed the need for faster detection and better coordination. By citing this report in its research trail, Anthropic is drawing a line between abstract AI benchmarks and the kind of real-world damage that software vulnerabilities can cause when exploited at scale. The CISA document is institutional evidence of consequence, not of AI performance, and it underscores that weaknesses in widely used systems can ripple far beyond the organizations that own them.

Readers should weigh these two sources differently. The arXiv paper supports factual claims about what Claude can do in a lab setting. The CISA report supports the argument that those capabilities carry serious stakes. Neither source, on its own, proves that Claude has been or will be used maliciously. The gap between “can” and “will” is where policy, corporate responsibility, and defensive investment all come into play. How regulators, vendors, and operators respond will determine whether the net effect of AI-accelerated vulnerability discovery is greater resilience or greater risk.

One common assumption in current coverage of AI and cybersecurity deserves scrutiny: the idea that AI-powered vulnerability discovery primarily benefits attackers. The CyberGym research actually cuts both ways. If an AI agent can find a zero-day flaw before a human attacker does, the same tool in a defender’s hands becomes a powerful audit mechanism. The real question is not whether AI accelerates hacking but who gets access to these tools first and under what constraints. Anthropic’s decision to publish its findings openly, rather than keeping them proprietary, suggests the company is betting that transparency and shared knowledge will favor defenders over time. That bet is far from guaranteed, but it reflects a strategic choice that differs from simply hoarding capability.

The practical takeaway for organizations that rely on open-source software is direct. AI-augmented code review is no longer a speculative idea; it is an operational necessity. Development teams should assume that sophisticated adversaries will experiment with models like Claude to scan public repositories for exploitable bugs. In response, maintainers can integrate similar agents into their continuous integration pipelines, using them to flag suspicious patterns, generate test cases, and propose patches before code is widely deployed.
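
As a rough illustration of that CI idea, the sketch below assumes the official Anthropic Python SDK (`pip install anthropic`) and an `ANTHROPIC_API_KEY` in the CI environment. The prompt, the placeholder model name, and the fail-the-build policy are illustrative choices, not guidance from Anthropic or the CyberGym authors.

```python
# Hypothetical CI step: send files changed on this branch to a model
# for a security-focused review, and fail the job if anything is flagged.

import subprocess
import sys
from pathlib import Path

import anthropic


def changed_files(base: str = "origin/main") -> list[str]:
    """List files modified relative to the base branch (runs inside CI)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "--", "*.py", "*.c"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in out.stdout.splitlines() if Path(p).exists()]


def review(client: anthropic.Anthropic, path: str) -> str:
    """Ask the model to flag likely security issues in one file."""
    source = Path(path).read_text(encoding="utf-8", errors="replace")[:20_000]
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; pin whichever model you use
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Review this file for likely security flaws. "
                       f"Reply NONE if you find nothing.\n\n{path}:\n{source}",
        }],
    )
    return msg.content[0].text


if __name__ == "__main__":
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
    flagged = False
    for path in changed_files():
        verdict = review(client, path)
        if "NONE" not in verdict:
            flagged = True
            print(f"--- {path} ---\n{verdict}\n")
    sys.exit(1 if flagged else 0)  # nonzero exit fails the CI job for human review
```

Treating model output as a gate rather than an oracle is the key design choice here: the job fails so a human looks, not so a bot auto-patches.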

At the same time, security leaders need to adjust their threat models. Traditional assessments that focus on known vulnerabilities and manual penetration testing may underestimate the speed and breadth of AI-driven reconnaissance. Logging, anomaly detection, and incident response plans should assume that probing activity could be partly automated and more persistent than in the past. Investments in basic hygiene, such as prompt patching, least-privilege access, and segmentation, remain crucial, but they now sit alongside a new category of controls aimed at monitoring how AI tools are used within the organization itself.
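
As a toy example of the rate-based checks such threat models imply, the sketch below flags sources whose request cadence looks machine-driven. The log format and threshold are invented for illustration; production detection would combine many more signals.

```python
# Toy rate-based probe detector: flag source IPs that exceed a
# per-minute request budget in a simplified access log.

from collections import defaultdict
from datetime import datetime

# Invented log lines in a simplified "timestamp ip method path" format.
LOG_LINES = [
    "2025-06-03T10:00:00 10.0.0.5 GET /login",
    "2025-06-03T10:00:01 10.0.0.5 GET /admin",
    "2025-06-03T10:00:02 10.0.0.5 GET /.git/config",
    "2025-06-03T10:07:30 10.0.0.9 GET /index.html",
]


def flag_probing(lines: list[str], max_per_minute: int = 2) -> set[str]:
    """Return source IPs exceeding the per-minute request budget."""
    buckets: dict[tuple[str, str], int] = defaultdict(int)
    for line in lines:
        ts, ip, _method, _path = line.split(" ", 3)
        minute = datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:%M")
        buckets[(ip, minute)] += 1
    return {ip for (ip, minute), count in buckets.items() if count > max_per_minute}


if __name__ == "__main__":
    print("suspicious sources:", flag_probing(LOG_LINES))  # {'10.0.0.5'}
```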

Finally, policymakers and standards bodies face a narrowing window to set expectations around responsible deployment. The CyberGym study shows that advanced AI models can already participate meaningfully in vulnerability discovery workflows. Waiting for clear evidence of widespread misuse before acting would ignore the lead time required to update regulations, procurement rules, and best-practice frameworks. Anthropic’s willingness to surface these risks early offers an opportunity for a more proactive response, if industry, government, and the open-source community are prepared to treat AI-accelerated security as a shared challenge rather than a competitive advantage to be kept in the dark.

*This article was researched with the help of AI, with human editors creating the final content.