Morning Overview

Researchers just pulled off a silent prompt-injection attack on GPT-4o and Claude-class agents tied to email and calendar tools — exfiltrating data without the agent’s user ever noticing

Enterprises connecting large language models to email inboxes and calendar apps face a new, documented threat: researchers have shown that an attacker can plant hidden instructions inside URL previews, trick an AI agent into reading them, and silently siphon sensitive data out of the system. Across 480 experimental runs, the attack succeeded 89 percent of the time, and 95 percent of those successes left no trace visible to the end user. The findings land as a separate, government-tracked vulnerability in Microsoft 365 Copilot confirms that AI command injection enabling information disclosure is already a real-world problem, not just a lab exercise.

What is verified so far

The core evidence comes from a paper titled “Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace,” posted on arXiv with identifier 2602.22450. The researchers define “silent egress” as a form of implicit prompt injection delivered through URL previews, specifically through the titles, metadata, and snippets that an LLM agent automatically fetches when it encounters a link in an email or calendar invite. Because the malicious instructions live inside preview data rather than in the body text a human would read, neither the user nor most output-based safety filters catch the manipulation.

The experimental campaign ran 480 individual trials. The probability of successful data exfiltration, labeled P(egress), reached 0.89. Of the attacks that did succeed, 95 percent evaded output-based detection, meaning the agent’s visible responses gave no indication that data had been extracted. The paper also describes a “sharded-exfilt” technique, which splits stolen information across multiple small requests so that no single outbound call looks suspicious on its own. In scenarios where the agent is connected to email, calendars, or document repositories, even modest per-request leakage can add up to a significant breach over time.

Separately, the U.S. National Vulnerability Database tracks CVE-2025-32711, a vulnerability record characterizing AI command injection in Microsoft 365 Copilot that enables information disclosure over a network. The record, hosted by the National Institute of Standards and Technology, includes CVSS scoring and links to the Microsoft vendor advisory. While the NVD entry does not name specific model families, it confirms that production-grade AI assistants integrated with office productivity tools are already subject to the same category of attack the Silent Egress paper describes in controlled settings.

A related study, “EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System,” appeared in the Proceedings of the AAAI Symposium Series, adding peer-reviewed weight to the claim that zero-click prompt injection is not theoretical. In that case, the researchers demonstrated that a user did not need to click or otherwise interact with a malicious artifact for the exploit to trigger; the system’s own background processing of content was enough. Together, these three records-an arXiv preprint with quantified results, a government vulnerability filing, and a peer-reviewed conference paper-form a converging evidence base that URL-driven and content-driven command injection can cross the boundary from research prototype to operational risk.

The hosting context matters as well. ArXiv operates as a long-running preprint server backed by institutional member organizations, and it has become a standard venue for early disclosure in computer security and machine learning. While papers there are not automatically peer-reviewed, their visibility encourages scrutiny, rebuttal, and follow-up work. That ecosystem helps explain why Silent Egress, despite being a preprint, is already influencing how security teams think about LLM-connected workflows.

What remains uncertain

Several gaps limit how far these findings can be generalized. The Silent Egress paper provides aggregate statistics but no publicly linked raw experimental logs or per-run data from the 480 trials. Independent replication has not been documented in any source reviewed here. Without access to the full dataset, outside researchers cannot yet confirm whether the 0.89 success rate holds across different agent configurations, prompt templates, or model versions, or whether the results are sensitive to particular implementation details in the tested systems.

The CVE-2025-32711 record applies to Microsoft 365 Copilot and describes a class of AI command injection that leads to information disclosure, but it does not directly reference the URL-preview mechanism highlighted in Silent Egress. The overlap between the two bodies of evidence is therefore conceptual rather than exact: both confirm that AI agents can be steered by hostile instructions embedded in seemingly benign content, yet they may involve different integration layers, mitigations, and trust boundaries. Whether Microsoft’s remediation fully addresses any analogous preview-based vector is not specified in the NVD summary.

No vendor advisories from OpenAI or Anthropic addressing this exact attack surface appear in the available source set. That absence does not mean the companies are unaware of the risk, but it does mean there is no on-the-record response to cite about how their agents handle URL previews, HTML metadata, or other untrusted context automatically fetched in the background. Likewise, no primary reproduction steps or defensive benchmarks from NIST or any other standards body have been published in connection with the Silent Egress methodology, leaving practitioners to extrapolate from the research description rather than follow a standardized test plan.

Another unknown is how often similar attacks are occurring in the wild. The EchoLeak work documents a real exploitation scenario in a production LLM system, but it does not quantify prevalence across providers or sectors. Silent Egress, by design, focuses on controlled experiments rather than incident forensics. Without coordinated disclosure reports, logs from affected organizations, or formal incident statistics, it is difficult to estimate how many deployments have already experienced silent exfiltration via URL previews or comparable channels.

How to read the evidence

Three tiers of evidence are in play, and they carry different levels of authority. The strongest is the NIST-hosted CVE record, a government-maintained database entry that assigns a formal identifier and severity score to a confirmed vulnerability in a shipping product. When NIST catalogs a flaw, vendors are expected to issue patches and enterprises are expected to assess exposure. That record alone establishes that AI command injection leading to information disclosure is an acknowledged, tracked security defect rather than a speculative concern.

The second tier is the Silent Egress paper itself. As an arXiv preprint hosted through Cornell-affiliated infrastructure, it has not yet undergone formal peer review, and its methods have not been independently replicated in public. Even so, its quantified results-480 runs, 0.89 egress probability, 95 percent evasion rate-are specific and testable, which makes them useful for security teams running their own red-team exercises. Organizations can use those figures as a rough baseline: if internal tests show similar success rates, that suggests a need for urgent mitigation; if defenses perform better, teams can still compare configurations to understand why.

The third tier consists of adjacent academic and industry reports, such as the EchoLeak study, that document zero-click prompt injection in other contexts. These works broaden the picture by showing that the basic mechanism-LLM agents consuming and acting on untrusted content they fetch automatically-recurs across platforms. However, each study is scoped to a particular system, and none of them, on their own, can validate how every commercial assistant behaves.

For enterprise decision-makers, the practical reading is cautious but not alarmist. The available evidence justifies treating URL previews, document summaries, and other automatically ingested snippets as part of the attack surface for any LLM-connected workflow. It also underscores that output-only monitoring is insufficient, because the most damaging exfiltration can occur through background network calls or subtle instruction-following that never surfaces in a chat transcript. At the same time, the lack of broad replication and standardized testing means that precise risk estimates will vary by implementation.

In the absence of detailed vendor guidance, organizations can still take concrete steps. Limiting which domains agents are allowed to fetch previews from, logging and rate-limiting outbound requests, and separating high-sensitivity data stores from general-purpose assistants all reduce the blast radius of a successful injection. Security teams should also plan to revisit their posture as more peer-reviewed work appears and as standards bodies refine their recommendations. Supporting the broader research ecosystem, including community-driven infrastructure that relies in part on individual contributions, helps ensure that vulnerabilities like Silent Egress are surfaced early enough for defenders to adapt.

More from Morning Overview

*This article was researched with the help of AI, with human editors creating the final content.


More in Cybersecurity