Hackers just hid their phishing trap inside the Markdown links and images ChatGPT renders for you — turning the AI assistant itself into the attack channel

Ask ChatGPT to summarize a web page and you expect a tidy set of bullet points, maybe a helpful link or two. What you probably do not expect is a phishing trap baked into those links, placed there not by the AI’s designers but by an attacker who poisoned the page before you ever pasted the URL. Two peer-reviewed research efforts published on arXiv have now demonstrated exactly that scenario, showing how hidden instructions buried in a web page’s HTML can hijack an AI assistant’s Markdown output and turn it into a delivery vehicle for credential theft.

The findings, which security professionals have been circulating since early 2025 and which remain unpatched in several major platforms as of June 2026, raise a pointed question: if the content an AI renders for you can be manipulated before you see it, how much should you trust what appears on your screen?

How the attack works, step by step

The core mechanic is indirect prompt injection, a technique in which an attacker embeds instructions inside content that a language model will later process. Here is what the chain looks like in practice:

An attacker publishes or compromises a web page, inserting hidden HTML elements. These might be invisible <div> blocks, off-screen text, or metadata fields that a human visitor would never notice but that a language model will ingest as part of the page’s content.
A user pastes the page’s URL into ChatGPT (or a similar tool) and asks for a summary.
The model reads the page, including the hidden instructions. Those instructions tell it to include a specific Markdown link, formatted to look like a legitimate citation or reference, with anchor text such as “official source” or “full report.”
ChatGPT renders the summary. The phishing link appears as a normal, clickable hyperlink. Because the interface styles it identically to any other link the model might produce, there is no visual warning.
The user clicks. The URL resolves to a credential-harvesting page, a malware download, or a tracking beacon that leaks the user’s IP address and browser fingerprint.

The same logic applies to images. A hidden instruction can tell the model to embed a Markdown image tag pointing to an attacker-controlled server. When the chat interface loads that image, the HTTP request itself becomes a data exfiltration channel, transmitting metadata about the user without any click required.

What the research actually proves

The first study, “Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization,” built a reproducible evaluation benchmark to measure how reliably HTML-based prompt injections alter a model’s summarization output. The researchers found that hidden HTML directives consistently slipped past safety filters and appeared as benign-looking Markdown in the user-facing response. Because the benchmark is public, other teams can replicate the experiments and verify the results, a level of methodological transparency that sets this work apart from one-off blog demonstrations.

The second study, “Prompt Injection Attack against LLM-integrated Applications,” took a broader approach. Using black-box methods, meaning no access to a model’s internal weights, the researchers showed that simple injection payloads succeed across multiple LLM platforms and configurations. Their conclusion: the vulnerability is structural, not product-specific. Any application that feeds untrusted external content into a language model and then renders the output is exposed.

Taken together, the two papers establish that the attack is reproducible, cross-platform, and does not require sophisticated tooling. What they do not establish is how often it happens in the wild.

The gap between lab and field

No public incident report has confirmed that this technique has been used against real users at scale. OpenAI, Anthropic, and Google have not released telemetry showing how frequently Markdown-rendered phishing links appear in live conversations, and no incident response team has published a case study documenting a successful campaign delivered through an AI chat interface.

That gap matters. A working proof of concept tells defenders what to prepare for. It does not tell them how many attacks are already underway. Security teams evaluating this threat should treat it as a validated, high-plausibility risk rather than a confirmed, high-frequency one.

Some LLM providers have begun tightening Markdown rendering. Restrictions on loading external images, warnings on outbound links, and content-stripping filters have appeared in various products over the past year. But none of these mitigations have been independently benchmarked against the specific injection techniques described in the research, so whether they hold up under adversarial pressure remains an open question.

Why users fall for it

Traditional phishing relies on impersonating a trusted sender. This variant skips that step entirely. The “sender” is the user’s own AI assistant, a tool they chose to open, in a conversation they initiated. The output is styled in the same clean Markdown the model always uses. There is no misspelled domain in a suspicious email, no unexpected attachment, no sender address to scrutinize. The trust signal is baked into the interface itself.

Research on user behavior with AI tools, while still emerging, consistently shows that people treat AI-generated content as pre-vetted. When a summary arrives with neatly formatted links, the assumption is that the model has done some form of quality control. In reality, the model is transforming whatever it ingested, and a well-crafted injection rides along with that transformation invisibly.

What you can do right now

For individual users, the defensive playbook borrows from email security basics but applies them to a new context:

Hover before you click. Check the actual URL behind any link in an AI-generated summary. If the domain does not match the source you asked about, do not click.
Question unexpected links and images. If a summary includes a resource you did not ask for, treat it as suspicious until you verify the destination independently.
Avoid pasting sensitive URLs into public AI tools. If the page contains confidential data, summarizing it through a third-party model introduces risks beyond phishing, including data leakage to the model provider.

For organizations integrating LLMs into internal workflows, the stakes are higher and the mitigations more technical:

Strip or sandbox external content before it reaches the model. Pre-processing that removes hidden HTML elements, unusual attributes, and script-like patterns reduces the injection surface.
Separate trust levels. System prompts and tool configurations should be architecturally isolated from untrusted input such as web page content. The more clearly a system distinguishes between “instructions I trust” and “content I’m analyzing,” the harder injection becomes.
Log and monitor AI-generated links. If your platform renders clickable URLs from model output, instrument those clicks. Correlating them with threat intelligence feeds can surface abuse before it scales.
Demand transparency from vendors. Ask LLM providers whether they track injection attempts in production, what mitigations they have deployed, and whether those mitigations have been independently tested.

A threat that grows with adoption

Prompt injection is not new. Researchers have been documenting variations since at least 2023, and the OWASP Top 10 for Large Language Model Applications lists it as the number-one risk. What has changed is the delivery surface. As ChatGPT and its competitors have added browsing capabilities, Markdown rendering, and tool integrations, the path from a poisoned web page to a user’s screen has shortened dramatically. The attacker no longer needs to trick someone into opening an email. They just need to poison a page that someone, somewhere, will eventually ask an AI to summarize.

The research is clear, the proof of concept is reproducible, and the defenses are still catching up. Whether this channel sees widespread abuse in the coming months depends on how quickly vendors harden their rendering pipelines and how fast users learn to extend their phishing instincts to a new kind of interface. For now, the safest assumption is a simple one: if an AI generated the link, verify it yourself before you click.

More from Morning Overview

*This article was researched with the help of AI, with human editors creating the final content.

IG

FB

PIN

LI

X