Researchers warn of Vertex AI agent flaw that could expose cloud data and code

Security researchers have identified a vulnerability in Google’s Vertex AI agent framework that could allow attackers to extract sensitive cloud data and proprietary code through indirect prompt injection. The flaw, which affects AI agents operating within cloud environments, raises serious questions about the readiness of agentic AI systems for enterprise deployment. As organizations increasingly hand off automated workflows to AI-powered tools, the discovery exposes a gap between the speed of adoption and the maturity of defenses protecting these systems.

How Indirect Prompt Injection Threatens AI Agents

The core issue lies in a class of attacks known as indirect prompt injection, where malicious instructions are embedded in external data sources that an AI agent processes during routine operations. Unlike direct prompt injection, which requires an attacker to interact with the AI system through its input interface, indirect attacks are far more insidious. A poisoned document, a manipulated web page, or a compromised database entry can silently redirect an agent’s behavior without any visible user interaction.

For Vertex AI agents, which operate within Google Cloud and can access storage buckets, code repositories, and enterprise databases, the risk is not abstract. An agent that retrieves and acts on external data could be tricked into exfiltrating credentials, leaking proprietary source code, or forwarding sensitive customer records to an attacker-controlled endpoint. The attack surface grows with every integration point the agent touches, turning the very connectivity that makes these tools useful into a liability.

A preprint paper on web agent security provides a systematic evaluation framework for measuring how web-based AI agents perform under exactly these conditions. The research offers a structured benchmark for testing whether agents can resist indirect prompt injection across realistic scenarios, filling a gap in how the security community assesses agentic AI systems.

Benchmarking Reveals Weak Defenses

The benchmarking approach is significant because it moves the conversation from theoretical warnings to measurable results. Rather than relying on anecdotal demonstrations or one-off proof-of-concept exploits, the research applies systematic evaluation methods to determine how reliably agents can be manipulated. This kind of empirical rigor has been largely absent from the public debate over AI agent security, and vendor assurances often outpace independent testing.

What the benchmark reveals is that even agents built on state-of-the-art language models struggle to distinguish between legitimate instructions and injected commands when those commands arrive through trusted data channels. The fundamental design of most agentic systems treats retrieved content as informational input rather than potentially adversarial code. That architectural assumption is the root cause of the vulnerability, and no amount of prompt engineering or output filtering has yet proven sufficient to close it entirely.

This matters for Vertex AI specifically because Google markets the platform as an enterprise-grade solution for building and deploying AI agents that interact with cloud infrastructure. Businesses using these agents to automate tasks like data analysis, code review, or customer service are implicitly trusting that the agent will not act against their interests when processing external inputs. The benchmarked results suggest that trust may be premature, especially in environments where agents are granted broad access to internal systems.

Supply-Chain Risk in Connected Cloud Services

Most security analyses of prompt injection treat each agent as an isolated system. But in real enterprise environments, Vertex AI agents rarely operate alone. They connect to BigQuery datasets, interact with Cloud Storage, pull from version control systems, and trigger downstream workflows. A single compromised agent could serve as an entry point for lateral movement across an organization’s cloud infrastructure.

Consider a scenario where an AI agent tasked with summarizing internal documents encounters a poisoned file in a shared drive. If the injected instructions direct the agent to copy sensitive data to an external location, the breach would not stop at the contents of that one file. The agent’s access permissions, which typically mirror those of the service account running it, could expose every resource that account can reach. In cloud environments with permissive identity and access management configurations, that scope can be enormous.

This supply-chain dimension is largely unaddressed by current benchmarking approaches, which focus on individual agent-level evaluation rather than cascading effects across interconnected services. The gap is not a criticism of the research itself but rather a signal that the problem is deeper than any single study can capture. Organizations deploying Vertex AI agents need to think beyond agent-level defenses and consider the blast radius of a successful injection attack across their entire cloud footprint.

Google’s Response and the Disclosure Gap

No official public statement from Google addressing the specific Vertex AI vulnerability has been identified in available reporting. This silence is notable given that Google has previously engaged with the security research community on prompt injection issues and has published its own work on AI safety. The absence of a vendor response makes it difficult to assess whether patches, mitigations, or architectural changes are in progress.

Without direct input from Vertex AI developers, the security community is left to rely on academic analysis and independent testing. That reliance creates an information asymmetry. Enterprises making purchasing and deployment decisions about Vertex AI agents may not have access to the same risk assessments that researchers are producing. Google’s AI safety documentation discusses general principles for responsible AI deployment, but those guidelines do not specifically address the indirect prompt injection attack vectors that researchers have demonstrated.

The disclosure gap also raises a broader question about accountability. When a cloud platform markets AI agents as ready for enterprise use, customers reasonably expect that known vulnerability classes have been addressed or at least disclosed. Indirect prompt injection is not a novel threat; it has been discussed in AI security literature for years. The fact that it remains exploitable in production-grade agent frameworks suggests that defensive measures have not kept pace with deployment timelines.

What Enterprises Should Evaluate Now

For organizations already running or planning to deploy Vertex AI agents, the research findings point to several concrete areas that deserve immediate attention:

Review the permissions granted to service accounts that power AI agents. Apply the principle of least privilege so that a compromised agent cannot access resources beyond its immediate task scope.
Audit the data sources that agents consume. Any external or shared input channel is a potential injection vector, and organizations should treat agent-accessible data with the same scrutiny applied to executable code.
Implement monitoring for anomalous agent behavior, particularly unexpected data transfers, API calls to external endpoints, or access patterns that deviate from the agent’s intended function.
Avoid granting agents write access to sensitive systems unless the workflow explicitly requires it, and pair any necessary write capabilities with compensating controls such as approval gates or human-in-the-loop review.
Segment cloud resources so that even if an agent is compromised, its ability to move laterally across projects, datasets, or environments is constrained by network and identity boundaries.

Enterprises should also pressure vendors for clearer security roadmaps. That includes requesting detailed documentation on how Vertex AI handles untrusted inputs, what safeguards exist against indirect prompt injection, and how quickly the platform can respond to newly discovered attack patterns. In regulated industries, those assurances may need to be formalized in contracts or compliance attestations rather than left to marketing materials.

Toward More Resilient Agent Architectures

Longer term, the Vertex AI vulnerability underscores the need for architectural changes in how AI agents are integrated into cloud environments. Treating agent prompts as a kind of executable policy rather than mere text could enable stricter validation, sandboxing, and allowlisting of permissible actions. Separating the components that interpret natural language from those that execute high-privilege operations would reduce the likelihood that a single injected instruction can trigger a catastrophic outcome.

Security teams will also need new testing practices. Traditional penetration testing focuses on network services and application endpoints; agentic systems require red-teaming of data flows, content pipelines, and the logic that binds language model outputs to operational actions. Benchmarks that simulate realistic multi-step attacks, including supply-chain style compromises, can help organizations understand where their defenses are brittle before attackers discover those same weaknesses.

For now, the safest assumption is that any AI agent capable of reading untrusted data and acting on cloud resources is vulnerable to some form of indirect prompt injection. Until vendors like Google provide transparent, verifiable mitigations, enterprises deploying Vertex AI agents should treat them as high-risk components and design their cloud architectures accordingly. The promise of agentic automation remains compelling, but the current generation of tools appears to be running ahead of the security controls needed to use them safely.

More from Morning Overview

*This article was researched with the help of AI, with human editors creating the final content.

IG

FB

PIN

LI

X