A San Diego cybersecurity startup called Manifold Security is betting that the AI industry needs a new layer of defense after a string of incidents that exposed how autonomous agents can leak sensitive user data. The company announced an $8 million seed round just days before Meta confirmed a separate internal mishap in which an AI agent's advice led to a roughly two-hour exposure of company and user data to employees. Together, these events illustrate a growing blind spot: the agents companies are rushing to deploy can act on data they were never meant to share.
How a GraphQL Flaw Leaked Meta Users’ AI Conversations
The headline risk is not theoretical. Cybersecurity firm AppSecure discovered a flaw in Meta’s AI system that allowed attackers to extract users’ private prompts and responses through GraphQL query and parameter manipulation. The technique exploited the way Meta’s backend handled API requests, letting an outside party read what users had typed into the AI assistant and what the system had answered. Meta acknowledged the vulnerability and paid AppSecure a $10,000 bounty through its bug-reporting program.
What makes this flaw alarming is its simplicity. GraphQL is a standard query language used across thousands of web applications, and the manipulation AppSecure described did not require sophisticated zero-day exploits or insider access. It required understanding how Meta’s API parameters could be twisted to return data belonging to other users. For anyone who has asked Meta’s AI assistant a personal question, typed a medical symptom, or shared a business idea in a prompt, this kind of exposure turns a convenience tool into a privacy liability.
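The public reporting does not include AppSecure's exact queries, but the flaw class it describes is a familiar one: an API resolver that trusts a client-supplied identifier without checking ownership. The sketch below is illustrative only, with invented function and field names, and shows both the vulnerable pattern and the standard fix.

```python
# Hypothetical sketch of the IDOR-style flaw class AppSecure described:
# a GraphQL-style resolver that trusts a client-supplied conversation ID
# without verifying that the requester owns it. All names are invented.

CONVERSATIONS = {
    "c1": {"owner": "alice", "prompt": "a medical question", "reply": "..."},
    "c2": {"owner": "bob", "prompt": "a business idea", "reply": "..."},
}

def resolve_conversation_vulnerable(requester, conversation_id):
    # BUG: returns any conversation whose ID the caller supplies or guesses.
    return CONVERSATIONS.get(conversation_id)

def resolve_conversation_fixed(requester, conversation_id):
    # FIX: enforce ownership before returning data.
    convo = CONVERSATIONS.get(conversation_id)
    if convo is None or convo["owner"] != requester:
        return None  # or raise an authorization error
    return convo

# An outside party ("mallory") enumerating IDs reads bob's private prompt
# through the vulnerable path, and gets nothing through the fixed one.
leaked = resolve_conversation_vulnerable("mallory", "c2")
blocked = resolve_conversation_fixed("mallory", "c2")
```

The fix is a single ownership check, which is precisely why this class of bug is so common: nothing in the query language itself forces the resolver to perform it.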
The incident also underscores how conversational interfaces can mask serious risk. Users often treat AI chats as private and ephemeral, forgetting that each prompt and response is stored and routed through complex backends. When those backends can be tricked into returning another person’s data, the result is not just a technical bug but a breach of trust that is difficult to reverse. Once sensitive prompts are exfiltrated, there is no practical way to claw them back from whoever accessed them or any systems where they may have been copied.
Meta’s Internal Agent Mishap Adds a Second Data Point
The AppSecure discovery involved an external attacker path. A separate incident inside Meta itself shows the risk from the other direction: an AI agent acting on its own authority within a trusted environment. According to reporting confirmed by Meta, an AI agent responded to an engineering question posted on an internal company forum. The advice it gave led to actions that exposed sensitive company and user data to Meta employees for roughly two hours before the leak was contained.
Meta confirmed the incident but has not disclosed the volume of records exposed or the specific type of user data involved. The two-hour window is significant because internal data-access events at a company of Meta’s scale can ripple outward. Employees with access to leaked records could, in theory, copy, screenshot, or forward that information before containment. The episode demonstrates that even when an AI agent operates inside a corporate perimeter with no malicious intent, its outputs can override the access controls that were supposed to keep data compartmentalized.
This kind of failure is harder to address with traditional security playbooks. There is no obvious exploit to patch, no malicious IP address to block. The system worked as configured: the agent had access, gave advice, and a human followed it. The problem lies in how much discretion is delegated to autonomous or semi-autonomous systems without guardrails that understand context, sensitivity, and downstream consequences.
Researchers Define the Problem as “Data Over-Exposure”
Academic researchers have started giving this failure mode a formal name. A preprint paper titled AgentRaft, published on arXiv, introduces the concept of “data over-exposure” in large language model agents and evaluates how well automated systems can detect it. The research examines agent-tool environments where an LLM interacts with external tools, databases, or APIs, and measures how often those interactions result in data being surfaced beyond its intended scope.
The AgentRaft framework matters because it shifts the conversation from anecdotal incidents to measurable risk. Rather than treating each leak as a one-off bug, the researchers treat over-exposure as a systemic property of how LLM agents request and relay information. The paper’s quantitative evaluation tests detection across a range of tool configurations, offering a benchmark that security vendors and platform operators could use to stress-test their own deployments.
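One way to make that systemic framing concrete, sketched here with invented field names rather than the paper's actual metric, is to log each agent-tool interaction, compare the fields a task actually needed against the fields the agent surfaced, and report the fraction of interactions that leaked beyond scope.

```python
# Illustrative sketch (not AgentRaft's actual metric): score a log of
# agent-tool interactions for "data over-exposure" by checking whether
# the fields surfaced to the user exceed the fields the task required.

def over_exposed(interaction):
    """True if the agent surfaced any field beyond the task's needs."""
    surfaced = set(interaction["surfaced_fields"])
    needed = set(interaction["needed_fields"])
    return bool(surfaced - needed)

def over_exposure_rate(interactions):
    """Fraction of interactions that leaked beyond their intended scope."""
    flagged = sum(over_exposed(i) for i in interactions)
    return flagged / len(interactions)

# A toy two-interaction log: the second surfaces a field it did not need.
log = [
    {"needed_fields": ["name"], "surfaced_fields": ["name"]},
    {"needed_fields": ["name"], "surfaced_fields": ["name", "ssn"]},
]
rate = over_exposure_rate(log)  # 0.5
```

A benchmark built on this kind of scoring lets a vendor compare detection systems quantitatively instead of arguing from individual incidents.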
The work comes out of a research community connected to institutions such as Cornell, and the preprint is hosted on arXiv, the member-supported platform that has become a central distribution channel for AI safety research.
By framing over-exposure as a formal property, AgentRaft provides a vocabulary that connects incidents like the Meta leak to broader design patterns. It suggests that any system where agents can call tools or query databases needs continuous monitoring of what data is requested, what is returned, and what is ultimately surfaced to end users.
Manifold Security’s Bet on Runtime Detection
This is the gap Manifold Security says it can fill. The San Diego startup announced an $8 million seed funding round led by Costanoa Ventures, with the money directed toward building what the company calls Agentic AI Detection and Response, or AIDR. The product is designed to monitor autonomous AI agents at runtime, meaning it watches what agents do while they are actively operating rather than scanning code before deployment or auditing logs after a breach.
The runtime angle is a direct response to the kind of failures seen at Meta. Traditional security tools focus on perimeter defense or static code analysis. Neither approach catches an AI agent that is authorized to access a database but then surfaces records it should not have included in a response. Manifold’s pitch is that AIDR sits between the agent and its environment, flagging or blocking actions that cross data-access boundaries in real time.
In principle, such a system could recognize when an agent is about to include too many fields from a user profile, or when it attempts to join datasets in a way that violates internal policy. It might also detect anomalous tool calls that suggest prompt injection, compromised credentials, or misuse of privileged APIs. The goal is not to replace existing security layers but to add a specialized monitor tuned to the behavior of agents themselves.
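Manifold has not published AIDR's architecture, but the interposition pattern the company describes can be sketched in a few lines. In this hypothetical example, every name and the field-allowlist policy are invented: a monitor wraps each tool call, strips fields that policy forbids, and records what it blocked.

```python
# Minimal sketch (all names and policies hypothetical) of a runtime
# monitor sitting between an agent and its tools: each tool result is
# filtered against a per-tool allowlist before the agent ever sees it.

ALLOWED_FIELDS = {
    # Policy: the agent may see a display name and locale, nothing else.
    "get_user_profile": {"display_name", "locale"},
}

def fetch_user_profile(user_id):
    # Stand-in for a privileged internal API the agent is authorized to call.
    return {"display_name": "Alice", "locale": "en-US",
            "email": "alice@example.com", "home_address": "..."}

def monitored_call(tool_name, tool_fn, *args):
    """Run a tool call, redact disallowed fields, and flag what was blocked."""
    result = tool_fn(*args)
    allowed = ALLOWED_FIELDS.get(tool_name, set())
    redacted = {k: v for k, v in result.items() if k in allowed}
    dropped = sorted(set(result) - allowed)
    if dropped:
        # A real system would alert or block here, not just print.
        print(f"blocked fields from {tool_name}: {dropped}")
    return redacted

safe = monitored_call("get_user_profile", fetch_user_profile, "u123")
```

The key design point is that the agent remains authorized to call the tool; what changes is that authorization no longer implies the agent sees everything the tool returns.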
Whether the product delivers on that promise remains unproven at this stage. Manifold is early in its lifecycle and has not published independent benchmarks or third-party audits of AIDR's detection accuracy. For now, the company is selling a vision aligned with the concerns raised by AgentRaft and by Meta's recent incidents: that agent behavior must be governed at the moment of action, not only at design time.
Why Bounties and Patches Are Not Enough
The $10,000 bounty Meta paid AppSecure is standard practice for responsible disclosure, but bounty programs are reactive by design. They reward researchers for finding flaws that already exist in production systems, which means users are exposed until someone outside the company notices the problem and reports it. The Meta internal agent incident did not even involve an external vulnerability. It was an AI system doing what it was asked to do, just with consequences no one anticipated.
This distinction matters for how companies should think about agent security going forward. Patch management, perimeter firewalls, and access controls remain necessary, but they assume that the main threats are either human attackers or static software bugs. Autonomous agents blur that line. They can generate their own queries, chain together tools, and reinterpret instructions in ways that designers did not foresee. When such systems sit close to sensitive data, the risk is not only exfiltration but also inadvertent over-sharing inside the organization.
Proponents of runtime monitoring argue that the industry needs to treat agent behavior the way it treats network traffic or endpoint activity: as a continuous stream to be analyzed for anomalies and policy violations. That implies new kinds of observability, new taxonomies of what counts as “sensitive,” and new feedback loops so that when an agent crosses a line, the system can intervene immediately.
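Treating agent actions as a stream to be audited looks much like existing network or endpoint monitoring. The sketch below, with an invented sensitivity taxonomy and row cap, runs each agent action event through simple policy checks and collects alerts for violations.

```python
# Hedged sketch: agent tool calls as an auditable event stream, analogous
# to network-traffic monitoring. The sensitivity labels and the row cap
# are illustrative policy choices, not any vendor's actual rules.

SENSITIVITY = {"user_emails": "restricted", "public_docs": "open"}
ROW_CAP = 100  # maximum rows an agent may pull in a single action

def audit_event(event):
    """Return the list of policy violations for one agent action event."""
    violations = []
    if SENSITIVITY.get(event["table"]) == "restricted" and not event.get("approved"):
        violations.append("restricted-table access without approval")
    if event.get("rows_returned", 0) > ROW_CAP:
        violations.append("bulk read exceeds row cap")
    return violations

stream = [
    {"agent": "helper-bot", "table": "public_docs", "rows_returned": 5},
    {"agent": "helper-bot", "table": "user_emails", "rows_returned": 500},
]

alerts = []
for event in stream:
    violations = audit_event(event)
    if violations:
        alerts.append((event["table"], violations))
```

In a production setting the interesting work lies in the taxonomy itself, deciding which tables, fields, and join patterns count as sensitive, which is exactly the observability gap the proponents describe.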
A Turning Point for Agent Safety
The combination of a relatively simple GraphQL flaw and a high-profile internal mishap at Meta has given concrete shape to what might otherwise feel like abstract AI risk. Agent-centric research like AgentRaft, and startups like Manifold that are trying to commercialize runtime defenses, suggest that a distinct discipline of agent safety is emerging alongside traditional cybersecurity.
For organizations experimenting with AI assistants, copilots, and autonomous workflows, the message is increasingly clear: securing the infrastructure is not enough. The behavior of the agents themselves—what they ask for, what they see, and what they reveal—has become a first-class security concern. Companies that ignore that shift may find that their next breakthrough AI feature doubles as their next breach headline.
*This article was researched with the help of AI, with human editors creating the final content.*