Roughly two out of three companies using AI tools have already experienced a data leak connected to those systems, and fewer than one in four have written security policies designed to address the risk. Those findings, drawn from Metomic’s 2025 State of Data Security Report and reinforced by the Purple Book Community’s State of AI Risk Management 2026 survey, expose a widening gap between the speed at which organizations adopt generative AI and the pace at which they build safeguards around it. Separate telemetry from Harmonic Security adds a concrete dimension: 22 percent of files and 4.37 percent of prompts that employees submit to generative AI tools contain sensitive data, meaning the volume of exposed information is already substantial.
Why the 68 Percent Leak Rate Demands Immediate Action
The core tension is straightforward. Organizations rolled out AI assistants, code generators, and chat-based productivity tools faster than their security teams could write rules for them. Metomic’s data security survey found that 68 percent of organizations had suffered an AI-related data leak, yet only 23 percent maintained proper AI data security policies. That ratio, roughly three leaking companies for every one with a tailored policy, suggests that generic information-security frameworks are not catching the new ways data escapes through AI workflows.
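The ratio quoted above follows directly from the two Metomic percentages. A minimal sketch of the arithmetic, using only the 68 percent and 23 percent figures from the report (the variable names are illustrative):

```python
# Figures reported in Metomic's survey (percent of organizations).
leak_rate_pct = 68
policy_rate_pct = 23

# Leaking organizations per organization with an AI-specific policy.
ratio = leak_rate_pct / policy_rate_pct

# Share of organizations without a dedicated AI policy.
no_policy_pct = 100 - policy_rate_pct

print(f"{ratio:.2f} leaking organizations per policy holder")
print(f"{no_policy_pct} percent without a dedicated AI policy")
```

The two groups overlap, since some organizations with a policy may also have reported a leak, so the ratio compares prevalence and is not a count of distinct companies.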
One working hypothesis is that companies publishing and enforcing AI-specific data-handling policies within six months of tool rollout experience at least 40 percent lower measured leakage rates than peers relying on existing general policies. The available data does not confirm that exact threshold, but it points in the same direction. The 23 percent of organizations with dedicated AI policies represent the minority that recognized early that general controls, built for email and cloud storage, do not account for the way large language models ingest, store, and sometimes regurgitate sensitive inputs. The remaining 77 percent are, in effect, running AI tools under rules designed for a different threat model.
Harmonic Security’s analysis sharpens the picture. When employees interact with generative AI tools, its telemetry findings show that 22 percent of files they upload contain sensitive data, and 4.37 percent of prompts carry sensitive content as well. Those numbers mean that on any given workday, a meaningful share of confidential records, customer details, or proprietary code is being fed into systems whose data retention and training practices vary widely across vendors. Without AI-specific guardrails, each prompt becomes a potential leak vector.
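To make those rates concrete, the sketch below scales them to a daily workload. The two rates come from Harmonic Security's telemetry as cited above; the daily volumes, the function name, and the assumption that the rates apply uniformly to every organization are hypothetical and serve only as illustration.

```python
# Rates reported in Harmonic Security's telemetry.
SENSITIVE_FILE_RATE = 0.22      # 22 percent of uploaded files
SENSITIVE_PROMPT_RATE = 0.0437  # 4.37 percent of prompts


def expected_sensitive_submissions(files_per_day: int, prompts_per_day: int) -> dict[str, float]:
    """Estimate daily sensitive submissions, assuming the reported rates apply uniformly."""
    return {
        "files": files_per_day * SENSITIVE_FILE_RATE,
        "prompts": prompts_per_day * SENSITIVE_PROMPT_RATE,
    }


# Hypothetical volumes for illustration only; not taken from any report.
estimate = expected_sensitive_submissions(files_per_day=500, prompts_per_day=10_000)
print(estimate)
```

Under those assumed volumes, the estimate works out to roughly 110 sensitive files and 437 sensitive prompts per day. An organization's real figures would depend on its own usage patterns and on how "sensitive" is defined.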
Three Data Sources That Anchor the 68 Percent Finding
The headline statistic traces back to Metomic’s 2025 State of Data Security Report, a survey-based study released in spring 2025. Metomic, a data security vendor, surveyed organizations about their AI tool usage and security posture. The 68 percent leak figure and the 23 percent policy figure both originate from that report. Because Metomic sells data security products, readers should weigh the findings with that commercial interest in mind, though the numbers have not been contradicted by independent research.
The Purple Book Community’s 2026 survey, released with ArmorCode as the State of AI Risk Management 2026, provides a second benchmark from a community of security leaders. Its research summary focuses on how organizations govern AI risk and aligns with the broader pattern: rapid adoption, lagging governance. The report indicates that many organizations are experimenting with generative AI across software development, customer service, and internal operations while still relying on ad hoc controls and informal norms to manage risk. However, the limited visibility into its full methodology and respondent demographics constrains how far analysts can generalize its conclusions.
Harmonic Security’s contribution is different in kind. Rather than surveying executives about whether leaks occurred, Harmonic analyzed actual usage telemetry from employees interacting with generative AI tools. The 22 percent file sensitivity rate and 4.37 percent prompt sensitivity rate come from observed behavior, not self-reporting. That distinction matters because self-reported breach data tends to undercount incidents that go undetected or are not escalated internally. Telemetry-based studies can still miss context, such as whether data was shared with a vendor using strict enterprise controls, but they reduce the bias that comes from reputation concerns and uneven incident detection capabilities.
Open Questions About AI Leak Measurement and Policy Gaps
Several pieces of the puzzle are still missing. Metomic’s full methodology, including sample size, industry mix, and how respondents defined an “AI-related data leak,” has not been disclosed in granular detail. Without that, it is difficult to know whether the 68 percent figure reflects a broad cross-section of enterprises or is weighted toward early adopters in data-intensive sectors such as technology and financial services. Similarly, the 23 percent figure for organizations with AI-specific policies does not differentiate between minimal acceptable-use guidelines and robust, enforceable frameworks backed by monitoring and training.
The Purple Book Community’s research raises its own questions. Because the public summary emphasizes themes (such as the need for better AI governance and the prevalence of shadow AI projects) without publishing detailed statistics, readers cannot directly map its findings against Metomic’s numbers. It is plausible that security leaders participating in a specialized community are more mature than the average organization, meaning the gap between AI use and AI governance in the broader market could be even wider than suggested.
Even Harmonic Security’s telemetry, while concrete, captures only a slice of the risk surface. The analysis focuses on content employees explicitly send to generative AI tools, not on downstream uses of that content. Sensitive information might be redacted or transformed before submission, or the AI vendor might apply strong isolation and retention controls that reduce the chance of later exposure. Conversely, the telemetry does not capture model outputs that could inadvertently reconstruct or reveal sensitive data learned from prior interactions.
These measurement gaps complicate efforts to benchmark progress. A company might report “no AI-related leaks” simply because it has not yet detected any, not because risky behavior is absent. Another might classify an incident as a policy violation rather than a data leak, keeping it off official tallies. Until there is more standardized reporting around AI incidents, similar to what has emerged for traditional data breaches, headline numbers like 68 percent should be treated as directional signals rather than precise indicators.
Closing the Policy and Practice Gap
Despite the uncertainties, the convergence of survey data and telemetry points clearly toward the need for stronger safeguards. Organizations that have not yet drafted AI-specific data security policies should prioritize a few foundational steps. First, they need an inventory of where and how employees are using generative AI, including sanctioned tools, embedded AI features inside existing platforms, and unsanctioned web services. Without that visibility, policies risk being either too restrictive to follow or too narrow to matter.
Second, policies should explicitly define what types of sensitive data may never be entered into external AI systems, and under what conditions internal AI services may process confidential information. Clear examples (customer identifiers, payment data, source code, and internal strategy documents) help employees translate abstract rules into day-to-day decisions. Training programs should reinforce these boundaries with realistic scenarios that mirror how staff actually use AI tools to write emails, analyze data, or draft code.
Third, technical controls need to match the written rules. That can include data loss prevention filters on AI-related traffic, enterprise configurations that disable model training on customer data, and approval workflows for connecting internal systems to AI agents. For high-risk functions, organizations may choose to deploy private models or vendor offerings that keep data within a dedicated environment, reducing exposure to multi-tenant training pipelines.
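As one illustration of a data loss prevention filter on AI-related traffic, the sketch below screens a prompt before it would be forwarded to an external tool. The pattern set and the function names are hypothetical, and the regular expressions are deliberately simple; a production filter would need vetted detectors, validation such as Luhn checks on card numbers, and coverage for file uploads.

```python
import re

# Hypothetical patterns for illustration; real deployments need vetted
# detectors tuned to the organization's own data types.
SENSITIVE_PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_sensitive(prompt: str) -> list[str]:
    """Return the sorted names of every pattern that matches the prompt."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)
    )


def allow_prompt(prompt: str) -> bool:
    """Allow the prompt only when no sensitive pattern matches."""
    return not find_sensitive(prompt)


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111 for jane@example.com"
    print(find_sensitive(sample))
    print(allow_prompt("Summarize the themes of this public press release"))
```

Blocking outright is only one possible response. The same match list could instead drive redaction, a user warning, or an audit log entry, depending on what the written policy requires.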
Finally, incident response plans must evolve to include AI-specific playbooks. When a potential AI-related leak is discovered, such as an employee pasting a customer list into a public chatbot, teams should know how to assess the scope, engage the vendor, and determine notification obligations. Lessons learned from each incident can feed back into policy updates, training, and tooling improvements.
The emerging evidence paints a consistent picture: generative AI is already intertwined with everyday work, and sensitive data is flowing through these systems at scale. While the exact leak rate may shift as methodologies improve, the direction of travel is clear enough to justify immediate action. Organizations that move quickly to align policies, training, and technical safeguards with the realities of AI use will be better positioned to harness its benefits without adding avoidable data exposure to their risk ledger.
This article was researched with the help of AI, with human editors creating the final content.