Google appears to be building its most ambitious AI video generator yet. Multiple industry reports have pointed to an internal project called Omni, described as a Gemini-based system capable of producing synthetic footage so realistic that viewers cannot reliably distinguish it from material shot on a physical camera. If those reports prove accurate, and if Google showcases the system at its annual I/O developer conference in May 2026, it would mark a significant escalation in a race that already includes OpenAI’s Sora, Runway’s Gen-4, and several Chinese competitors like Kling and Hailuo.
None of this has been officially confirmed. Google has not published a technical paper, released a demo, or even publicly acknowledged the Omni name. But a growing body of independent research suggests the underlying capability is not only plausible but already here, at least in controlled settings. And Google's own track record in video generation, from Lumiere in early 2024 through the rapid succession of Veo, Veo 2, and Veo 3, shows a company that has been compressing years of progress into months.
The research that makes the claim plausible
The most concrete evidence that AI video has crossed a critical realism threshold comes not from Google but from an academic study titled “Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?” Published as a preprint on arXiv, the paper tested whether current AI video generators could produce clips that pass as authentic when judged by both human evaluators and advanced vision-language models (VLMs).
The researchers deliberately chose ASMR content for their test bed. ASMR videos rely on extreme close-ups, subtle textures, and fine motor movements (hands folding fabric, liquid pouring into a glass, fingernails tapping on surfaces) that tend to expose the telltale artifacts of synthetic media. If AI-generated footage can survive scrutiny in a genre that punishes rendering flaws, the researchers reasoned, then less visually demanding formats like landscape B-roll or talking-head clips would be even easier to fake.
Their core finding was striking: under structured evaluation conditions that mimicked real-world viewing behavior (limited time, no side-by-side comparisons), both human judges and automated classifiers struggled to reliably separate AI-generated clips from genuine footage. The study did not test Omni or any Google system. But it established an empirical baseline showing that the broader field has already reached a point where synthetic video can fool both human and automated judges some of the time, under realistic conditions.
That baseline matters because it transforms the conversation around Omni from pure speculation into something grounded. If publicly available or academically accessible tools can already clear this bar, a well-resourced effort from Google’s DeepMind division, with access to massive compute, proprietary training data, and the Gemini model family, could plausibly push further.
Google’s video AI trajectory
Even without confirmed details about Omni, Google's public record in video generation provides important context. Google Research unveiled Lumiere in early 2024, a research model that demonstrated temporally coherent video generation. Later that year, Google DeepMind introduced Veo, followed quickly by Veo 2, which offered higher resolution and longer clip durations. In 2025, Veo 3 arrived with native audio generation, producing video with synchronized sound in a single pass.
Each iteration closed specific gaps. Veo addressed basic coherence. Veo 2 improved visual fidelity and motion consistency. Veo 3 tackled the audio-visual synchronization problem that had made earlier AI video feel uncanny even when individual frames looked convincing. If Omni represents the next step in that sequence, the next logical jump would likely involve further gains in photorealism, longer generation windows, and possibly real-time or near-real-time output.
Google has also invested in detection infrastructure alongside generation. Its SynthID watermarking system, developed by Google DeepMind, embeds imperceptible signals into AI-generated content that are designed to remain detectable by automated tools even after common edits such as cropping, compression, or re-encoding. The existence of SynthID suggests Google is aware that releasing increasingly realistic video generators without safeguards would create serious trust and safety problems. Whether SynthID or a successor would ship alongside Omni remains unknown.
The competitive pressure
Google is not working in isolation. OpenAI's Sora, first previewed in early 2024 and rolled out more broadly from late 2024 onward, demonstrated that diffusion transformers, which share their core architecture with large language models, could scale to high-quality video generation. Runway, a startup that has been shipping commercial video generation tools since 2023, released Gen-4 with improved temporal consistency and style control. Chinese companies including Kuaishou (Kling) and MiniMax (Hailuo) have shipped video generators that rival or exceed Western competitors on certain benchmarks, often with fewer usage restrictions.
This competitive environment creates pressure to announce and ship quickly. It also raises the stakes for any I/O 2026 reveal. A demonstration that merely matches what Sora or Kling can already do would land as a disappointment. For Omni to justify the anticipation building around it, Google would likely need to show a meaningful leap in quality, controllability, or integration with its broader product ecosystem (YouTube, Google Photos, Workspace).
What we still do not know
The list of unknowns is long. No official Google source has confirmed the Omni name, its connection to Gemini, or a planned I/O 2026 debut. The secondary reports that introduced the project have not been accompanied by technical documentation, benchmark results, or independent verification. Key questions remain open:
- Architecture: Is Omni a single unified model or a pipeline of specialized components? Reports differ on this point, and the distinction matters for both capability and latency.
- Training data: What footage was used to train the system? Models trained on licensed or synthetic data carry different legal and ethical profiles than those trained on scraped web video. Google has faced scrutiny over training data practices before.
- Resolution and duration: Can Omni generate footage at 4K or higher? For how many seconds or minutes? Current commercial tools typically max out at 720p or 1080p for clips under 30 seconds.
- Access model: Will Omni be a consumer product, a developer API, an enterprise tool, or some combination? The answer shapes who benefits and who faces disruption.
Until Google publishes technical details or stages a public demonstration, the strongest factual anchor remains the academic evidence that synthetic video from current tools can already pass as real under targeted evaluation. Omni may exceed that level, or it may fall short. From the outside, there is no way to know.
Why the realism threshold matters beyond tech demos
The practical consequences of video that passes for real extend well beyond developer conferences. Journalism, legal evidence, insurance claims, political advertising, and social media trust all depend on the assumption that video footage corresponds to something that actually happened in front of a lens. Once that assumption breaks down at scale, the effects ripple outward.
Detection tools exist, but the arXiv study's findings suggest they are not a reliable backstop. If vision-language models, which are among the most capable general-purpose automated judges available, cannot consistently identify synthetic clips, then platforms relying on automated moderation face a growing gap between what gets uploaded and what gets flagged. Human evaluators fared no better under the time-constrained conditions the study tested.
Industry-wide efforts like the Coalition for Content Provenance and Authenticity (C2PA) are working on provenance standards that attach verifiable metadata to media at the point of creation. Google, Adobe, Microsoft, and several camera manufacturers are members. But provenance only works if the entire chain, from capture device to distribution platform to end viewer, supports and enforces the standard. That infrastructure is still being built.
For now, the safest way to interpret bold claims about Omni or any similar system is to treat them as projections built on demonstrated research trends. The evidence shows that AI video is already good enough to fool people and machines some of the time, under some conditions. It does not yet show that any single commercial product has rendered the line between synthetic and real footage meaningless. As new papers, demos, and product announcements surface through the rest of 2026, the task for anyone consuming this coverage will be the same: separate what has been rigorously measured from what is merely promised.
This article was researched with the help of AI, with human editors creating the final content.