AI can now spin a minute of photorealistic video from a single sentence — and experts warn it’s getting genuinely hard to tell what’s real

In February 2024, a finance worker in Hong Kong wired $25 million to fraudsters after joining a video call in which every other participant, including someone who appeared to be the company’s chief financial officer, turned out to be a deepfake. The episode was jarring, but the tools behind it were primitive compared with what is available now. As of mid-2026, AI video generators can produce photorealistic, minute-long clips from a single typed sentence, and the research community is sounding an alarm: the software built to catch these fakes is falling further behind the software that creates them.

A new generation of synthetic video

OpenAI’s Sora, previewed in February 2024 and launched publicly that December, demonstrated that a short text prompt could yield up to 60 seconds of fluid, visually convincing footage. It was not alone. Google’s Veo, Runway’s Gen-3 Alpha, and several open-source projects have pushed output quality to the point where casual viewers struggle to spot artifacts. What once required a visual-effects studio and weeks of rendering now takes a browser tab and a few minutes of patience.

The speed of improvement matters because each leap in generation quality puts new pressure on detection systems. And according to two recent, independent research efforts, those systems are buckling.

Detection tools are losing ground

The most direct evidence comes from a multi-modal benchmark study titled “Deepfake-Eval-2024,” posted as an arXiv preprint in early 2025. The researchers assembled a test set not from lab-generated samples but from deepfakes that actually circulated online during 2024, spanning video, audio, and image formats. When they ran leading detection models against this “in-the-wild” collection, AUC scores (area under the receiver operating characteristic curve, the standard measure of how well a classifier separates real from fake) dropped substantially compared with results on older, curated benchmarks. In plain terms, detectors that looked reliable under controlled conditions performed far worse against the kind of synthetic media people actually encounter on social platforms.
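To make the AUC metric concrete, here is a minimal Python sketch of how it is computed: the score is the probability that a randomly chosen fake receives a higher detector score than a randomly chosen real clip. The labels and scores below are invented for illustration and are not taken from Deepfake-Eval-2024 or any real detector; the function name `roc_auc` is likewise an assumption made for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random fake outscores a random real clip.

    labels: 1 = fake, 0 = real. scores: higher means "more likely fake".
    Ties between a fake and a real clip count as half a win.
    """
    fakes = [s for y, s in zip(labels, scores) if y == 1]
    reals = [s for y, s in zip(labels, scores) if y == 0]
    if not fakes or not reals:
        raise ValueError("need at least one real and one fake example")
    wins = 0.0
    for f in fakes:
        for r in reals:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fakes) * len(reals))


# Hypothetical scores for the same eight clips (four fake, four real).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
curated_scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.30, 0.20, 0.10]
wild_scores = [0.70, 0.45, 0.55, 0.30, 0.60, 0.50, 0.35, 0.20]

print(roc_auc(labels, curated_scores))  # 1.0: every fake outscores every real clip
print(roc_auc(labels, wild_scores))     # 0.625: closer to the 0.5 of a coin flip
```

An AUC of 1.0 means perfect separation and 0.5 means the detector does no better than guessing, so a drop of the kind sketched here signals that real and fake clips have become much harder to tell apart by score.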

A separate federal effort reached a similar conclusion. The National Institute of Standards and Technology, through its Forensics@NIST 2024 program, evaluated analytic tools against current-generation AI deepfakes in a paper titled “Guardians of Forensic Evidence.” The study found that rising production quality is straining existing forensic methods. NIST now treats deepfake analysis as a core technical challenge, housing the work within its broader Computer Security Resource Center.

When two independent teams, one academic and one federal, use different datasets and institutional frameworks and still converge on the same warning, the finding carries real weight. Detection infrastructure is not keeping pace with generation capability.

Regulators are moving, but enforcement lags

Governments on both sides of the Atlantic have shifted from general warnings to concrete rules, though the gap between publishing a regulation and enforcing it remains wide.

In the United States, the Federal Trade Commission issued a supplemental notice of proposed rulemaking in February 2024 that specifically targeted AI impersonation of individuals. The proposal would expand existing protections to cover AI-generated content used in scam operations, and the agency has directed consumers to report suspected misuse through ReportFraud.ftc.gov. As of June 2026, the rulemaking process has advanced through public comment, but a finalized rule with binding enforcement teeth has not yet been confirmed in the Federal Register. Until it is, the FTC’s existing authority under Section 5 of the FTC Act, which prohibits unfair or deceptive practices, remains the primary lever.

In Europe, the Artificial Intelligence Act (Regulation (EU) 2024/1689) entered into force on August 1, 2024, with provisions phased in over the following months and years. Article 50, which requires providers and deployers of AI systems that generate or manipulate content resembling real people, places, or events to clearly label that content as artificially produced, is scheduled to apply from August 2, 2026 under the regulation’s phase-in timetable. In principle, the labeling mandate gives platforms and audiences a tool for distinguishing authentic footage from synthetic clips. In practice, no EU member state has yet published enforcement metrics, penalty cases, or compliance audits under the new framework, so the real-world bite of the law remains untested.

The provenance gap

One piece of the puzzle that neither the research papers nor the regulations fully address is provenance infrastructure. The Coalition for Content Provenance and Authenticity (C2PA) has developed a technical standard that embeds cryptographic metadata into media files at the point of capture or creation, allowing downstream viewers to verify where a photo or video originated and whether it has been altered. Major camera manufacturers, newsrooms, and tech companies including Adobe, Microsoft, and Google have begun adopting the standard.

But adoption is uneven. Most consumer smartphones do not yet sign images with C2PA credentials, and many social media platforms strip or ignore provenance metadata during upload and compression. Until the chain of custody is end-to-end, from camera sensor to viewer’s screen, provenance tools will cover only a fraction of the content ecosystem. They are a promising complement to detection algorithms and legal mandates, not a replacement for either.
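The general idea behind signed provenance, and why stripped metadata defeats it, can be sketched in a few lines of Python. This is a simplified illustration and not the C2PA format: the real standard relies on certificate-based signatures and a structured manifest, whereas this sketch substitutes a shared-secret HMAC from the standard library. The key, the function names `make_manifest` and `check`, and the sample bytes are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical demo key. Real C2PA signing uses certificates, not a shared secret.
SIGNING_KEY = b"demo-key-not-for-real-use"


def make_manifest(media: bytes, creator: str) -> dict:
    """Bind a claim about the media to its exact bytes and sign the claim."""
    claim = {"creator": creator, "sha256": hashlib.sha256(media).hexdigest()}
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def check(media: bytes, manifest: Optional[dict]) -> str:
    """Report whether the media still matches its signed claim."""
    if manifest is None:
        return "no provenance"  # metadata was never attached or was stripped
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "manifest tampered"
    if hashlib.sha256(media).hexdigest() != manifest["claim"]["sha256"]:
        return "content altered"
    return "verified"


original = b"raw video bytes"
manifest = make_manifest(original, "Example Camera")

print(check(original, manifest))            # verified
print(check(original + b"edit", manifest))  # content altered
print(check(original, None))                # no provenance
```

The last case is the one that matters for the adoption gap: once a platform drops the metadata, a viewer cannot distinguish an authentic clip from a synthetic one by provenance alone, because there is nothing left to verify.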

What still needs answering

Several important questions remain open. The Deepfake-Eval-2024 benchmark has not yet undergone formal peer review, and the authors have not released full detector output files or granular breakdowns that would let outside researchers independently verify the exact magnitude of each AUC decline. The study follows standard arXiv documentation norms, which do not require releasing complete code or data.

The NIST paper’s full result tables and methodology appendix are contained in a separate PDF referenced on its publication page. Without examining those details side by side with the Deepfake-Eval-2024 protocol, it is difficult to determine whether the two teams tested overlapping or distinct sets of detectors and generators. If their pipelines diverged, the converging conclusions are stronger evidence of a systemic problem. If they overlapped, the apparent agreement may reflect a narrower finding about a specific family of tools.

There is also the question of pace. Benchmarks like Deepfake-Eval-2024 are designed to spur new model architectures and training strategies, but there is typically a lag between the release of a challenging dataset and the appearance of detectors that can handle it. During that lag, generative systems continue to improve, potentially widening the gap further. Whether future detection gains will merely keep pace with, or actually outstrip, the next generation of video models is an open research question with no guaranteed answer.

Where that leaves the rest of us

For anyone scrolling through a feed in 2026, the practical takeaway is uncomfortable but straightforward. The technology to create persuasive synthetic video is here and broadly accessible. The safeguards against it, whether algorithmic detectors, provenance standards, or legal frameworks, are all advancing, but none has reached the point where it can guarantee authenticity on its own.

That does not mean the situation is hopeless. Detection research, provenance infrastructure, and regulation are three legs of a response that, together, could narrow the gap. But right now, the balance of power tilts toward the generators. Readers should treat polished video with the same skepticism they have learned to apply to screenshots and quotes: check the source, look for provenance metadata when available, and remember that seeing is no longer believing.


*This article was researched with the help of AI, with human editors creating the final content.
