A retinal photograph lands on an ophthalmologist’s screen. The colors are right, the optic disc looks textured and organic, and the blood vessels branch the way they should. Nothing about it screams “fake.” But the image was never captured by a fundus camera. It was generated by OpenAI’s GPT-4o, and according to a peer-reviewed study published in the journal Eye (part of the Nature portfolio), trained clinicians found the synthetic output convincing enough to raise serious concerns about deepfakes infiltrating medical imaging.
That finding, published in early 2025, is one piece of a broader pattern researchers have been documenting through spring 2026: the realism of AI-generated images is improving faster than the tools built to catch them.
The detection gap is widening
The most direct evidence comes from a benchmark study posted on arXiv that collected deepfakes actually circulating on social platforms and messaging apps during 2024. When researchers ran those images through widely used detection tools, accuracy dropped well below the scores those same tools had achieved on older, curated academic datasets. The takeaway was blunt: detection performance measured in the lab does not hold up against the fakes people encounter in the real world.
The study has not yet passed peer review, an important caveat. But its methodology is straightforward and its conclusion aligns with what the Eye journal team found in a controlled clinical setting. Together, the two papers point in the same direction from different angles. One shows that automated detectors falter against current-generation fakes scraped from live platforms. The other shows that human experts falter when the fakes target a specialized domain like retinal imaging.
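To make the comparison concrete: a benchmark of this kind boils down to computing the same accuracy number on two different image sets and looking at the gap. The sketch below is illustrative only; `detector`, `Sample`, and both datasets are hypothetical stand-ins, not the arXiv study's actual tools or data.

```python
# Illustrative scoring loop for a deepfake-detection benchmark.
# Any real/fake classifier can be plugged in as `detector`.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    path: str        # image file on disk
    is_fake: bool    # ground-truth label

def accuracy(detector: Callable[[str], bool], samples: List[Sample]) -> float:
    """Fraction of images the detector labels correctly."""
    correct = sum(detector(s.path) == s.is_fake for s in samples)
    return correct / len(samples)

# The headline finding is the gap between two such numbers:
# accuracy(detector, curated_academic_set)   # high, lab conditions
# accuracy(detector, in_the_wild_2024_set)   # markedly lower
```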
Federal authorities have taken notice. The National Institute of Standards and Technology has published guidance on deepfake evaluation (identified as NIST IR 8535; its exact publication date is not given on the landing page) that shifts the conversation away from visual spotting and toward provenance and authentication. The document’s core message: organizations should stop expecting reviewers to eyeball fakes and start building verification workflows around forensic methods and digital chain-of-custody records. That framing implicitly concedes a hard truth: as realism improves, the naked eye is no longer a dependable filter.
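In practice, the chain-of-custody idea can be as simple as recording a cryptographic hash of each image at capture time and refusing to trust any file that no longer matches it. A minimal sketch follows; the `registry` dict is a hypothetical stand-in for whatever signed log or database an organization actually maintains, and nothing here is drawn from the NIST document itself.

```python
# Minimal chain-of-custody check: trust is anchored in a hash recorded
# at ingest, not in how the image looks to a reviewer.

import hashlib

# Hypothetical custody registry; in a real deployment this would be a
# signed, append-only log or database populated at capture time.
registry = {
    "patient-1234/fundus-od.png": "<sha256 recorded at capture>",
}

def sha256_of(path: str) -> str:
    """SHA-256 digest of the file's current bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_custody(record_key: str, path: str) -> bool:
    """True only if the file on disk matches the hash recorded at ingest."""
    expected = registry.get(record_key)
    return expected is not None and sha256_of(path) == expected
```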
Why medical imaging raises the stakes
Deepfakes in political memes or celebrity hoaxes grab headlines, but the medical angle may carry graver consequences. Fundus photographs are used to diagnose glaucoma, diabetic retinopathy, and macular degeneration. If a fabricated image enters a patient’s record, it could trigger unnecessary treatment or, worse, mask a real condition. Insurance claims, malpractice disputes, and clinical trials all depend on the assumption that diagnostic images are authentic.
The Eye study did not find evidence that fabricated fundus images have already entered clinical workflows. What it demonstrated is that the technical barrier to doing so has dropped sharply. GPT-4o is a consumer-facing product, available to anyone with a ChatGPT subscription. No specialized medical-imaging model or custom training was required to produce the synthetic retinal photos the researchers evaluated.
As of spring 2026, no major ophthalmology or radiology society has issued widely adopted protocols for verifying the provenance of clinical images in light of these capabilities. Frontline clinicians are still operating under older assumptions about what a “real” image looks like and how likely it is to have been manipulated.
What researchers still do not know
Several gaps remain. No publicly available benchmark has isolated GPT-4o’s image outputs as a separate category and tested them against the latest commercial detectors. The arXiv study covers a broad set of in-the-wild fakes but does not break results down by source model. That means the precise failure rate for ChatGPT-generated images specifically is still unquantified in open research.
OpenAI has not released detailed technical documentation explaining which architectural changes in GPT-4o drive the realism gains. Without that transparency, independent researchers are left to reverse-engineer capabilities by testing outputs rather than evaluating the model’s design. That asymmetry complicates efforts to build targeted countermeasures and limits regulators’ ability to tie safeguards to specific model features. OpenAI has publicly committed to embedding C2PA metadata in images generated through its tools, but enforcement and adoption across downstream platforms remain inconsistent.
There is also an open question about whether the detection shortfall is permanent or temporary. If detection-tool developers retrain their systems on large volumes of contemporary fakes, including clinically oriented images, they may recover some lost accuracy. Until those results are published, the safest assumption is that existing detection scores overstate real-world performance.
Provenance checks offer the strongest near-term defense
For anyone who handles images in a professional capacity, whether in a newsroom, a hospital, an insurance office, or a courtroom, the NIST guidance offers the most actionable starting point: verify origin before trusting content.
In practice, that means checking embedded metadata, running reverse-image searches, and, where possible, relying on content-authentication standards like the Coalition for Content Provenance and Authenticity (C2PA) framework. C2PA attaches cryptographic signatures to media files, creating a verifiable chain of custody. Adobe, Microsoft, and several camera manufacturers have already adopted the standard, though platform-level support is still uneven.
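The cheapest of those checks is a metadata inspection. Absent or stripped EXIF proves nothing on its own, and present EXIF can be forged, but it is low-cost evidence worth collecting first. A minimal sketch using the Pillow library (the filename is hypothetical):

```python
# First-pass provenance check: dump whatever EXIF metadata the file
# carries. Requires Pillow (pip install Pillow).

from PIL import Image
from PIL.ExifTags import TAGS

def read_exif(path: str) -> dict:
    """Return EXIF tags as a {name: value} dict (empty if none embedded)."""
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

tags = read_exif("retina.png")
print(tags.get("Make"), tags.get("Model"), tags.get("DateTime"))
```

For C2PA manifests specifically, the Content Authenticity Initiative publishes an open-source command-line utility, c2patool, that can read and validate a file’s embedded manifest.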
None of these steps will catch every fake. But they represent a far more reliable defense than visual inspection alone, especially as generative models continue to close the realism gap. The research published over the past year does not predict an inevitable collapse in our ability to separate truth from fabrication. It does, however, make clear that trust needs to migrate: away from what an image looks like and toward provable records of where it came from and who vouches for it.
This article was researched with the help of AI, with human editors creating the final content.