Morning Overview

Doctors struggle to spot AI-made X-rays, raising fraud concerns

Radiologists are struggling to distinguish AI-generated chest X-rays from authentic clinical images, according to a growing body of research that raises serious questions about fraud, misdiagnosis, and the integrity of medical records. Multiple reader studies now show that synthetic scans produced by generative models can fool trained physicians and automated detection systems alike. The findings carry direct implications for insurance claims processing, diagnostic accuracy, and patient safety across the healthcare system.

Synthetic Scans That Fool Trained Eyes

The core problem is straightforward: generative AI has gotten good enough at producing medical images that the people trained to read them cannot reliably tell the difference. A gaze-tracking experiment on radiologist eye movements found that clinicians fixated on similar visual features whether they were examining real or synthetic images. Their gaze patterns, which typically reveal diagnostic reasoning, did not shift consistently when viewing AI-generated content. That similarity in gaze behavior suggests the fakes are not just superficially convincing but structurally close enough to real pathology to bypass expert scrutiny.
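To make concrete what comparing gaze behavior involves, the sketch below builds simple fixation density maps from recorded eye positions and measures how similar two viewing sessions are. The fixation coordinates, grid size, smoothing, and correlation metric here are illustrative assumptions, not the published study's protocol.

```python
# Illustrative comparison of gaze fixation maps on real vs. synthetic images.
# The fixation coordinates, grid size, smoothing, and similarity metric are
# assumptions for demonstration, not the published study's protocol.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, shape=(224, 224), sigma=8.0):
    """Accumulate (row, col) fixation points into a smoothed density map."""
    grid = np.zeros(shape, dtype=float)
    for r, c in fixations:
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            grid[int(r), int(c)] += 1.0
    grid = gaussian_filter(grid, sigma=sigma)
    total = grid.sum()
    return grid / total if total > 0 else grid

def map_similarity(map_a, map_b):
    """Pearson correlation between two flattened fixation density maps."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])

# Hypothetical fixation data for one reader on a real and a synthetic image.
real_fixations = [(60, 100), (62, 104), (130, 90), (150, 160)]
synthetic_fixations = [(58, 98), (65, 108), (128, 88), (155, 158)]

similarity = map_similarity(
    fixation_map(real_fixations), fixation_map(synthetic_fixations)
)
print(f"fixation-map correlation: {similarity:.2f}")  # near 1.0 => similar scanning
```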

Separate research reinforces this finding. In a study covered by a Science Daily report, deepfake X-rays created by AI proved convincing enough to fool both doctors and AI models, with radiologists showing limited success in tests designed to measure their detection ability. The technology is no longer theoretical or confined to computer science labs. It has reached a level of fidelity that challenges the assumptions underlying clinical image review and threatens to blur the line between legitimate and fabricated evidence in patient records.

From Chest Scans to Cancer Fabrication

The threat extends well beyond chest X-rays. A foundational study known as CT-GAN demonstrated that generative models can tamper with three-dimensional CT scans to insert or delete lung tumors, deceiving both human clinicians and machine learning diagnostic systems. That work, later presented at a major security conference, explicitly described real-world motives for such attacks, including insurance fraud and targeted harm. The ability to fabricate or erase signs of a life-threatening disease in a medical scan represents a qualitatively different kind of risk than most deepfake concerns, because the consequences land directly on individual patients and the financial systems that pay for their care.
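The general shape of such an attack is a cut, generate, and paste workflow on a small localized region of the volume. The sketch below illustrates that workflow with a placeholder noise generator standing in for a trained inpainting network; the coordinates, cuboid size, and blending are arbitrary assumptions, and CT-GAN's actual models are not reproduced here.

```python
# Simplified sketch of a tamper-style workflow on a 3D CT volume:
# crop a cuboid around a target location, replace it with synthetic content,
# and blend it back. The "generator" below is a placeholder returning noise;
# CT-GAN's trained inpainting networks are not reproduced here.
import numpy as np

def placeholder_generator(cuboid):
    """Stand-in for a trained inpainting model; returns matched noise."""
    return np.random.normal(loc=cuboid.mean(), scale=cuboid.std() + 1e-6,
                            size=cuboid.shape)

def tamper_volume(volume, center, size=32):
    """Replace a size^3 cuboid around `center` (z, y, x) with generated content."""
    z, y, x = center
    h = size // 2
    region = (slice(z - h, z + h), slice(y - h, y + h), slice(x - h, x + h))
    cuboid = volume[region]
    blended = 0.5 * cuboid + 0.5 * placeholder_generator(cuboid)  # naive blend
    tampered = volume.copy()
    tampered[region] = blended
    return tampered

# Hypothetical CT volume (128 slices of 256x256 voxels in Hounsfield units).
ct = np.random.randint(-1000, 400, size=(128, 256, 256)).astype(float)
tampered_ct = tamper_volume(ct, center=(64, 128, 128))
print("voxels changed:", int((tampered_ct != ct).sum()))
```

The point of the sketch is that only a small region of the scan needs to change, which is why whole-image integrity checks and provenance tracking matter so much downstream.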

Mammography faces similar vulnerabilities. A peer-reviewed study published in Nature Communications showed that GAN-synthesized adversarial images could insert or remove lesions in breast imaging, fooling AI diagnostic systems that had been trained on large datasets. That same research discussed the limits of expecting radiologists to catch such manipulations, noting that subtle pixel-level perturbations can create or hide clinically meaningful findings without obvious visual artifacts. This complicates the common assumption that human oversight will serve as a reliable backstop against AI-generated fakes.
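The mammography study's GAN-based lesion synthesis is not reproduced here, but a simpler and well-known technique illustrates how small pixel-level changes can flip a classifier's output: a fast-gradient-sign perturbation. The stand-in model below is untrained and the perturbation budget is an arbitrary assumption; the sketch shows only the mechanism, not the study's method.

```python
# Minimal FGSM-style sketch: a small, signed-gradient perturbation that pushes
# a classifier away from a chosen label without obvious visual artifacts.
# Generic illustration with an untrained stand-in model, not the GAN-based
# lesion insertion/removal method from the mammography study.
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a trained diagnostic classifier
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2)
)
model.eval()

image = torch.rand(1, 1, 224, 224, requires_grad=True)  # hypothetical scan
target = torch.tensor([1])                               # e.g., "lesion present"

loss = nn.functional.cross_entropy(model(image), target)
loss.backward()

epsilon = 0.01  # perturbation budget; small enough to stay visually subtle
# Adding the gradient sign increases the loss for the target label, nudging
# the model away from predicting it (i.e., hiding a finding).
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

with torch.no_grad():
    before = model(image).argmax(dim=1).item()
    after = model(adversarial).argmax(dim=1).item()
print("prediction before/after perturbation:", before, after)
```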

How Diffusion Models Raise the Bar

Newer generative architectures are making the problem harder to contain. RoentGen, a diffusion-based chest X-ray generator, can produce images conditioned on text prompts using domain-specific radiology language. Human experts who evaluated the outputs found them visually convincing and controllable, meaning a user can specify a clinical condition in plain medical terminology and receive a synthetic scan that matches the description. This kind of precision turns image generation from a blunt instrument into a targeted one, enabling tailored forgeries that mirror real disease patterns.
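In general terms, text-conditioned generation of this kind looks like the sketch below, written against the open-source Hugging Face diffusers library. The model identifier is a placeholder, and nothing here assumes access to RoentGen's own weights, prompt conventions, or release terms.

```python
# Generic text-conditioned image generation with Hugging Face diffusers.
# The model identifier is a hypothetical placeholder; this is not RoentGen's
# released pipeline, only an illustration of prompt-conditioned sampling.
import torch
from diffusers import StableDiffusionPipeline

model_id = "some-org/chest-xray-diffusion"  # hypothetical checkpoint name
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Domain-specific prompt in radiology language, the kind of conditioning
# described for RoentGen.
prompt = "Frontal chest radiograph with a right lower lobe consolidation"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_cxr.png")
```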

A related study on diffusion models for lung imaging asked radiologists to label samples as real, fake, or unsure. The results showed persistent ambiguity when judging authenticity even when the generated images were not clinically perfect. The gap between “not perfect” and “good enough to deceive” turns out to be smaller than many clinicians expected, and that gap continues to narrow as model architectures improve and training datasets expand. As diffusion models become more accessible through open-source code and cloud platforms, the technical barrier to producing realistic medical deepfakes is likely to fall further.

Why Existing Safeguards Fall Short

Most medical imaging systems rely on the DICOM standard for storing and transmitting scans. DICOM includes digital signature profiles designed to verify the integrity and authenticity of both header metadata and pixel data. But a peer-reviewed analysis of these protections found significant practical weaknesses across real-world deployments. Digital signatures are not universally enabled across hospital networks, and key management practices can be inconsistent or incomplete. Moreover, the standard was designed primarily to prevent accidental corruption or unauthorized modification, not to detect sophisticated generative forgeries that mimic legitimate device output.
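A minimal check of whether a given file even carries signature metadata can be done with the open-source pydicom library, as sketched below. The sketch only looks for a Digital Signatures Sequence and records a local hash of the pixel data for later comparison; full cryptographic verification against a signer's certificate is a larger task and is not shown.

```python
# Minimal sketch: check whether a DICOM object carries any digital signature
# metadata and record a local hash of its pixel data. This does NOT perform
# cryptographic verification against a signer's certificate; it only shows
# why optional or absent signatures leave a gap.
import hashlib
import pydicom

def inspect_dicom(path):
    ds = pydicom.dcmread(path)
    # The Digital Signatures Sequence is optional in practice; many files omit it.
    signatures = ds.get("DigitalSignaturesSequence")
    has_signature = signatures is not None and len(signatures) > 0
    # Local integrity baseline: hash of the raw pixel data for later comparison.
    pixel_hash = hashlib.sha256(ds.PixelData).hexdigest()
    return {
        "sop_instance_uid": str(ds.get("SOPInstanceUID", "")),
        "has_digital_signature": has_signature,
        "pixel_sha256": pixel_hash,
    }

# Example usage with a hypothetical file path.
# print(inspect_dicom("study/series/image0001.dcm"))
```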

This gap matters because the chain of trust in medical imaging depends on the assumption that a scan arriving in a radiologist’s viewer is an authentic capture from an imaging device. If that assumption breaks down and no reliable automated check exists to flag synthetic content, the entire diagnostic workflow becomes vulnerable. A fraudulent scan submitted for an insurance claim, for example, would pass through the same systems and review processes as a legitimate one, potentially triggering unnecessary procedures, payouts, or denials of needed care based on fabricated evidence.

Detection Efforts and the Insurance Threat

Researchers at the University at Buffalo are developing tools specifically designed to spot AI-generated medical reports, acknowledging that while such fakes are not yet common, they have the potential to cause serious problems in the medical and insurance industries. Their work emphasizes that general-purpose deepfake detectors, which are often tuned for natural images and videos, perform poorly when applied to radiology because clinical scans follow different visual conventions and noise patterns. That distinction points to a broader challenge: medical imaging requires bespoke detection strategies that account for modality-specific physics and acquisition artifacts.
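Purely as an illustration of what a modality-specific detector could look like, the sketch below sets up a small grayscale classifier to be trained on labeled real and synthetic chest X-rays. The folder layout, architecture choice, and training details are assumptions, not the Buffalo team's tool.

```python
# Illustrative setup for a modality-specific real-vs-synthetic detector:
# a grayscale ResNet-18 binary classifier. The dataset folders and training
# loop are assumptions; this is not the University at Buffalo system.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Single-channel input and two output classes (real vs. synthetic).
model = models.resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical folder layout: data/train/real/*.png and data/train/synthetic/*.png
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```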

Most AI detection research to date has focused on faces, voices, and everyday photographs, where datasets are abundant and manipulation cues are more familiar. Medical images, by contrast, encode subtle grayscale textures, anatomical structures, and device-specific signatures that are less intuitive even for machine learning experts. When generative models learn these patterns well enough to reproduce them, traditional forensic techniques that look for compression anomalies or simple inconsistencies may fail. The result is a widening asymmetry between the ease of generating convincing fakes and the difficulty of reliably catching them.
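One such traditional heuristic from natural-image forensics is a radially averaged power-spectrum check for the periodic upsampling artifacts some generators leave behind. The sketch below shows the computation; whether it transfers to radiology's distinct noise and texture statistics is exactly the open question raised above, so it should be read as a heuristic, not a detector.

```python
# Radially averaged power spectrum of an image: a common forensic heuristic
# for spotting periodic upsampling artifacts in generated natural images.
# Whether it transfers to radiology's noise and texture statistics is an
# open question; treat this as a heuristic sketch only.
import numpy as np

def radial_power_spectrum(image):
    """Return the azimuthally averaged log power spectrum of a 2D image."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    radius = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)
    profile = np.bincount(radius.ravel(), weights=spectrum.ravel())
    counts = np.bincount(radius.ravel())
    return np.log1p(profile / np.maximum(counts, 1))

# Hypothetical grayscale images as float arrays in [0, 1].
reference = np.random.rand(224, 224)
suspect = np.random.rand(224, 224)
deviation = np.abs(radial_power_spectrum(reference) - radial_power_spectrum(suspect))
print("max spectral deviation:", float(deviation.max()))
```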

For insurers, this asymmetry is particularly concerning. A malicious actor could, in principle, generate a chest X-ray or CT slice that appears to show a fracture, tumor, or other billable condition, attach it to a claim, and rely on the fact that both human reviewers and automated systems struggle to distinguish it from a legitimate scan. Conversely, tampering with existing images to erase evidence of injury could be used to dispute liability or reduce payouts. The CT-GAN work on adding and removing lung cancer findings, and the mammography research on adversarial lesions, illustrate how such manipulations might be technically executed, even if large-scale abuse has not yet been documented.

Building a New Chain of Trust

Addressing these risks will likely require changes at multiple layers of the healthcare ecosystem. At the technical level, imaging devices and picture archiving systems may need stronger, hardware-rooted attestation mechanisms that cryptographically bind pixel data to a specific acquisition event. Standards bodies could update DICOM profiles to make robust signing and verification mandatory rather than optional, and to incorporate tamper-evident logs that track every transformation applied to an image.
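The cryptographic core of such attestation is simple to sketch: a device-held private key signs a digest of the pixel data together with acquisition metadata, and any downstream system holding the device's public key can verify the binding. The sketch below uses an Ed25519 key from the open-source cryptography library; key provisioning, hardware roots of trust, and integration into DICOM profiles are the hard parts and are not shown.

```python
# Minimal sketch of binding pixel data to an acquisition event: a device-held
# Ed25519 key signs a digest of pixel bytes plus acquisition metadata, and a
# downstream viewer verifies it. Key provisioning, hardware roots of trust,
# and DICOM profile integration are out of scope here.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def acquisition_digest(pixel_bytes, device_serial, acquired_at):
    """Hash the pixel data together with acquisition metadata."""
    h = hashlib.sha256()
    h.update(pixel_bytes)
    h.update(device_serial.encode())
    h.update(acquired_at.encode())
    return h.digest()

# On the imaging device (the private key would live in secure hardware).
device_key = Ed25519PrivateKey.generate()
pixels = b"\x00\x01\x02\x03"  # stand-in for real pixel data
signature = device_key.sign(
    acquisition_digest(pixels, device_serial="CT-1234",
                       acquired_at="2024-05-01T08:30:00Z")
)

# Downstream (PACS or viewer) verifies using the device's public key.
public_key = device_key.public_key()
try:
    public_key.verify(
        signature,
        acquisition_digest(pixels, "CT-1234", "2024-05-01T08:30:00Z"),
    )
    print("provenance verified")
except InvalidSignature:
    print("pixel data or metadata does not match the signed acquisition")
```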

On the clinical side, radiology workflows may need to incorporate explicit authenticity checks alongside diagnostic interpretation. That could mean flagging images that lack verified provenance, training clinicians to recognize suspicious patterns associated with known generative models, and integrating specialized detectors tuned to modalities such as chest X-ray, CT, and mammography. However, research to date suggests that human perception alone will not be sufficient, reinforcing the need for automated support.

Finally, insurers and regulators will have to adapt policies and oversight mechanisms to a world where pixel-level evidence can no longer be taken at face value. Claims review processes may require stronger documentation of image origin, while legal frameworks might evolve to treat deliberate medical deepfakes as a distinct category of fraud. The same generative technologies that promise to augment training datasets and reduce radiation exposure can, if left unsecured, also undermine trust in the images that modern medicine depends on. The emerging research on gaze patterns, adversarial lesions, and diffusion-based synthesis makes clear that the line between real and synthetic in radiology is already blurring, and that rebuilding a reliable chain of trust will be essential to keeping patients safe.


*This article was researched with the help of AI, with human editors creating the final content.