AI model reconstructs molecules from Coulomb explosion fragments

Researchers at the Department of Energy’s SLAC National Accelerator Laboratory have built a generative AI model that reconstructs the three-dimensional geometry of molecules from the scattered ion fragments produced when intense light pulses blow them apart. The technique addresses a long-standing barrier in physical chemistry: turning the messy debris of a Coulomb explosion back into a precise picture of the molecule that existed before the blast. With a mean reconstruction error below one Bohr radius, the approach could reshape how scientists observe the fleeting structural changes that drive chemical reactions.

How Coulomb Explosions Reveal Molecular Shape

Coulomb explosion imaging, or CEI, works by stripping electrons from a molecule so rapidly that the remaining positively charged nuclei repel each other and fly apart. The directions and speeds of those fragments encode information about where the atoms sat relative to one another just before ionization. The basic concept dates back decades; a 1989 study in Science demonstrated that fragment trajectories and momenta could be used to reconstruct small molecular structures. Since then, the method has grown far more powerful. Experiments at the European XFEL’s SQS instrument station have used a COLTRIMS reaction microscope to capture momentum images of molecules containing 10 to 11 atoms, including all hydrogen positions, in species such as 2-iodopyridine and 2-iodopyrazine.
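The idea that fragment momenta encode the starting geometry can be illustrated with a minimal point-charge sketch. It is not the simulation used in any of the studies mentioned here. It assumes instantaneous ionization, fixed charges, nuclei initially at rest, and atomic units throughout; the function name coulomb_explode and its parameters are hypothetical.

```python
import math

def coulomb_explode(positions, charges, masses, dt=1.0, steps=20000):
    """Propagate point charges under mutual Coulomb repulsion (atomic units).

    Toy model: instantaneous ionization, fixed charges, nuclei start at rest.
    Returns the final momenta as a list of [px, py, pz], one per ion.
    """
    n = len(positions)
    pos = [list(p) for p in positions]
    vel = [[0.0, 0.0, 0.0] for _ in range(n)]

    def forces(pos):
        # Pairwise Coulomb force: F_i = q_i q_j (r_i - r_j) / |r_i - r_j|^3
        f = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = [pos[i][k] - pos[j][k] for k in range(3)]
                r = math.sqrt(sum(c * c for c in d))
                s = charges[i] * charges[j] / r**3
                for k in range(3):
                    f[i][k] += s * d[k]
                    f[j][k] -= s * d[k]
        return f

    # Velocity Verlet integration: half kick, drift, new forces, half kick.
    f = forces(pos)
    for _ in range(steps):
        for i in range(n):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / masses[i]
                pos[i][k] += dt * vel[i][k]
        f = forces(pos)
        for i in range(n):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / masses[i]
    return [[masses[i] * vel[i][k] for k in range(3)] for i in range(n)]
```

For two singly charged ions placed 2 bohr apart, the ions fly off back to back, and their total kinetic energy approaches the initial Coulomb energy of 0.5 hartree as they separate. A shorter starting distance would give a larger final momentum, which is the sense in which the fragments carry geometric information.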

Yet the raw data from these explosions are not a simple photograph. Each molecule shatters differently depending on how charges redistribute during ionization, a process complicated by stepwise electron removal and rapid charge migration across the molecular frame. A 2024 analysis in Communications Physics showed that the mapping from measured momenta back to the original geometry is mathematically ill-posed, meaning multiple starting structures can produce similar fragment patterns. That study contrasted two strategies: forward modeling, which simulates explosions from a guessed geometry and compares the result to data, and inversion modeling, which attempts to run the physics backward. Neither approach alone has reliably solved the problem for molecules larger than a few atoms.
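A two-ion toy case makes both the forward-modeling strategy and the ambiguity concrete. For a pure Coulomb explosion of two point charges starting at rest, energy conservation gives an asymptotic momentum of p = sqrt(2 μ q1 q2 / r0) in atomic units, where μ is the reduced mass. The sketch below is a simplification and does not reproduce the Communications Physics analysis; the function names and the candidate grid are hypothetical.

```python
import math

def asymptotic_momentum(r0, q1, q2, m1, m2):
    """Final momentum of each ion after a two-body Coulomb explosion.

    Follows from energy conservation, q1*q2/r0 = p**2 / (2*mu), in atomic
    units, assuming point charges that start at rest.
    """
    mu = m1 * m2 / (m1 + m2)
    return math.sqrt(2.0 * mu * q1 * q2 / r0)

def forward_fit(p_measured, q1, q2, m1, m2, candidates):
    """Forward modeling: try each guessed bond length, keep the best match."""
    return min(
        candidates,
        key=lambda r: abs(asymptotic_momentum(r, q1, q2, m1, m2) - p_measured),
    )
```

With the charges known, a scan over candidate bond lengths recovers the starting distance. If the charge states are uncertain, however, charges of (1, 1) at 2 bohr and (1, 2) at 4 bohr yield exactly the same momentum, so the measurement alone cannot tell the two apart. Polyatomic molecules multiply this kind of degeneracy.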

A Diffusion Transformer Tackles an Ill-Posed Problem

The new model, described in a paper published in Nature Communications, sidesteps the traditional physics inversion entirely. Instead, it uses a generative diffusion-based Transformer architecture trained on paired datasets of simulated molecular geometries and their corresponding ion-momentum distributions. The data were split into 80 percent for training, 10 percent for validation, and 10 percent for testing, according to the paper’s supplementary methods and related materials.
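An 80/10/10 partition of this kind is commonly produced by shuffling sample indices once and slicing the result. The sketch below shows that generic recipe and is not the paper's actual procedure; the function name split_indices and the fixed seed are assumptions made for illustration.

```python
import random

def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists.

    The test set receives whatever remains after the first two slices, so
    every index lands in exactly one subset.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

Keeping the three subsets disjoint matters because the test portion is what supports any claim about accuracy on geometries the model has not seen.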

Rather than solving equations of motion in reverse, the model learns the statistical relationship between explosion patterns and molecular shapes. Given a new set of fragment momenta, it samples from a learned probability distribution over possible source geometries and reports the most likely structure. The result, as reported in the Nature Communications article, is a mean absolute reconstruction error below one Bohr radius, roughly 0.53 angstroms, which is smaller than a typical covalent bond length. That level of precision is sufficient to distinguish bond-length changes, angular distortions, and conformational shifts in small polyatomic molecules.
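To make the error figure concrete, the sketch below computes a mean absolute error between predicted and true atomic coordinates and converts it from bohr to angstroms. The paper's exact metric definition is not given here, so the per-coordinate average and the assumption that both geometries already share a common frame are illustrative choices; the function names are hypothetical.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value of the Bohr radius

def mean_absolute_error(pred, true):
    """Average absolute difference over all atoms and Cartesian components.

    Both inputs are lists of [x, y, z] in bohr, assumed to be expressed in
    the same molecular frame with atoms listed in the same order.
    """
    diffs = [
        abs(p - t)
        for pred_atom, true_atom in zip(pred, true)
        for p, t in zip(pred_atom, true_atom)
    ]
    return sum(diffs) / len(diffs)

def to_angstrom(length_bohr):
    """Convert a length from bohr to angstroms."""
    return length_bohr * BOHR_TO_ANGSTROM
```

An error of exactly one bohr corresponds to about 0.53 angstroms, below the roughly 1 to 1.5 angstrom length of typical covalent bonds between light atoms.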

This generative strategy offers a conceptual advantage over both forward and direct-inversion methods. Forward modeling requires expensive iterative simulation for every new measurement. Direct inversion struggles because the geometry-to-momentum map is many-to-one, so its inverse is not unique. The diffusion model, by contrast, absorbs those ambiguities during training and produces fast, probabilistic answers at inference time. Because it returns a distribution over structures rather than a single best guess, it can also quantify uncertainty, flagging cases where the fragment data simply do not constrain the geometry well enough.
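One simple way to turn a set of sampled structures into an uncertainty estimate is to measure how much each atom's position scatters across the samples. The sketch below is a generic illustration and not the paper's method. It assumes the samples have already been placed in a common frame with consistent atom ordering; the function names and the 1 bohr threshold are hypothetical.

```python
import math

def per_atom_spread(samples):
    """Root-mean-square displacement of each atom about its mean position.

    samples is a list of K geometries, each a list of N [x, y, z] entries.
    """
    k = len(samples)
    n = len(samples[0])
    spreads = []
    for a in range(n):
        mean = [sum(s[a][c] for s in samples) / k for c in range(3)]
        var = sum(
            sum((s[a][c] - mean[c]) ** 2 for c in range(3)) for s in samples
        ) / k
        spreads.append(math.sqrt(var))
    return spreads

def poorly_constrained(samples, threshold=1.0):
    """Indices of atoms whose spread across samples exceeds the threshold."""
    return [a for a, s in enumerate(per_atom_spread(samples)) if s > threshold]
```

An atom that lands in nearly the same place in every sample is well determined by the data, while one that jumps between distant positions marks a region of the molecule the measurement leaves ambiguous.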

Building on Machine Learning Successes in CEI

The SLAC team’s work did not emerge in isolation. Machine learning has been creeping into Coulomb explosion analysis for several years. A separate 2025 study demonstrated that neural networks trained on multi-ion coincidence data could differentiate molecular isomers, species with the same chemical formula but different spatial arrangements, by learning correlations in coincidence patterns. That work used event-by-event three-dimensional momentum reconstruction and coincidence filtering to feed clean training data into classification algorithms.

Tabletop laser experiments have also shown that CEI fragment patterns carry enough structural information to distinguish three-dimensional molecular structures without requiring a large-scale X-ray facility. And earlier experimental milestones proved that CEI can recover stereochemical details as specific as the absolute configuration of a chiral epoxide in the gas phase, showing that the explosion fragments preserve subtle spatial information about left-handed versus right-handed molecular forms.

What sets the new diffusion model apart from these predecessors is ambition. Classification tells you which of several known structures best matches the data. Reconstruction tells you the actual atomic coordinates, even for a structure the model has never seen before. That jump from sorting to generating is what makes the approach potentially useful for tracking molecules as they change shape during a reaction, where intermediate geometries may never have been cataloged in advance.

Why Molecular Imaging Options Remain Limited

The practical appeal of this work becomes clearer against the backdrop of existing imaging tools. X-ray crystallography requires ordered crystals, which many biologically and chemically interesting molecules refuse to form. Cryo-electron microscopy works on large complexes but struggles with small, flexible species. Nuclear magnetic resonance spectroscopy provides structural constraints but not direct spatial snapshots on ultrafast timescales.

By contrast, Coulomb explosion imaging can, in principle, capture structures of isolated molecules in the gas phase with femtosecond temporal resolution, simply by adjusting the timing between a pump pulse that initiates dynamics and a probe pulse that triggers the explosion. The bottleneck has been interpretation: turning fragment clouds into coordinates quickly and reliably enough to follow a reaction pathway. A generative AI model that learns this mapping from data could make CEI a more routine tool for time-resolved chemistry.

From Simulation to Experiment

For now, the SLAC diffusion model has been trained primarily on simulated explosions, where the underlying geometries are known exactly. That strategy is standard in fields where labeled experimental data are scarce or expensive to obtain. It also allows the training set to span a wide range of molecular shapes and charge distributions, including configurations that might be difficult to isolate in the lab.

The crucial next step is to validate performance on real measurements. Experimental data introduce complications absent from simulations: detector inefficiencies, background noise, imperfect alignment, and missing fragments when ions escape the collection volume. The generative framework is flexible enough to incorporate these effects, either by augmenting the training data with realistic noise or by fine-tuning the model on a modest set of well-characterized experimental explosions.
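The noise-augmentation option can be sketched as a function that degrades a clean simulated event before it reaches the model. This is a minimal illustration under simple assumptions: Gaussian momentum smearing and an independent per-ion detection probability. The function name augment_event and the default values of sigma and efficiency are hypothetical and not taken from the paper.

```python
import random

def augment_event(momenta, sigma=0.5, efficiency=0.6, rng=None):
    """Apply toy detector effects to one simulated explosion event.

    Each ion is detected with probability `efficiency`; undetected ions are
    returned as None. Detected momenta get independent Gaussian noise of
    standard deviation `sigma` added to every component.
    """
    rng = rng or random.Random()
    out = []
    for p in momenta:
        if rng.random() > efficiency:
            out.append(None)  # ion missed by the detector
        else:
            out.append([c + rng.gauss(0.0, sigma) for c in p])
    return out
```

Training on events degraded this way exposes a model to incomplete and noisy inputs of the kind a real detector produces, though a realistic detector model would also need position-dependent efficiency, dead time, and background hits.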

If that bridge can be crossed, the payoff could be significant. A trained model could sit at the end of a beamline, ingesting fragment momenta in real time and outputting most-likely structures within seconds. That feedback would let experimentalists adjust laser parameters, pulse sequences, or molecular targets on the fly, rather than waiting for lengthy post-processing.

Opportunities and Limits

Even with its impressive accuracy, the diffusion approach will not solve every CEI challenge. The ill-posed nature of the inversion means that some geometries will always be ambiguous, especially for larger molecules with many internal degrees of freedom. In those cases, the model’s probabilistic output can still be useful, identifying families of compatible structures or highlighting which bonds and angles are well constrained by the data.

There is also the question of generalization. A model trained on a particular set of molecules and charge states may struggle when confronted with very different chemistries, such as transition-metal complexes or highly delocalized π-systems. Expanding the training corpus, and perhaps incorporating some physics-based constraints into the architecture, will be important for broad applicability.

Still, the combination of CEI and generative AI marks a notable shift. Instead of treating the explosion as a destructive endpoint, the new work treats it as a rich measurement channel that, with the right statistical tools, can be run in reverse. In doing so, it opens a path toward routine, ultrafast, three-dimensional imaging of molecules in motion, turning violent fragmentation into a window on the most delicate rearrangements of chemical bonds.

*This article was researched with the help of AI, with human editors creating the final content.
