AI uses virtual sunspots to find rare magnetic events in solar data

Solar flares strong enough to knock out satellites and buckle power grids are, by definition, rare. That rarity is exactly the problem for the machine-learning models tasked with predicting them: they cannot learn to recognize a pattern they have barely seen. A team led by researchers at the Southwest Research Institute is now attacking that bottleneck with a generative AI system that creates realistic virtual sunspots, then uses them as search templates to dig matching real events out of more than a decade of NASA observations.

The approach, described in a preprint posted to arXiv and reported by Phys.org in April 2026, trains a generative adversarial network (GAN) on magnetic-field data from NASA’s Solar Dynamics Observatory. Once the network can produce convincing synthetic active regions with tunable physical traits, a companion retrieval model scans the real SDO archive for observations that closely match those generated profiles. The result is a shortcut past one of space-weather science’s most persistent frustrations: finding enough examples of dangerous magnetic configurations to study and train on.

Why rare events stall forecasting

The sun’s most consequential outbursts (X-class flares and the coronal mass ejections that sometimes accompany them) account for a tiny fraction of all solar activity. The May 2024 geomagnetic storm, the strongest to hit Earth in more than two decades, was a vivid reminder of what is at stake: aurora visible from the tropics, degraded GPS, and precautionary measures across satellite and grid operations. Yet events of that magnitude appear so rarely in the historical record that conventional classifiers trained on past data are starved for positive examples.

A peer-reviewed dataset paper in Scientific Data lays out the challenge in detail, documenting how active-region magnetogram datasets suffer from severe class imbalance, inconsistent labeling, and a structural shortage of the high-energy cases that matter most for forecasting. The new GAN method is designed to address that gap without fabricating observations. Instead of inventing data points, it generates synthetic magnetic profiles that guide a search through real archives, surfacing genuine events that would be nearly impossible to locate through manual review.
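Class imbalance can be put in numbers. A standard mitigation, separate from the GAN approach described here, is to reweight each class inversely to how often it appears during training. The sketch below reproduces, in plain Python, the "balanced" rule that scikit-learn uses for its `class_weight="balanced"` option: w_c = n_samples / (n_classes * n_c). The label counts in the example are hypothetical and are not taken from any SHARP dataset.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * count_of_that_class).

    Rare classes (e.g. regions that produced a major flare) get large weights,
    so a classifier's loss is not dominated by the abundant quiet regions.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical example: 98 quiet regions for every 2 that produced a major flare.
labels = ["quiet"] * 98 + ["major_flare"] * 2
weights = balanced_class_weights(labels)
print(weights["major_flare"])  # 25.0
```

Reweighting only changes how much each existing example counts. It adds no new information, and that shortage of genuine examples is the gap the retrieval approach targets by surfacing additional real events.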

How the system works

The GAN is trained on SHARP (Space-weather HMI Active Region Patches) magnetic-field data, a well-established product from SDO’s Helioseismic and Magnetic Imager that characterizes active regions on the sun’s surface. According to the preprint, the model learns directions in its latent space that correspond to physical parameters: polarity, total magnetic flux, structural complexity, and flaring tendency. Researchers can then dial up a specific combination of traits, generate a synthetic sunspot that embodies them, and hand that profile to a self-supervised retrieval model that combs the SDO archive for real matches.
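The generate-then-retrieve loop can be shown as a toy sketch under stated assumptions. It assumes, hypothetically, that each trait (here, complexity) corresponds to a fixed direction d in latent space, so that an edited code is z' = z + alpha * d. It also assumes that retrieval ranks archive entries by cosine similarity between embeddings. The sketch does not reproduce the preprint's generator, encoder, or similarity measure. `generate_and_embed` is a hypothetical identity stand-in, and the archive names and vectors are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def shift_latent(z: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Move latent code z along a trait direction: z' = z + alpha * direction."""
    if len(z) != len(direction):
        raise ValueError("z and direction must have the same length")
    return [zi + alpha * di for zi, di in zip(z, direction)]


def generate_and_embed(z: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the GAN generator plus self-supervised encoder.

    A real pipeline would render a synthetic magnetogram from z and encode it.
    Here the latent code is returned unchanged to keep the example self-contained.
    """
    return list(z)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], archive: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k archive entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in archive.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D embeddings for three archived active-region patches.
archive = {
    "patch_simple": [1.0, 0.0, 0.0],
    "patch_complex": [0.0, 1.0, 0.0],
    "patch_mixed": [0.7, 0.7, 0.0],
}
z = [1.0, 0.0, 0.0]
complexity_direction = [0.0, 1.0, 0.0]  # assumed for illustration, not learned

# As alpha grows, the top match moves from patch_simple to patch_mixed to patch_complex.
for alpha in (0.0, 1.0, 5.0):
    query = generate_and_embed(shift_latent(z, complexity_direction, alpha))
    print(alpha, retrieve(query, archive, k=1))
```

Keeping the latent edit separate from the search means the same retrieval step works for any trait direction a researcher chooses to dial.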

SHARP parameters already underpin many flare-forecasting pipelines. One study indexed by the U.S. Department of Energy’s research repository fed SHARP features to a random-forest algorithm to predict flares. Peer-reviewed work in Frontiers in Astronomy and Space Sciences has also documented how these data, which are publicly available through the Joint Science Operations Center, feed into machine-learning workflows built on the foundational Bobra et al. 2014 study of SDO/HMI vector magnetic data. The new generative layer plugs into that existing infrastructure rather than replacing it.

Beyond simple retrieval, the GAN’s latent-space navigation gives researchers a controlled experimental tool. By nudging a synthetic active region toward higher complexity or stronger magnetic shear and then finding real regions that resemble the result, scientists can test which features most strongly correlate with dangerous eruptions. The technique creates a bridge between theoretical models of flare triggers and the messy, incomplete observational record.
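Suppose real regions have been retrieved for each setting of a trait. One simple way to ask how closely that trait tracks eruptions is to correlate a per-region feature with a binary flare label. This is the point-biserial correlation, which equals Pearson's r when one variable is coded 0/1. The sketch below is a hypothetical illustration with invented feature values and labels, and the preprint's own statistical analysis may differ.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 label for y this is the point-biserial correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


# Hypothetical retrieved regions: a complexity score and whether each later produced a major flare.
complexity = [1.0, 2.0, 3.0, 4.0]
flared = [0, 0, 1, 1]
print(round(pearson_r(complexity, flared), 3))  # 0.894
```

With so few samples, a correlation like this is descriptive only. It does not show that the trait causes eruptions.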

Open questions before operational use

Promising as the method appears, several hurdles stand between a research prototype and a tool that feeds into real-time space-weather warnings.

First, the preprint has not yet completed formal peer review. The arXiv listing describes the technical design, and the reported results appear internally consistent. Still, independent replication or journal acceptance would give the accuracy claims a stronger foundation. External groups will need to test whether the same latent-space manipulations and retrieval metrics hold up on independently curated datasets or slightly different magnetogram products.

Second, computational scalability is unproven at full archive scale. SDO has been collecting high-cadence observations continuously since 2010 and has amassed a vast archive. No publicly available technical assessment beyond the preprint itself has examined whether the GAN retrieval pipeline can run efficiently across that entire record, let alone in near-real time as new data arrive.
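How well a search scales depends heavily on how it is organized. The sketch below is an illustration only and does not describe the preprint's pipeline. It shows one way to keep memory bounded while scanning a large archive. Similarity scores arrive as a stream (for example, chunk by chunk from disk), and Python's `heapq.nlargest` keeps only the k best at any moment. A brute-force scan like this still touches every record once, so its cost grows linearly with archive size, which is why large-scale systems often turn to approximate nearest-neighbor indexes. The `score_stream` generator and its random scores are hypothetical.

```python
import heapq
import random
from typing import Iterator, List, Tuple


def score_stream(n_records: int, seed: int = 0) -> Iterator[Tuple[str, float]]:
    """Hypothetical stand-in for scanning an archive: yields (record_id, similarity) one at a time."""
    rng = random.Random(seed)
    for i in range(n_records):
        yield f"record_{i:07d}", rng.random()


def streaming_top_k(scores: Iterator[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Keep only the k highest-scoring records; memory is O(k), not O(archive size)."""
    return heapq.nlargest(k, scores, key=lambda item: item[1])


top = streaming_top_k(score_stream(200_000), k=5)
for record_id, score in top:
    print(record_id, round(score, 4))
```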

Third, generalization across the solar cycle is an open concern. Solar magnetic behavior shifts over the roughly 11-year activity cycle, and subtle changes in instrument calibration can accumulate over time. If the GAN has absorbed artifacts tied to a particular cycle phase or calibration regime, its synthetic regions may not translate cleanly to earlier or later epochs, undermining retrievals drawn from the full SDO timeline.
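A common way to probe this kind of drift is to split evaluation data by time instead of at random. This is offered as an illustration, not as something the preprint reports. A model is trained on some years of the record and tested on years it never saw, so any gap in performance can expose cycle-phase or calibration artifacts. The record format and years below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def split_by_epoch(records: Sequence[Record], held_out_years: Sequence[int]) -> Tuple[List[Record], List[Record]]:
    """Put every record from a held-out year in the test set and everything else in training.

    Splitting on whole years, rather than shuffling individual records, keeps
    near-duplicate observations from the same period off both sides of the split.
    """
    held_out = set(held_out_years)
    train = [r for r in records if r["year"] not in held_out]
    test = [r for r in records if r["year"] in held_out]
    return train, test


# Hypothetical records: one entry per observed active-region patch.
records = [
    {"id": "patch_a", "year": 2011},
    {"id": "patch_b", "year": 2014},
    {"id": "patch_c", "year": 2019},
    {"id": "patch_d", "year": 2023},
]
# Train on earlier years, evaluate on later ones.
train, test = split_by_epoch(records, held_out_years=[2019, 2023])
print([r["id"] for r in train], [r["id"] for r in test])  # ['patch_a', 'patch_b'] ['patch_c', 'patch_d']
```

If a model trained on one span scores markedly worse on the held-out span, that gap is a warning sign of the epoch-specific learning described above.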

Neither the SDO/HMI instrument team nor NASA has publicly announced plans to integrate GAN-based retrieval into the Joint Science Operations Center’s pipelines. NOAA’s Space Weather Prediction Center, the agency responsible for issuing operational forecasts, has not commented on the approach.

What it could change for downstream forecasting models

The GAN system does not claim to predict flares on its own. Its contribution sits upstream: expanding the pool of well-characterized examples of the rare magnetic configurations that precede the most damaging eruptions. If the retrieval method proves scalable and survives peer review, every downstream forecasting model that relies on SHARP data could benefit from a richer, more balanced training set. That means fewer missed warnings for the satellite operators, grid reliability planners, and communications engineers who depend on accurate space-weather alerts.

None of that eliminates the deep uncertainty in solar forecasting. The sun remains a turbulent, nonlinear system, and no amount of data augmentation changes the underlying physics. But the approach targets one of the most stubborn bottlenecks in the current pipeline: a shortage of high-quality examples of the very events that matter most. For a field that has spent years watching powerful flares catch forecasting models off guard, a smarter way to find and study those events in the historical record is a concrete step forward.

More from Morning Overview

This article was researched with the help of AI, with human editors creating the final content.

Author: Dorian Maddox