NASA’s Perseverance rover completed its first drives on Mars planned entirely by generative AI in December 2025, a pair of demonstrations that tested whether machine learning can safely replace human route-planners millions of miles from Earth. That same season, aviation regulators on both sides of the Atlantic released frameworks designed to bring AI-driven risk testing into cockpits, maintenance bays, and air traffic systems. Together, these developments signal a shift in how aerospace engineers verify safety when algorithms, not people, make split-second decisions.
Generative AI Takes the Wheel on Mars
On December 8 and December 10, 2025, Perseverance executed two autonomous drives in which generative AI planned the route from start to finish, according to a mission update from NASA’s Jet Propulsion Laboratory. JPL roboticist Vandi Verma, a key figure behind the rover’s autonomy software, helped oversee the tests. NASA Administrator Jared Isaacman highlighted the milestone as evidence that AI can extend mission reach on distant planets where communication delays make real-time human control impractical.
The demonstrations built on years of incremental progress. Perseverance already relied on a system called Enhanced AutoNav, or ENav, for surface navigation. That software uses a safety checker known as Approximate Clearance Evaluation to screen candidate paths for collision risk. But computational cost forces the onboard processor to rank those paths using heuristics rather than exhaustive analysis, meaning the rover sometimes picks a safe but suboptimal route. Generative AI offered a way to evaluate more options faster, potentially covering more ground per driving cycle while still staying within strict safety envelopes.
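The general pattern, generating candidate paths, screening them with a clearance check, and then ranking the survivors with a cheap heuristic cost, can be sketched in a few lines of Python. The function and field names below are illustrative assumptions, not the actual ENav or flight-software interfaces.

```python
import math

def plan_step(candidate_paths, clearance_check, goal):
    """Illustrative ENav-style loop: keep only paths that pass a
    safety (clearance) check, then rank them with a cheap heuristic
    rather than an exhaustive evaluation."""
    safe_paths = [p for p in candidate_paths if clearance_check(p)]
    if not safe_paths:
        return None  # no safe option: stop and wait for ground input

    def heuristic_cost(path):
        # Cheap proxy for path quality: distance left to the goal plus
        # a penalty for rough terrain along the path (hypothetical fields).
        end = path["end"]
        dist_to_goal = math.hypot(goal[0] - end[0], goal[1] - end[1])
        return dist_to_goal + 0.5 * path["roughness"]

    # A heuristic ranking can pick a safe but suboptimal route,
    # which is the limitation the generative planner targets.
    return min(safe_paths, key=heuristic_cost)
```

The heuristic ranking is where optimality leaks away: the safety check guarantees the chosen path is collision-free, but nothing guarantees it is the best of the safe options.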
Research Behind the Rover’s New Brain
Two research efforts fed directly into the capability Perseverance demonstrated. A preprint titled MLNav describes a safety-constrained machine learning planner that claims reduced collision-check workload compared to the baseline ENav system. The researchers validated MLNav on both real Martian terrain data collected by Perseverance and synthetic landscapes, testing whether the model generalizes beyond training conditions. That distinction matters because Mars terrain shifts with dust storms and seasonal frost, and any planner trained on static data risks failing when conditions change.
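One way to read that claim is that a learned model scores candidate paths first, and only the highest-scoring ones are passed to the expensive geometric collision check. The sketch below illustrates that filtering idea under assumed names; it is not MLNav’s published implementation.

```python
def filtered_planning(candidate_paths, learned_scorer, collision_check, budget=10):
    """Illustrative ML-assisted planning: a learned scorer predicts which
    paths are promising, and only the top `budget` candidates receive the
    costly geometric collision check."""
    # Rank all candidates with the cheap learned model (assumed interface).
    ranked = sorted(candidate_paths, key=learned_scorer, reverse=True)

    for path in ranked[:budget]:
        if collision_check(path):  # the expensive check runs far less often
            return path
    return None  # nothing safe in budget: fall back to ground-in-the-loop planning
```

Under this reading, the safety guarantee still comes from the explicit collision check; the learned model only decides where to spend that checking effort.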
A separate line of work paired foundation-model-style generative AI with explicit risk objectives for robotics. Hardware experiments conducted at JPL’s Mars Yard reported failure-rate reductions while maintaining goal-reaching performance. The key technical insight was that safety gains came through inference-time compute, meaning the model spends more processing cycles evaluating risk at the moment of decision rather than requiring expensive retraining on new data. For missions where uploading a retrained model across interplanetary distances is impractical, that approach solves a real operational bottleneck and aligns with broader NASA technical reporting on autonomy and verification.
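The inference-time-compute idea can be illustrated as sampling many candidate plans from a frozen generative model and scoring each against an explicit risk objective before committing, rather than retraining the model on new data. The sampler and risk function below are placeholders, not the JPL system.

```python
def risk_aware_plan(generate_plan, risk_of, n_samples=64, risk_limit=0.01):
    """Illustrative inference-time risk filtering: draw many plans from a
    frozen generative model, score each with an explicit risk estimate,
    and keep the lowest-risk plan that meets the threshold.
    More samples cost more onboard compute, but need no retraining or uplink."""
    best_plan, best_risk = None, float("inf")
    for _ in range(n_samples):
        plan = generate_plan()   # frozen model, sampled repeatedly
        risk = risk_of(plan)     # explicit risk objective
        if risk < best_risk:
            best_plan, best_risk = plan, risk
    if best_risk <= risk_limit:
        return best_plan
    return None  # no plan meets the safety envelope; hold position
```

Raising `n_samples` is the knob that trades onboard processing time for lower expected risk, which is the operational bargain the research describes.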
These research strands sit within a larger institutional push to embed AI into space exploration. Across human spaceflight, Earth science, and planetary missions, NASA programs increasingly rely on machine learning for tasks like hazard detection, science targeting, and anomaly diagnosis. Yet each new AI capability raises the same core question: how do engineers prove that a system trained on past data will behave safely when it encounters something genuinely new?
Why Aviation Regulators Are Writing AI Rules Now
The same tension between speed and safety plays out in commercial aviation, where sensors, predictive maintenance systems, and decision-support tools increasingly rely on machine learning. The Federal Aviation Administration published its Roadmap for AI Safety Assurance, which draws a distinction between “learned” AI, whose behavior is fixed after training, and “learning” AI, which continues to adapt in operation. That distinction carries real regulatory weight: a sensor algorithm that updates itself in flight poses different certification challenges than one frozen at the factory.
The FAA’s roadmap lays out an incremental “safety continuum” approach, treating AI both as a regulated capability embedded in aircraft systems and as a tool that supports safety lifecycle processes like inspection scheduling and fault prediction. A companion human-factors study from the U.S. Department of Transportation examines how AI integration affects monitoring, alerting, and decision support within FAA operational systems. Its explicit human-factors lens reflects a concern that automation can degrade operator awareness if poorly designed, a risk documented in decades of cockpit research and now resurfacing as AI systems propose actions rather than simply displaying data.
The DOT’s involvement underscores that AI in aviation is not just a technical certification issue but a broader transport-policy question. Within its wider safety mandate, the Transportation Department has signaled that data-driven tools must enhance, not erode, systemic resilience. For AI, that means regulators are as interested in how algorithms change human workflows as they are in the raw performance metrics of the models themselves.
Europe Proposes Binding AI Trustworthiness Standards
Across the Atlantic, the European Union Aviation Safety Agency took a more prescriptive step. EASA’s NPA 2025-07 is the agency’s first regulatory proposal on artificial intelligence for aviation, and it is open for public consultation. The proposal includes detailed specifications plus acceptable means of compliance and guidance material for what EASA calls “AI trustworthiness,” labeled DS.AI in the regulatory text.
Part A of the proposal, the explanatory note, lays out why EASA is acting now and which aviation domains are targeted. Part B contains the draft specifications that would turn AI risk testing into auditable requirements with clear compliance pathways. The proposal carries explicit linkage to the EU AI Act, meaning aviation-specific rules are being designed to fit within Europe’s broader regulatory architecture for artificial intelligence. For manufacturers and airlines, that means AI systems in onboard avionics or ground-based maintenance tools would need to meet defined trustworthiness criteria before certification, including traceability of training data, robustness to rare events, and mechanisms for human oversight.
The Gap Between Lab Results and Operational Trust
A pattern runs through all of these developments: the technical capability to deploy AI in safety-critical aerospace systems is advancing faster than the frameworks to certify it. Perseverance’s December demonstrations showed that generative AI can plan a safe drive on Mars, but those were two tests on a single rover. Scaling that to routine operations, where the AI planner handles every drive cycle across years of mission life, requires sustained validation that no single demonstration can provide.
Researchers face a familiar dilemma. On one hand, high-fidelity simulations and controlled testbeds such as Mars-yard facilities allow teams to expose AI systems to thousands of edge cases that would be infeasible to encounter naturally. On the other, simulations are only as good as the assumptions that underlie them, and rare real-world failures often arise from combinations of factors that no one thought to model. This gap between lab conditions and operational reality is precisely what keeps regulators cautious about approving AI that can directly command aircraft or spacecraft without human veto.
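In practice, that kind of test campaign often looks like a randomized harness: perturb terrain, sensor noise, and fault conditions across many simulated runs and log every violation of the safety envelope. The sketch below is a generic illustration of that pattern, with made-up scenario parameters, not any agency’s actual verification tooling.

```python
import random

def stress_test(planner, simulate_drive, n_trials=1000, seed=0):
    """Illustrative Monte Carlo campaign: run a planner through many
    randomized scenarios and record safety violations for later analysis."""
    rng = random.Random(seed)
    failures = []
    for trial in range(n_trials):
        scenario = {
            "slope_deg": rng.uniform(0, 25),        # terrain steepness
            "rock_density": rng.uniform(0, 1),      # obstacles per square meter
            "sensor_noise": rng.gauss(0, 0.05),     # range-sensor error
        }
        result = simulate_drive(planner, scenario)  # assumed simulator hook
        if not result["safe"]:
            failures.append((trial, scenario, result))
    # Empirical failure rate plus the full failure logs for post-mortem review.
    return len(failures) / n_trials, failures
```

Even a harness like this only samples the scenarios someone thought to parameterize, which is exactly the residual gap the regulators are worried about.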
Regulatory frameworks are starting to encode ways of managing that uncertainty. The FAA’s continuum approach, coupled with human-factors guidance, encourages early deployment of AI in advisory roles where errors are unlikely to be catastrophic and can be monitored. EASA’s DS.AI proposal goes further by specifying how evidence should be gathered and documented, effectively turning abstract ideas like “trustworthiness” into checklists and audit trails. In space exploration, mission teams lean on internal standards and peer review, supported by repositories such as NASA’s technical archive, to justify autonomy upgrades before they are uploaded to a spacecraft already millions of kilometers away.
Yet even with these structures, a residual leap of faith remains. No finite test campaign can prove the absence of failure in systems that learn from data. Instead, engineers and regulators are converging on a more modest goal: demonstrate that AI-driven systems are at least as safe as the human-designed baselines they replace, and design them so that when they do fail, they fail in ways that are detectable, recoverable, and thoroughly analyzed afterward.
The December 2025 drives on Mars, the FAA’s AI roadmap, and EASA’s trustworthiness proposal all point in the same direction. AI is no longer an experimental add-on at the edges of aerospace operations; it is moving into the core of navigation, maintenance, and traffic management. The challenge now is to ensure that the methods used to test and certify these systems evolve as quickly as the algorithms themselves. Without that, the promise of faster, smarter, more autonomous exploration and flight will remain constrained by a lack of confidence that, when something unexpected happens, the machine will make the right call, or at least a safe one.
*This article was researched with the help of AI, with human editors creating the final content.