
Are LTMs the next LLMs? New AI claims powers current models just can't match

Large language models turned natural language into a programmable interface, but they still struggle when the world stops being text and starts being traffic, physics and risk. A new wave of “large trajectory models” is trying to do for motion and decision making what LLMs did for words, promising capabilities that current chatbots and copilots simply cannot reach. The bet is that if you scale models around actions and futures instead of sentences, you get systems that can plan, predict and act with far more reliability.

That shift matters because the biggest AI opportunities now sit in cars, factories, hospitals and insurance back offices, not just in search boxes. If LTMs can turn messy real‑world dynamics into something as tractable as next‑token prediction, they could quietly become the backbone of autonomy and operations while LLMs remain the public face of AI.

From predicting words to predicting worlds

The core idea behind large trajectory models is deceptively simple: treat motion and decision sequences the way LLMs treat text, as long chains of tokens that can be learned at scale. Researchers argue that the same scaling laws that powered chatbots can be repurposed for driving, robotics and other control problems. Instead of predicting the next word in a sentence, these models predict the next position, velocity or action in a trajectory, then roll that forward over long horizons.
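To see the analogy concretely, here is a minimal, self-contained sketch of that loop in Python. The grid discretization and the stand-in "model" (a simple linear extrapolation) are illustrative assumptions, not any published LTM's design; a real system would sample the next token from a trained transformer instead.

```python
# Toy sketch of the LTM idea: treat a motion trajectory like a token
# sequence and roll it forward autoregressively, the way an LLM extends
# text. The grid size and the stand-in "model" below are assumptions
# for illustration, not any published system's actual architecture.

GRID = 100  # discretize positions in [0, 1) onto a 100 x 100 grid

def tokenize(x: float, y: float) -> int:
    """Map a continuous (x, y) position to a single discrete token id."""
    col = min(int(x * GRID), GRID - 1)
    row = min(int(y * GRID), GRID - 1)
    return row * GRID + col

def detokenize(token: int) -> tuple[float, float]:
    """Map a token id back to the center of its grid cell."""
    row, col = divmod(token, GRID)
    return ((col + 0.5) / GRID, (row + 0.5) / GRID)

def next_token(history: list[int]) -> int:
    """Stand-in for a trained transformer: extrapolate the last step,
    where a real LTM would sample from learned next-token logits."""
    if len(history) < 2:
        return history[-1]
    (x1, y1), (x2, y2) = detokenize(history[-2]), detokenize(history[-1])
    return tokenize(min(max(2 * x2 - x1, 0.0), 0.999),
                    min(max(2 * y2 - y1, 0.0), 0.999))

# Roll a short observed trajectory forward over a longer horizon.
observed = [tokenize(0.10 + 0.02 * t, 0.50) for t in range(5)]
trajectory = list(observed)
for _ in range(10):
    trajectory.append(next_token(trajectory))

print([detokenize(t) for t in trajectory[-3:]])
```

The point of the toy is the shape of the loop: discretize motion into tokens, predict the next one, append it and repeat, exactly the generation pattern LLMs use for words.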

That reframing is not cosmetic. In detailed experiments on autonomous driving, teams behind large trajectory models show that scaling up parameters and data lets a single model understand diverse road topologies, reason about traffic dynamics over extended time windows and interpret heterogeneous sensor inputs. Instead of hand‑engineering separate modules for perception, prediction and planning, the LTM learns a unified representation of how agents move and interact, which is precisely what current LLMs lack when they hallucinate about physical processes or misjudge spatial constraints.

Inside the LTM toolbox: trajectories, tokens and traffic

To make this work, LTM researchers have had to rethink how they represent the world. In one influential line of work, Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li and colleagues compress continuous motion into discrete tokens that can be fed into transformer architectures, then train at scale on real driving logs. Their study on Large Trajectory Models describes how this tokenization lets a single network act as both motion predictor and planner without expensive high‑level annotations, effectively turning the road into a language the model can speak.
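The paper's exact tokenizer and architecture are not reproduced here, but the general recipe, training a causal transformer on discrete motion tokens with a plain next-token objective, looks much like LLM pretraining. A hedged PyTorch sketch, with an assumed vocabulary size, model dimensions and random stand-in data:

```python
# Hedged sketch of next-token training on tokenized trajectories. The
# vocabulary size, dimensions and random "driving log" batch below are
# illustrative assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn

VOCAB = 10_000   # number of discrete motion tokens (assumed)
DIM = 256

class TrajectoryLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):  # tokens: (batch, seq)
        # Causal mask so each step only attends to past motion, as in an LLM.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.encoder(self.embed(tokens), mask=mask))

model = TrajectoryLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# One toy training step: predict token t+1 from tokens up to t.
batch = torch.randint(0, VOCAB, (8, 64))
logits = model(batch[:, :-1])
loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
print(f"next-token loss: {loss.item():.3f}")
```

Nothing in the objective is specific to language, which is the whole argument: once motion is tokenized, the same machinery and the same scaling levers apply.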

Others are going a step further by explicitly fusing language and motion. A system dubbed TrajLLM starts with sparse context joint encoding to break down agent and scene features into a format that a language model can understand, then uses that representation to reason about future movements. This hybrid approach hints at a near future where a single foundation model can read a traffic report, parse a map and plan a lane change, but the heavy lifting on motion still comes from trajectory‑centric training rather than pure text prediction.
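TrajLLM's sparse context joint encoding is not spelled out in detail here, but one common pattern for this kind of fusion is to project scene features into the same embedding space as the motion tokens and prepend them as a conditioning prefix the decoder can attend to. The feature sizes and projection below are assumptions for illustration, not the paper's method:

```python
# Loose sketch of prefix-style fusion of scene context and motion tokens.
# This is a generic conditioning pattern, not TrajLLM's actual "sparse
# context joint encoding"; all shapes here are assumed for illustration.
import torch
import torch.nn as nn

DIM = 256

scene_proj = nn.Linear(16, DIM)            # 16 scene features per agent (assumed)
motion_embed = nn.Embedding(10_000, DIM)   # shared motion-token vocabulary

scene_features = torch.randn(1, 4, 16)             # e.g. 4 nearby agents
motion_tokens = torch.randint(0, 10_000, (1, 32))  # observed motion so far

prefix = scene_proj(scene_features)            # (1, 4, DIM)
motion = motion_embed(motion_tokens)           # (1, 32, DIM)
sequence = torch.cat([prefix, motion], dim=1)  # context first, then trajectory
print(sequence.shape)                          # torch.Size([1, 36, 256])
```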

Why LLMs hit a wall in high‑stakes decisions

For all their fluency, today's LLMs remain brittle when precision and accountability matter. In one evaluation of high‑stakes terminology, researchers found that, out of the box, no leading LLM could break 9 percent accuracy on a 1,150‑term test, with the top performer managing a jaw‑dropping 8.43 percent. That kind of performance is unacceptable in domains like aviation, medicine or autonomous driving, where a single misclassification can have real‑world consequences.

Even in more forgiving enterprise settings, practitioners describe LLMs as behaving like “brilliant, over‑confident, uncontrollable teenagers,” astonishingly creative with unstructured data but still prone to hallucinations that make return‑on‑investment calculations tricky. That characterization comes from insurance specialists evaluating how LLMs handle claims, where a wrong answer can trigger regulatory scrutiny or financial loss. Large trajectory models, by contrast, are explicitly trained to respect physical constraints and long‑term dynamics, which makes them better suited to environments where “close enough” is not good enough.

World models, foundation models and the rise of LTMs

LTMs are part of a broader shift toward AI systems that build internal simulators of reality rather than just autocomplete text. Advocates of so‑called world models argue that the most important AI of the next few years will not be chatbots but engines that can understand a scene, imagine how it will evolve and choose actions accordingly. As one analysis of world models puts it, you likely will not interact with these systems the way you do with LLM‑powered tech, but they will quietly power products that perceive the world and can take actions in it.

At the same time, AI architects are drawing clearer lines between general‑purpose language systems and the broader category of foundation models. Guides on distinguishing foundation models from large language models in the modern AI stack stress that language is just one modality among many. Large trajectory models fit neatly into this foundation‑model frame: they are trained on vast corpora, exhibit emergent capabilities and can be adapted across tasks, but their native "language" is motion and interaction rather than text.

Autonomous transport and the LTM advantage

Transport is where the LTM thesis is being tested most aggressively. Reviews of trends in mobility note that large models are already widely used in intelligent transportation systems and autonomous vehicles because of their strength in multi‑task learning and domain adaptation. Large trajectory models extend that logic by unifying perception, prediction and control into a single scalable architecture, which is particularly attractive for fleets of robotaxis or long‑haul trucks that must operate across varied cities and weather conditions.

Surveys of how trajectory prediction meets language modeling highlight both the promise and the remaining gaps. One comprehensive review notes that "while scaling and tokenization improve generalization, challenges remain in spatial grounding and physical realism," underscoring that LTMs still need a better understanding of motion and intent. That is a reminder that even trajectory‑centric models are not magic; they must still grapple with rare edge cases, ambiguous human behavior and the messy interface between digital predictions and analog streets.
