AI Predicts Pedestrians’ Next Move

A new multimodal AI called OmniPredict uses a GPT-4o–style large model to anticipate pedestrian actions in real time, outperforming traditional vision systems on standard benchmarks. Researchers say it could change how autonomous vehicles—and other machines—plan around humans, but the claim that the system is "reading minds" demands careful scrutiny.

On city streets the safest split-second decision is often the one you never have to make. This week researchers at Texas A&M and collaborators in Korea unveiled OmniPredict, an AI system that does more than spot a person in the road: it tries to infer what that person will do next. Described in a peer-reviewed article in Computers & Electrical Engineering, OmniPredict blends scene images, close-up views, bounding boxes, vehicle telemetry and simple behavioural cues to forecast a pedestrian’s likely action in real time.

A model that anticipates, not just detects

Traditional autonomous-vehicle stacks separate perception from planning: cameras and lidar detect objects, then downstream modules decide how to brake or steer. OmniPredict replaces that rigid pipeline with a multimodal large language model (MLLM) architecture that fuses visual and contextual inputs and produces a probabilistic prediction about human behaviour: whether someone will cross, pause in an occluded area, glance toward the vehicle, or perform another action. In laboratory tests the team reports roughly 67% prediction accuracy on established pedestrian-behaviour benchmarks, a gain of about ten percentage points over recent state-of-the-art methods.

The researchers frame the advance as a shift from reactive automation toward anticipatory autonomy. "Cities are unpredictable. Pedestrians can be unpredictable," said the project lead, noting that a car which anticipates a likely step into the road can plan earlier and more smoothly, potentially reducing near-misses. The result is not a human mind-reading oracle but a statistical engine that converts visual cues—pose, head direction, occlusion, vehicle speed—into a short-term forecast of movement.

How OmniPredict reads the scene

At the technical core, OmniPredict uses an MLLM—the kind of architecture increasingly used for chat and image tasks—adapted to interpret video frames and structured contextual signals. Inputs include a wide-angle scene image, zoomed crops of individual pedestrians, bounding-box coordinates, and simple sensor data such as vehicle velocity. The model processes these multimodal streams together and maps them to four behaviour categories the team found useful for driving contexts: crossing, occlusion, actions and gaze.
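For readers who want a concrete picture of that interface, here is a minimal Python sketch. Every name and type below is an illustrative placeholder, not the paper's actual API; only the four behaviour categories come from the published description.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical input bundle mirroring the modalities described above.
@dataclass
class PedestrianObservation:
    scene_image: np.ndarray       # wide-angle frame, H x W x 3
    pedestrian_crop: np.ndarray   # zoomed view of one pedestrian
    bbox: tuple                   # (x, y, w, h) in image coordinates
    ego_speed_mps: float          # vehicle velocity from telemetry

BEHAVIOUR_CLASSES = ("crossing", "occlusion", "actions", "gaze")

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

def predict_behaviour(model, obs: PedestrianObservation) -> dict:
    """Map one multimodal observation to a probability per behaviour
    category; `model` stands in for the fused MLLM."""
    logits = model(obs)  # one raw score per behaviour class
    return dict(zip(BEHAVIOUR_CLASSES, softmax(np.asarray(logits, dtype=float))))

# Toy usage with a stand-in model that returns fixed logits.
obs = PedestrianObservation(np.zeros((720, 1280, 3)), np.zeros((128, 64, 3)),
                            (600, 300, 64, 128), 8.3)
print(predict_behaviour(lambda o: [2.0, 0.1, 0.3, 0.5], obs))
```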

Two properties matter. First, the MLLM's cross-modal attention lets the model link a distant body orientation to a local gesture, such as someone turning their torso while looking down at a phone, without hand-coded rules. Second, the system appears to generalise: the researchers ran OmniPredict on two challenging public pedestrian-behaviour datasets (JAAD and WiDEVIEW) without dataset-specific training and still saw results above the state of the art. That generalisation is the headline claim, and it's why the group describes OmniPredict as a "reasoning" layer sitting above raw perception.
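Cross-modal attention itself is a standard mechanism rather than an OmniPredict invention. The following self-contained sketch shows the core computation, with pedestrian-crop features attending over scene features; dimensions and data are made up for illustration.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token (e.g. a pedestrian-crop
    feature) mixes in context from key/value tokens (e.g. scene features)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # (n_q, n_k) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ values                           # context-mixed features

# Toy example: 4 crop tokens querying 16 scene tokens, 32-dim features.
rng = np.random.default_rng(0)
crop_tokens = rng.normal(size=(4, 32))
scene_tokens = rng.normal(size=(16, 32))
fused = cross_attention(crop_tokens, scene_tokens, scene_tokens)
print(fused.shape)  # (4, 32)
```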

Benchmarks, limits and the realism gap

Benchmarks tell one part of the story. The reported 67% accuracy and the roughly ten-percentage-point improvement over recent baselines are meaningful in academic comparisons, but they do not automatically translate into roadworthy safety. Benchmarks contain many repeated patterns and a narrower distribution of scenarios than live city driving; rare events, adversarial behaviour and unusual weather often swamp model assumptions when systems leave the lab.
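A back-of-the-envelope calculation makes the point. The per-scenario accuracies below are assumptions for illustration, not figures from the paper; what matters is how quickly overall accuracy erodes as hard, rare scenarios become more common than they are in curated benchmarks.

```python
# Assumed per-scenario accuracies (illustrative only).
acc_common, acc_rare = 0.75, 0.20

for rare_share in (0.02, 0.10, 0.30):  # benchmark-like vs. street-like mixes
    overall = (1 - rare_share) * acc_common + rare_share * acc_rare
    print(f"rare-event share {rare_share:>4.0%} -> overall accuracy {overall:.1%}")
# rare-event share   2% -> overall accuracy 73.9%
# rare-event share  10% -> overall accuracy 69.5%
# rare-event share  30% -> overall accuracy 58.5%
```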

Critics are quick to point out that the language "reading human minds" risks overstating the result. The model’s predictions derive from statistical associations learned from past data: similar visual contexts in the training set led to similar outcomes. That’s powerful, but it is not the same as access to human intent or internal mental states. In practice, pedestrians are influenced by local culture, street design and social signalling; an AI that doesn’t account for those layers can make confident but wrong predictions.

Safety, privacy and behavioural feedback

If a vehicle plans around what it expects you to do, human behaviour may change in response—a point sometimes called the behavioural feedback loop. People who know cars will anticipate them might take more risks, or conversely become more wary; either dynamic can change the statistical relationships the model depends on. That makes continuous in‑field validation essential.

The system’s reliance on visual and contextual cues also raises privacy and equity questions. Models trained on urban footage often inherit the biases and blind spots of their datasets: who was recorded, under which conditions, and with what cameras. Weaknesses in detection for certain skin tones, clothing types or body shapes could translate into different prediction quality across populations. Engineering teams must therefore prioritise dataset diversity, transparency about model failure modes, and procedures to audit and mitigate biased behaviour.
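A first pass at such an audit can be as simple as breaking accuracy down by whatever conditions the dataset records. A minimal sketch, with made-up group labels:

```python
import numpy as np

def audit_by_group(y_true, y_pred, groups):
    """Per-group accuracy: a crude but useful first fairness check.
    Group labels (lighting, camera type, etc.) are illustrative."""
    return {g: float((y_true[groups == g] == y_pred[groups == g]).mean())
            for g in np.unique(groups)}

# Toy usage with fabricated labels.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 0])
groups = np.array(["day", "day", "night", "day", "night", "night"])
print(audit_by_group(y_true, y_pred, groups))
# {'day': 1.0, 'night': 0.333...}  -- a gap worth investigating
```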

From multimodal LLMs to brain-inspired architectures

Some researchers see a parallel between this kind of predictive layer and brain-inspired, or neuromorphic, architectures, the subject of recent work reported in Nature Machine Intelligence. The parallel is conceptual rather than literal: current AI does not replicate human consciousness or the mechanisms of real intention. But taking inspiration from neural organisation, how networks route information and form specialised modules, can help engineers design systems that better balance speed, robustness and adaptability on chaotic city streets.

What needs to happen before deployment

OmniPredict is a research prototype, not a finished autonomy stack. Before deployment in vehicles, it needs long-term field trials, rigorous safety validation under corner cases, and integration tests that show how behavioural predictions should influence motion planning. Regulators and manufacturers will also have to decide standards for acceptable false-positive and false-negative rates when a system predicts human actions—trade-offs that carry clear safety implications.
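Those trade-offs can be made concrete. At any decision threshold on a predicted crossing probability, one can measure how often the system brakes for nobody (false positives) against how often it misses a real crossing (false negatives). A toy sketch with synthetic scores:

```python
import numpy as np

def fp_fn_rates(p_cross, crossed, threshold):
    """FP rate (alarm with no crossing) and FN rate (missed crossing)
    when 'will cross' fires above a probability threshold."""
    pred = p_cross >= threshold
    fp = float((pred & ~crossed).sum() / max((~crossed).sum(), 1))
    fn = float((~pred & crossed).sum() / max(crossed.sum(), 1))
    return fp, fn

# Synthetic scores: crossers tend to score higher than non-crossers.
rng = np.random.default_rng(1)
crossed = rng.random(1000) < 0.1
p_cross = np.clip(crossed * 0.5 + rng.random(1000) * 0.6, 0.0, 1.0)
for t in (0.3, 0.5, 0.7):
    fp, fn = fp_fn_rates(p_cross, crossed, t)
    print(f"threshold {t}: FP rate {fp:.2f}, FN rate {fn:.2f}")
# Lowering the threshold trades missed crossings for phantom braking.
```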

Finally, the project underscores a recurring truth of applied AI: accuracy on curated tests is necessary but not sufficient. Real-world systems must be auditable, fair and robust to distribution shifts; they must degrade gracefully when uncertain. The prospect of machines that "anticipate" human movement is attractive for safety and flow in urban transport, but it brings technical, ethical and legal questions that should be resolved before cars make irreversible decisions based on those predictions.

The work from Texas A&M and partners points to a near future in which perception, context and behavioural reasoning are inseparable components of autonomous systems. That future will be safer only if it combines the new predictive layer with conservative safety design, careful testing and clear rules for transparency and accountability.

Sources

  • Computers & Electrical Engineering (research paper on OmniPredict)
  • Texas A&M University College of Engineering
  • Korea Advanced Institute of Science and Technology (KAIST)
  • Nature Machine Intelligence (research on neuromorphic networks)
  • McGill University / The Neuro (Montreal Neurological Institute-Hospital)
Mattias Risberg

Cologne-based science & technology reporter tracking semiconductors, space policy and data-driven investigations.

University of Cologne (Universität zu Köln) • Cologne, Germany