What is a world model? | World Models Watch

What is a world model, in one short answer?

If a language model predicts the next token, a world model tries to predict the next state of a world. That world can be a generated 3D space, a simulated driving scene, an environment for an agent, or an interactive scene that responds when a user moves through it.
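The contrast between next-token and next-state prediction can be sketched in a few lines. This is a toy illustration only: the `State` class, the one-dimensional track, and both functions are hypothetical stand-ins, not how any real language model or world model is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical toy world state: an agent's position on a 1-D track."""
    position: int
    time: int


def next_token(tokens: list[str]) -> str:
    """Toy stand-in for a language model: sequence in, one token out.

    Here it simply repeats the last token; a real model would predict it.
    """
    return tokens[-1]


def next_state(state: State, action: int) -> State:
    """Toy stand-in for a world model: (state, action) in, next state out.

    The hand-written rule below plays the role a learned model would play.
    """
    return State(position=state.position + action, time=state.time + 1)


# Roll the world forward under a sequence of actions (assumed: +1 / -1 moves).
state = State(position=0, time=0)
for action in [1, 1, -1]:
    state = next_state(state, action)

print(state)  # State(position=1, time=3)
```

The point of the sketch is the signature: the world model's output depends on the current state and on an action, and each output becomes the input for the next step.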

This is why the category matters. An ordinary image or video model can produce plausible-looking output. A world model has to stay consistent as the camera moves, objects change, actions happen, and time continues.

What it is not

A world model is not just another word for a video model. Video can be part of the interface, but the deeper goal is simulation: representing how a world behaves after a prompt, movement, or action.

Dimension	Video model	World model
Primary output	A fixed video sequence	A stateful environment that can change with actions
Interaction	Usually prompt to clip	Prompt or action to evolving world state
Core challenge	Visual realism and temporal coherence	Spatial memory, causality, controllability, and persistence
Typical use	Creative media generation	Simulation, spatial design, robotics, agent training, interactive media
Evaluation question	Does the clip look plausible?	Does the world behave consistently when explored or acted on?
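The table's last row, whether the world behaves consistently when explored or acted on, can be made concrete with a small check. The sketch below is an assumption-laden toy: `ToyVideoModel`, `ToyWorldModel`, and the action strings are invented for illustration and do not correspond to any product named on this page.

```python
class ToyVideoModel:
    """Hypothetical video model: prompt in, fixed clip out, no state kept."""

    def generate(self, prompt: str, frames: int = 3) -> list[str]:
        return [f"{prompt} frame {i}" for i in range(frames)]


class ToyWorldModel:
    """Hypothetical world model: keeps a persistent map of what is where."""

    def __init__(self, objects: dict[int, str]):
        self.objects = dict(objects)  # position -> object (spatial memory)
        self.camera = 0               # current viewpoint

    def step(self, action: str) -> str:
        """Apply one action and return what the camera now sees."""
        if action == "left":
            self.camera -= 1
        elif action == "right":
            self.camera += 1
        elif action.startswith("place:"):
            # Acting on the world changes its state, not just the view.
            self.objects[self.camera] = action.split(":", 1)[1]
        # Any other action (e.g. "look") leaves the world unchanged.
        return self.objects.get(self.camera, "empty")


clip = ToyVideoModel().generate("forest")  # a fixed sequence, nothing to explore

world = ToyWorldModel({0: "tree", 1: "rock"})
first_view = world.step("look")      # camera at position 0
world.step("right")                  # move away
world.step("place:lamp")             # change the world at position 1
world.step("left")                   # come back
return_view = world.step("look")

# Persistence: the place we left is still there when we return.
assert first_view == return_view == "tree"
# Causality: the change we made is still there when we revisit it.
assert world.step("right") == "lamp"
```

A real evaluation would compare generated observations rather than dictionary lookups, but the question has the same shape: leave, act, return, and check that the world remembers.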

Why it is becoming a separate category

The phrase now names several visible product and research tracks: DeepMind applies it to interactive generated worlds, World Labs to spatial 3D worlds, and Runway to a general world model research direction, while NVIDIA uses the related term "world foundation models" for physical AI workflows.

That spread is exactly why World Models Watch treats the term as a category, not a single product label.

First models to know

Model	Developer	Description
EMO	Alibaba Group, Institute for Intelligent Computing	Expressive portrait video model
HappyOyster	Alibaba Token Hub	Real-time interactive world model product
LingBot-World	Ant Group / Robbyant	Open-source interactive world simulator
LingBot-Map	Ant Group / Robbyant	Streaming 3D foundation model
LingBot-VA	Ant Group / Robbyant	Robot-control world model

What is a world model?

The simplest path: watch, enter, act.

A video model makes a scene move.

A world model keeps track of a place.

Interaction turns the place into a system.
