World model concept map

World model concept map: from AI video to spatial worlds.

Use the world model concept map to connect AI video, spatial computing, digital twins, physical AI, and generated worlds.

Virtual worlds, AI video, Spatial computing, Physical AI

Concept map

Core sentence

Virtual worlds were built by humans. Now AI is learning to generate, control, and simulate them.

This sentence is the spine of the site. Minecraft and Roblox explain the user mental model. The metaverse explains persistence and social space. Vision Pro explains spatial computing. EMO, Veo, Wan, Kling, and Ray explain controllable video. Cosmos and digital twins explain simulation. World models connect all of it into one technical future.

Scene explainer

A three-step visual map.

The map works best as a consumer story: start from what people already know, then reveal the new layer.

Known worlds

Games and the metaverse taught the interface.

People already understand avatars, spaces, inventories, maps, and shared places.

Generated media

AI video made the world visible.

Synthetic scenes became easy to watch, remix, and share, but still behaved like clips.

World models

The next layer is enterable and controllable.

The scene starts to remember space, respond to action, and support agents or physical simulation.

Concept flow

How the world model concept map connects familiar ideas.

Past interface

Human-built virtual worlds

Before world models, people already understood avatars, sandbox worlds, social rooms, and user-built spaces.

Minecraft-style identity

Blocky avatars

Simple characters make virtual presence easy to understand. The form is basic, but the mental model is powerful: a person can enter a world.

Minecraft, Roblox avatar, Voxel worlds

World models inherit the question of presence: who is inside the generated world, and can that identity persist?

Buildable spaces

Sandbox worlds

Minecraft and Roblox trained users to expect worlds that can be modified, extended, and shared.

Minecraft blocks, Roblox experiences, UGC worlds

AI world generation becomes more valuable when generated spaces are editable instead of disposable.

Persistent social space

Metaverse

The metaverse idea framed virtual worlds as social, persistent, and identity-driven, even when the tooling was still manual.

Meta Horizon Worlds, VR rooms, Social worlds

World models can supply the missing automation layer: worlds generated on demand, not only built by hand.

Current surface

AI-generated video and digital humans

AI video is the visible surface of the shift. The deeper issue is control, consistency, and memory across time.

Audio-driven identity

Expressive humans

EMO makes the control problem visible: the same identity needs to move, emote, sing, and stay coherent over time.

EMO, Digital humans, Talking avatars

If a generated person cannot persist, a generated world cannot feel stable.

Prompt-to-motion

Video models

Veo, Wan, Kling, Ray, and earlier systems like Sora turn text, images, audio, and references into moving scenes.

Veo 3.1, Wan2.7-Video, Kling, Ray, Sora

The next comparison is whether those scenes can be controlled, extended, and interacted with.

From avatar to actor

Digital characters

MetaHuman-style characters, Roblox avatars, and EMO-like portraits point to a future where generated characters need continuity.

MetaHuman, Roblox avatar, EMO portrait, Runway Characters

Characters are the social layer of generated worlds.

Spatial interface

Spatial computing and immersive access

Vision Pro and spatial computing are not the same thing as world models. They are how generated worlds may be seen and operated.

Computer as environment

Spatial computing

Apple Vision Pro reframes computing as something placed into space instead of locked inside a flat screen.

Apple Vision Pro, Spatial video, 3D interfaces

World models need interfaces where generated space can be inspected, edited, and inhabited.

Scene as data

3D reconstruction

NeRF, Gaussian splatting, and scan-to-3D workflows make real or imagined spaces computable.

NeRF, 3D Gaussian Splatting, LingBot-Map, HY-World 2.0

Generated worlds need spatial structure, not only pixels.

From screen to place

Immersive worlds

VR, AR, and mixed reality make the user feel located inside a generated or captured environment.

Meta Quest, Vision Pro, Immersive video

World models become more legible when users can enter and manipulate the output.

Industrial layer

Simulation, digital twins, and physical AI

The industrial version of world models is not entertainment. It is simulation for robots, vehicles, factories, and cities.

Real world mirror

Digital twins

Digital twins model real places and systems so teams can test changes before touching the physical world.

NVIDIA Omniverse, Factory twins, City simulation

World models can make simulations cheaper to create and easier to vary.

AI for embodied systems

Physical AI

Robots and autonomous vehicles need models of how environments respond to motion, contact, and decisions.

Cosmos, HY-Embodied-0.5, LingBot-VA, LingBot-VLA

This is where world models become training infrastructure, not just media.

World-scale spatial memory

Geospatial models

Large geospatial models connect AI to real-world places, maps, and location-aware behavior.

Niantic spatial AI, Maps, AR location layers

They turn the real world into a modelable environment.

Core capability

World models

The core shift is from generating isolated outputs to modeling how a world changes under time, viewpoint, and action.

World responds to action

Interactive generation

A world model should preserve a coherent state when the user moves, edits, or acts.

Genie 3, Marble, HappyOyster, HY-World 2.0

Interaction is the difference between watching a clip and entering a system.
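
To make "coherent state" concrete, here is a minimal sketch in Python. It is a hand-written toy, not how Genie 3, Marble, or any listed system works; the names `ToyWorld`, `step`, and `observe` are hypothetical and exist only for illustration. The point is that an edit made by the user is still there after the user moves away and comes back, which a standalone clip cannot guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical stateful world: a 1-D corridor of cells plus a viewer position."""
    cells: dict[int, str] = field(default_factory=dict)  # position -> object name
    viewer: int = 0

    def step(self, action: str, arg=None) -> str:
        """Apply one action, then return what the viewer sees from the new state."""
        if action == "move":
            self.viewer += arg             # arg is a signed number of cells
        elif action == "edit":
            self.cells[self.viewer] = arg  # place an object where the viewer stands
        else:
            raise ValueError(f"unknown action: {action}")
        return self.observe()

    def observe(self) -> str:
        """Render the current cell; cells nobody has edited read as 'empty'."""
        return self.cells.get(self.viewer, "empty")


world = ToyWorld()
placed = world.step("edit", "red door")  # the user edits the scene
away = world.step("move", 3)             # the user walks away
back = world.step("move", -3)            # the user returns to the same spot
```

Here `back` equals `placed` because the observation is rendered from stored state rather than generated fresh each time. A real world model would have to learn or predict that state instead of storing it in a dictionary.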

Base models for simulation

World foundation models

Foundation models can become reusable infrastructure for generating, predicting, and testing world states.

Cosmos, GWM-1, World API, HY-World 2.0, LingBot-VA, LingBot-VLA

This is where creative, spatial, and physical world generation begin to share a vocabulary.

World as training ground

Agent environments

Agents need environments where they can observe, act, fail, and learn.

Game worlds, Robot simulators, Interactive scenes

World models can become the substrate for training and evaluating future AI agents.
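
The observe, act, fail, learn cycle can be sketched as a small loop. This is an illustrative toy under stated assumptions: `LineEnv`, `TallyAgent`, and `run_episode` are hypothetical names, and the environment rules are hand-coded here, whereas a world model would be expected to generate or predict the `step` behavior.

```python
class LineEnv:
    """Hypothetical environment: walk from cell 0 to a goal cell on a line."""

    def __init__(self, goal: int = 3):
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        """action is -1 or +1; returns (observation, reward, done)."""
        self.pos += action
        if self.pos < 0:
            return self.pos, -1.0, True  # walked off the edge: failure
        if self.pos == self.goal:
            return self.pos, 1.0, True   # reached the goal: success
        return self.pos, 0.0, False


class TallyAgent:
    """Tries each action once, then prefers the action with the best total reward."""

    def __init__(self):
        self.score = {-1: 0.0, +1: 0.0}
        self.tried = set()

    def act(self, obs: int) -> int:
        for action in (-1, +1):
            if action not in self.tried:
                return action
        return max(self.score, key=self.score.get)

    def learn(self, action: int, reward: float) -> None:
        self.tried.add(action)
        self.score[action] += reward


def run_episode(env, agent, max_steps: int = 20) -> float:
    """Run one episode and return the last reward the agent received."""
    obs = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                # act on the current observation
        obs, reward, done = env.step(action)   # the world responds
        agent.learn(action, reward)            # learn from success or failure
        if done:
            break
    return reward


env, agent = LineEnv(goal=3), TallyAgent()
outcomes = [run_episode(env, agent) for _ in range(3)]
```

In this toy the first episode ends in failure (the agent steps off the edge) and the next two reach the goal, so `outcomes` is `[-1.0, 1.0, 1.0]`. The agent is deliberately simplistic; the design choice worth noting is the separation between the agent and an environment that owns the state.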

Bridge table

What each familiar concept contributes.

Entry concept	Known for	Connects to	Meaning inside world models
Blocky avatars / Minecraft	Simple identity inside a buildable world	Avatars, sandbox worlds, UGC	Generated worlds need persistent users, objects, and editable structure.
Metaverse	Persistent social virtual spaces	VR, Horizon Worlds, social identity	World models automate world creation instead of relying only on manual building.
Vision Pro	Spatial computing and immersive interface	AR, spatial video, 3D interaction	Generated worlds need a spatial interface for viewing, editing, and operation.
AI video	Generated motion, characters, and scenes	EMO, Veo 3.1, Wan2.7-Video, Kling, Ray	The video layer must become controllable, continuous, and stateful.
Digital twins	Simulation of real systems	Omniverse, robotics, LingBot-VA, LingBot-VLA, city and factory models	World models become useful when they predict and test real-world behavior.
World model	Predicting and generating world state	Genie 3, Marble, Cosmos, GWM-1	The final category is not a place or device; it is the model that makes worlds behave.