Embodied World Models for Decision Making

Overview

World models learn to infer and predict the dynamics of an agent's external environment, and they have become a cornerstone of embodied artificial intelligence, powering recent progress in decision-making and planning for interactive agents. This workshop aims to bring together researchers working at the intersection of generative modeling, reinforcement learning, computer vision, and robotics to explore the next generation of embodied world models: learned models that enable agents to understand, predict, and interact with the world. By focusing on embodiment and decision-making, the workshop seeks to advance world models beyond passive sequence prediction toward active, goal-directed interaction with both physical and simulated worlds.

Topics of Interest

We welcome contributions that advance theoretical foundations, algorithmic innovations, or real-world applications of world models. Topics of interest include (but are not limited to):

Model-based reinforcement learning and long-horizon planning. Investigating how world models can benefit model-based reinforcement learning, with a focus on sample efficiency, performance, and scalability. Particular attention is given to long-horizon planning, which requires the agent to reason over extended sequences of actions, anticipate delayed outcomes, and maintain coherent strategies across temporally distant states and goals, often under uncertainty and with limited feedback.
Aligning simulation and real-world physics for robot learning. Investigating how to bridge the gap between simulated and real-world physics to enhance robot learning. This includes using generative models to improve perception, planning, and control by capturing physical dynamics more accurately, modeling uncertainty and feedback effects, and learning diffusion-based policies that transfer robustly from simulation to the real world.
Interactive scene generation and downstream tasks. Building models that generate physically plausible and semantically coherent interactive video simulations. Focus areas include action-conditioned scene synthesis, controllable simulation of agent-environment dynamics, and the development of evaluation techniques and benchmarks that assess video fidelity, temporal consistency, and task-relevant controllability for downstream applications such as planning and policy learning.
Video-language-action (VLA) models and leveraging the world knowledge encoded in large language models (LLMs). Studying large-scale pretrained models that unify video, language, and action representations to support robust and generalizable policy learning. Core areas include curating diverse multi-modal datasets, improving cross-modal alignment, developing parameter-efficient fine-tuning methods, and enabling agents to follow complex, language-guided instructions in both simulated and real-world settings. We also explore how the structured and unstructured world knowledge embedded in large language models can be exploited to guide agents’ decision-making.
Applications in broader domains, such as open-world video games and autonomous driving. Extending world models to embodied agents in both real-world environments and high-fidelity simulators. Key topics include integrating perception with control, sim-to-real transfer, continual learning and adaptation, and deploying agents in open-ended settings such as Minecraft, autonomous driving, and interactive real-world scenarios.

Call for Papers

We invite submissions of original research papers related to building physically plausible world models.

Submission Types:

Opinion Papers (max 4 pages) - For preliminary results, interesting applications, or novel ideas that did not pan out in practice.
Research Papers (4 to 9 pages) - For original research contributions.

Submission Guidelines:

Submit your paper via OpenReview.
Please follow the style guidelines of NeurIPS 2025.
Papers are non-archival - we welcome submissions that have been submitted to or accepted by other venues.
Papers already accepted to NeurIPS 2025 will undergo an expedited review process primarily evaluating their relevance to the workshop themes.
All accepted papers will be presented in a poster session.

Important Dates: