CoMAP: Co-Evolving World Models and Agent Policies for LLM Agents

June 3, 2026

Youwei Liu, Jian Wang†, Hanlin Wang, Wenjie Li

Overview

Figure 1. Conceptual illustration of CoMAP.
CoMAP co-evolves the world model and the agent policy in a closed loop. The world model provides lookahead states for policy improvement, while the agent policy generates on-policy interactions for world-model updating.

Figure 2. Framework overview.
At each step, the agent first drafts an action, the world model predicts the future state that action would lead to, and the policy then performs future-aware reflection to refine the action. The resulting trajectories are used for on-policy self-distillation of the world model and for policy-side evolution.
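The per-step loop described in Figure 2 can be sketched roughly as follows. This is an illustrative outline only, not the repository's code: the names `Step`, `run_episode`, `draft`, `predict`, `reflect`, and `env_step` are hypothetical, the toy integer environment exists purely to make the example runnable, and the world-model and policy updates that would consume the collected trajectory are omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    """One recorded interaction (hypothetical record layout)."""
    state: Any
    draft_action: Any
    predicted_state: Any
    final_action: Any
    next_state: Any


def run_episode(
    initial_state: Any,
    draft: Callable[[Any], Any],                  # policy: state -> drafted action
    predict: Callable[[Any, Any], Any],           # world model: (state, action) -> predicted state
    reflect: Callable[[Any, Any, Any], Any],      # policy: (state, draft, prediction) -> refined action
    env_step: Callable[[Any, Any], Tuple[Any, bool]],  # environment: (state, action) -> (next state, done)
    max_steps: int = 10,
) -> List[Step]:
    """Draft, look ahead, reflect, act; return the on-policy trajectory."""
    state = initial_state
    trajectory: List[Step] = []
    for _ in range(max_steps):
        drafted = draft(state)                      # 1. draft an action
        predicted = predict(state, drafted)         # 2. world model looks ahead
        final = reflect(state, drafted, predicted)  # 3. future-aware reflection
        next_state, done = env_step(state, final)   # 4. act in the real environment
        trajectory.append(Step(state, drafted, predicted, final, next_state))
        state = next_state
        if done:
            break
    return trajectory


# Toy stand-ins (assumptions for illustration): the state is an integer
# and the goal is to land exactly on 3.
def toy_draft(state: int) -> str:
    return "inc2"  # the draft always proposes the larger jump


def toy_predict(state: int, action: str) -> int:
    return state + (2 if action == "inc2" else 1)


def toy_reflect(state: int, drafted: str, predicted: int) -> str:
    # Fall back to the smaller step when the lookahead overshoots the goal.
    return "inc1" if predicted > 3 else drafted


def toy_env_step(state: int, action: str) -> Tuple[int, bool]:
    next_state = state + (2 if action == "inc2" else 1)
    return next_state, next_state == 3


trajectory = run_episode(0, toy_draft, toy_predict, toy_reflect, toy_env_step)
for step in trajectory:
    print(step)
```

In this toy run, the second step's draft would overshoot the goal, so the reflection stage swaps it for the smaller action. Each recorded `Step` pairs the predicted state with the state actually observed, which is the kind of on-policy data a world-model update could learn from.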


Citation

If you find this work helpful, please consider citing:

@article{liu2026comap,
  title   = {Co-Evolving World Models and Agent Policies for LLM Agents},
  author  = {Liu, Youwei and Wang, Jian and Wang, Hanlin and Li, Wenjie},
  journal = {arXiv preprint arXiv:2606.02372},
  year    = {2026},
  url     = {https://arxiv.org/abs/2606.02372}
}

Contact

For questions, please contact:

loyiv5477@gmail.com