ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving

🔥 CVPR 2026 🔥

Qihang Peng<sup>1,2,3</sup>  Xuesong Chen<sup>2,3</sup>  Chenye Yang<sup>1</sup>  Shaoshuai Shi<sup>3</sup>  Hongsheng Li<sup>2</sup>
<sup>1</sup>Tsinghua University  <sup>2</sup>CUHK MMLab  <sup>3</sup>Voyager Research, Didi Chuxing

arXiv: https://arxiv.org/abs/2512.22939


🔥 News

  • ColaVLA is accepted to CVPR 2026.
  • Paper, training/evaluation code, and model checkpoints are released.

🎄 Overview

ColaVLA is a unified vision–language–action (VLA) framework for trajectory planning in autonomous driving. While vision–language models (VLMs) provide strong priors and commonsense reasoning, VLM-based planners often suffer from:

  1. a mismatch between discrete text reasoning and continuous control,
  2. high latency from autoregressive chain-of-thought decoding, and
  3. non-causal or inefficient planning that hinders real-time deployment.

ColaVLA addresses these issues by transferring reasoning from text to a compact latent space and decoding multi-scale trajectories in parallel.
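
To make this concrete, here is a minimal PyTorch sketch of the two-stage idea: a reasoner that cross-attends a few learnable latent tokens to scene features in a single pass, and a planner head that regresses all waypoints at once. Every module name, shape, and hyperparameter below is a hypothetical stand-in, not the released ColaVLA implementation.

```python
# Hypothetical sketch of the latent-reasoning + parallel-planning pipeline.
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Cross-attends a few learnable latent tokens to fused scene features."""
    def __init__(self, dim=256, num_latents=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, scene_feats):                      # (B, N, D) VLM features
        q = self.latents.unsqueeze(0).expand(scene_feats.size(0), -1, -1)
        out, _ = self.attn(q, scene_feats, scene_feats)  # one pass, no text CoT
        return out                                       # (B, num_latents, D)

class ParallelPlanner(nn.Module):
    """Regresses every future waypoint at once from the latent decisions."""
    def __init__(self, dim=256, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.head = nn.Linear(dim, horizon * 2)          # (x, y) per future step

    def forward(self, latents):                          # (B, K, D)
        pooled = latents.mean(dim=1)                     # aggregate decision tokens
        return self.head(pooled).view(-1, self.horizon, 2)

scene = torch.randn(2, 196, 256)   # stand-in for fused camera/text features
traj = ParallelPlanner()(LatentReasoner()(scene))
print(traj.shape)                  # torch.Size([2, 6, 2]) waypoints
```

The point of the sketch is that neither stage loops over generated tokens; that loop is exactly where autoregressive text reasoning pays its latency cost.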


πŸ“ TODO

  • [x] Release paper and project page.
  • [x] Release training / evaluation code.
  • [x] Release model checkpoints.
  • [x] Provide detailed reproduction instructions.

📚 Getting Started

  1. Environment Setup
  2. Training & Inference

⭐ Motivation (Reasoning Paradigm)

(Figure: reasoning paradigm illustration)

We propose Cognitive Latent Reasoning to relocate chain-of-thought from discrete text to a compact latent space, reducing latency while preserving VLM generalization and interpretability.
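
The latency argument can be made tangible with a toy comparison: autoregressive text chain-of-thought pays one full decoder pass per generated token, while a fixed-size latent rollout needs only a small, constant number of passes. The snippet below only illustrates that scaling; the layer sizes, token counts, and timing harness are arbitrary assumptions, not a ColaVLA benchmark.

```python
# Toy forward-pass-count comparison: text CoT vs. latent reasoning.
import time
import torch
import torch.nn as nn

decoder = nn.TransformerDecoderLayer(d_model=256, nhead=8, batch_first=True)
memory = torch.randn(1, 196, 256)  # stand-in for encoded scene tokens

def autoregressive_cot(num_tokens=64):
    """One full decoder pass per generated reasoning token."""
    seq = torch.randn(1, 1, 256)   # start-token embedding
    for _ in range(num_tokens):
        out = decoder(seq, memory)
        seq = torch.cat([seq, out[:, -1:]], dim=1)  # append newest token
    return seq

def latent_reasoning(num_latents=8, num_passes=2):
    """A small, fixed number of passes over compact latent tokens."""
    z = torch.randn(1, num_latents, 256)
    for _ in range(num_passes):
        z = decoder(z, memory)
    return z

with torch.no_grad():
    for fn in (autoregressive_cot, latent_reasoning):
        t0 = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - t0:.3f}s")
```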


📖 Framework

(Figure: ColaVLA framework overview)

ColaVLA consists of two key components:

  • Cognitive Latent Reasoner: compresses multimodal scene understanding into compact, decision-oriented meta-action embeddings with ego-adaptive selection and only a small number of VLM passes.
  • Hierarchical Parallel Planner: generates multi-scale, causality-consistent trajectories in a single forward pass, using a hierarchical decoder with a hybrid attention mask (see the sketch below).
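
As a rough illustration of the planner, the sketch below builds one plausible "hybrid" attention mask: coarse-scale trajectory queries attend bidirectionally within their own scale, while fine-scale queries attend causally within their scale and freely to all coarse tokens, so the whole multi-scale trajectory comes out of one forward pass. The masking rule, token counts, and single encoder layer are assumptions for illustration; the paper defines the actual mask and decoder.

```python
# Hypothetical hybrid attention mask for multi-scale parallel decoding.
import torch
import torch.nn as nn

def hybrid_mask(num_coarse=3, num_fine=6):
    """Boolean mask where True blocks attention (PyTorch convention)."""
    n = num_coarse + num_fine
    mask = torch.ones(n, n, dtype=torch.bool)            # start fully blocked
    mask[:num_coarse, :num_coarse] = False               # coarse <-> coarse: full
    mask[num_coarse:, :num_coarse] = False               # fine -> coarse: full
    causal = torch.ones(num_fine, num_fine, dtype=torch.bool).triu(1)
    mask[num_coarse:, num_coarse:] = causal              # fine -> fine: causal
    return mask                                          # coarse never sees fine

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
queries = torch.randn(2, 9, 256)               # 3 coarse + 6 fine trajectory queries
out = layer(queries, src_mask=hybrid_mask())   # one parallel forward pass
print(out.shape)                               # torch.Size([2, 9, 256])
```

Keeping coarse tokens blind to fine tokens is one way to preserve a causal, coarse-to-fine hierarchy while still decoding everything in parallel.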

📊 Results

(Figures: open-loop and closed-loop results)

We report strong performance on nuScenes in both open-loop and closed-loop evaluations, with favorable efficiency and robustness. Please see the paper for full tables, metrics, and ablations.


👀 Visualization

(Figure: qualitative planning examples)

Qualitative examples show robust multi-scale trajectory planning under complex multi-agent interactions and safety-critical scenarios.


📬 Contact

If you have questions about the paper, feel free to open an issue or contact:

  • Qihang Peng: pqh22@mails.tsinghua.edu.cn

🔗 Citation

If you find our work helpful, please cite:

```bibtex
@misc{peng2025colavlaleveragingcognitivelatent,
  title={ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving},
  author={Qihang Peng and Xuesong Chen and Chenye Yang and Shaoshuai Shi and Hongsheng Li},
  year={2025},
  eprint={2512.22939},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.22939},
}
```