ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving

🔥 CVPR 2026 🔥

Qihang Peng<sup>1,2,3</sup>  Xuesong Chen<sup>2,3</sup>  Chenye Yang<sup>1</sup>  Shaoshuai Shi<sup>3</sup>  Hongsheng Li<sup>2</sup>
<sup>1</sup>Tsinghua University  <sup>2</sup>CUHK MMLab  <sup>3</sup>Voyager Research, Didi Chuxing

arXiv: https://arxiv.org/abs/2512.22939


🔥 News

  • ColaVLA is accepted to CVPR 2026.
  • Paper, training/evaluation code, and model checkpoints are released.

🎄 Overview

ColaVLA is a unified vision–language–action (VLA) framework for trajectory planning in autonomous driving. While vision–language models (VLMs) provide strong priors and commonsense reasoning, VLM-based planners often suffer from:

  1. a mismatch between discrete text reasoning and continuous control,
  2. high latency from autoregressive chain-of-thought decoding, and
  3. non-causal or inefficient planning that hinders real-time deployment.

ColaVLA addresses these issues by transferring reasoning from text to a compact latent space and decoding multi-scale trajectories in parallel.
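
To make this concrete, here is a minimal PyTorch sketch of the two-stage idea: a reasoner that cross-attends a few learnable latent tokens to scene features in a single pass, and a planner head that regresses all waypoints at once. Every module name, shape, and hyperparameter below is a hypothetical stand-in, not the released ColaVLA implementation.

```python
# Hypothetical sketch of the latent-reasoning + parallel-planning pipeline.
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Cross-attends a few learnable latent tokens to fused scene features."""
    def __init__(self, dim=256, num_latents=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, scene_feats):                      # (B, N, D) VLM features
        q = self.latents.unsqueeze(0).expand(scene_feats.size(0), -1, -1)
        out, _ = self.attn(q, scene_feats, scene_feats)  # one pass, no text CoT
        return out                                       # (B, num_latents, D)

class ParallelPlanner(nn.Module):
    """Regresses every future waypoint at once from the latent decisions."""
    def __init__(self, dim=256, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.head = nn.Linear(dim, horizon * 2)          # (x, y) per future step

    def forward(self, latents):                          # (B, K, D)
        pooled = latents.mean(dim=1)                     # aggregate decision tokens
        return self.head(pooled).view(-1, self.horizon, 2)

scene = torch.randn(2, 196, 256)   # stand-in for fused camera/text features
traj = ParallelPlanner()(LatentReasoner()(scene))
print(traj.shape)                  # torch.Size([2, 6, 2]) waypoints
```

The point of the sketch is that neither stage loops over generated tokens; that loop is exactly where autoregressive text reasoning pays its latency cost.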


πŸ“ TODO

  • [x] Release paper and project page.
  • [x] Release training / evaluation code.
  • [x] Release model checkpoints.
  • [x] Provide detailed reproduction instructions.

📚 Getting Started

  1. Environment Setup
  2. Training & Inference

⭐ Motivation (Reasoning Paradigm)

(Figure: reasoning paradigm illustration)

We propose Cognitive Latent Reasoning to relocate chain-of-thought from discrete text to a compact latent space, reducing latency while preserving VLM generalization and interpretability.
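
The latency argument can be made tangible with a toy comparison: autoregressive text chain-of-thought pays one full decoder pass per generated token, while a fixed-size latent rollout needs only a small, constant number of passes. The snippet below only illustrates that scaling; the layer sizes, token counts, and timing harness are arbitrary assumptions, not a ColaVLA benchmark.

```python
# Toy forward-pass-count comparison: text CoT vs. latent reasoning.
import time
import torch
import torch.nn as nn

decoder = nn.TransformerDecoderLayer(d_model=256, nhead=8, batch_first=True)
memory = torch.randn(1, 196, 256)  # stand-in for encoded scene tokens

def autoregressive_cot(num_tokens=64):
    """One full decoder pass per generated reasoning token."""
    seq = torch.randn(1, 1, 256)   # start-token embedding
    for _ in range(num_tokens):
        out = decoder(seq, memory)
        seq = torch.cat([seq, out[:, -1:]], dim=1)  # append newest token
    return seq

def latent_reasoning(num_latents=8, num_passes=2):
    """A small, fixed number of passes over compact latent tokens."""
    z = torch.randn(1, num_latents, 256)
    for _ in range(num_passes):
        z = decoder(z, memory)
    return z

with torch.no_grad():
    for fn in (autoregressive_cot, latent_reasoning):
        t0 = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - t0:.3f}s")
```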


📖 Framework

(Figure: ColaVLA framework overview)

ColaVLA consists of two key components:

  • Cognitive Latent Reasoner: compresses multimodal scene understanding into compact, decision-oriented meta-action embeddings with ego-adaptive selection and only a small number of VLM passes.
  • Hierarchical Parallel Planner: generates multi-scale, causality-consistent trajectories in a single forward pass, using a hierarchical decoder with a hybrid attention mask (see the sketch below).
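
As a rough illustration of the planner, the sketch below builds one plausible "hybrid" attention mask: coarse-scale trajectory queries attend bidirectionally within their own scale, while fine-scale queries attend causally within their scale and freely to all coarse tokens, so the whole multi-scale trajectory comes out of one forward pass. The masking rule, token counts, and single encoder layer are assumptions for illustration; the paper defines the actual mask and decoder.

```python
# Hypothetical hybrid attention mask for multi-scale parallel decoding.
import torch
import torch.nn as nn

def hybrid_mask(num_coarse=3, num_fine=6):
    """Boolean mask where True blocks attention (PyTorch convention)."""
    n = num_coarse + num_fine
    mask = torch.ones(n, n, dtype=torch.bool)            # start fully blocked
    mask[:num_coarse, :num_coarse] = False               # coarse <-> coarse: full
    mask[num_coarse:, :num_coarse] = False               # fine -> coarse: full
    causal = torch.ones(num_fine, num_fine, dtype=torch.bool).triu(1)
    mask[num_coarse:, num_coarse:] = causal              # fine -> fine: causal
    return mask                                          # coarse never sees fine

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
queries = torch.randn(2, 9, 256)               # 3 coarse + 6 fine trajectory queries
out = layer(queries, src_mask=hybrid_mask())   # one parallel forward pass
print(out.shape)                               # torch.Size([2, 9, 256])
```

Keeping coarse tokens blind to fine tokens is one way to preserve a causal, coarse-to-fine hierarchy while still decoding everything in parallel.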

📊 Results

(Figures: open-loop and closed-loop results)

We report strong performance on nuScenes in both open-loop and closed-loop evaluations, with favorable efficiency and robustness. Please see the paper for full tables, metrics, and ablations.


👀 Visualization

(Figure: qualitative planning examples)

Qualitative examples show robust multi-scale trajectory planning under complex multi-agent interactions and safety-critical scenarios.


📬 Contact

If you have questions about the paper, feel free to open an issue or contact:

  • Qihang Peng: pqh22@mails.tsinghua.edu.cn

🔗 Citation

If you find our work helpful, please cite:

```bibtex
@misc{peng2025colavlaleveragingcognitivelatent,
  title={ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving},
  author={Qihang Peng and Xuesong Chen and Chenye Yang and Shaoshuai Shi and Hongsheng Li},
  year={2025},
  eprint={2512.22939},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.22939},
}
```