ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving
🔥 CVPR 2026 🔥
Qihang Peng¹²³, Xuesong Chen²³, Chenye Yang¹, Shaoshuai Shi³, Hongsheng Li²
¹Tsinghua University  ²CUHK MMLab  ³Voyager Research, Didi Chuxing
🔥 News
- [2026-03] Model weights and detailed reproduction instructions are released! 🔥🔥🔥
- [2026-03] Training and evaluation scripts for [CVPR 2026] ColaVLA are released! We also open-source the code for [CVPR 2025] SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving, a framework that synergizes VLMs with end-to-end models to strengthen autonomous vehicle planning, along with its config and checkpoint.
- [2026-02] Our paper was accepted by CVPR 2026! 🥳
- [2025-12] We released the paper and the project page for ColaVLA.
📝 Overview
ColaVLA is a unified vision-language-action framework for autonomous driving trajectory planning. While VLMs provide strong priors and commonsense reasoning, VLM-based planners often suffer from:
- mismatch between discrete text reasoning and continuous control,
- high latency from autoregressive chain-of-thought decoding, and
- non-causal or inefficient planning that hinders real-time deployment.
ColaVLA addresses these issues by transferring reasoning from text to a compact latent space and decoding multi-scale trajectories in parallel.
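As a toy illustration of the latency argument (not the ColaVLA implementation; the decoder, shapes, and query names are all assumptions for the sketch), the snippet below contrasts step-by-step autoregressive decoding, where each waypoint must wait on the previous pass, with a single parallel forward pass that emits the whole trajectory at once:

```python
import torch
import torch.nn as nn

hidden, horizon = 256, 12  # hypothetical feature width and waypoint count

layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)
to_xy = nn.Linear(hidden, 2)  # waypoint head: (x, y) per step

scene = torch.randn(1, 64, hidden)         # fused multimodal scene tokens
queries = torch.randn(1, horizon, hidden)  # learned waypoint queries

# Autoregressive decoding: `horizon` sequential passes, each waiting on the last.
tokens = queries[:, :1]
for _ in range(horizon - 1):
    out = decoder(tokens, scene)                      # (1, t, hidden)
    tokens = torch.cat([tokens, out[:, -1:]], dim=1)
traj_ar = to_xy(tokens)                               # (1, horizon, 2)

# Parallel decoding: one forward pass emits the whole trajectory.
traj_par = to_xy(decoder(queries, scene))             # (1, horizon, 2)
```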
📋 TODO
- [x] Release paper and project page.
- [x] Release training / evaluation code.
- [x] Release model checkpoints.
- [x] Provide detailed reproduction instructions.
🚀 Getting Started
⭐ Motivation (Reasoning Paradigm)
We propose Cognitive Latent Reasoning, which relocates chain-of-thought from discrete text to a compact latent space, reducing latency while preserving the VLM's generalization and interpretability.
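A minimal sketch of the idea, under stated assumptions (the stand-in backbone, `num_latents`, `num_passes`, and all shapes are hypothetical, not the paper's interface): a few learned latent queries are refined by a small, fixed number of backbone passes and read out directly as decision embeddings, instead of decoding a long chain-of-thought token by token.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Toy latent-reasoning head: learned queries stand in for text CoT."""

    def __init__(self, d_model=512, num_latents=8, num_passes=2):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, d_model) * 0.02)
        self.num_passes = num_passes

    def forward(self, backbone, scene_tokens):
        # scene_tokens: (B, N, d) multimodal tokens from the VLM encoder.
        z = self.latents.unsqueeze(0).expand(scene_tokens.size(0), -1, -1)
        # A small, fixed number of backbone passes replaces hundreds of
        # autoregressive text-decoding steps.
        for _ in range(self.num_passes):
            z = backbone(torch.cat([scene_tokens, z], dim=1))[:, -z.size(1):]
        return z  # compact, decision-oriented meta-action embeddings

# Stand-in backbone for the sketch; a real VLM would take its place.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)
meta_actions = LatentReasoner()(backbone, torch.randn(2, 100, 512))  # (2, 8, 512)
```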
🏗 Framework
ColaVLA consists of two key components:
- Cognitive Latent Reasoner: compresses multimodal scene understanding into compact, decision-oriented meta-action embeddings with ego-adaptive selection and only a small number of VLM passes.
- Hierarchical Parallel Planner: generates multi-scale, causality-consistent trajectories in a single forward pass with a hierarchical decoder and a hybrid attention mask (see the sketch below).
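To make the hybrid-mask idea concrete, here is a small sketch under assumptions (the per-scale query counts are made up): a block mask that lets each scale's queries attend to themselves and to every coarser scale, but never to finer ones, so all scales can be decoded in one causality-consistent parallel pass.

```python
import torch

scales = [2, 4, 8]  # hypothetical waypoint queries per scale, coarse -> fine
n = sum(scales)

# -inf blocks attention; 0 allows it.
mask = torch.full((n, n), float("-inf"))
start = 0
for k in scales:
    # Each scale attends to itself and to all coarser scales only.
    mask[start:start + k, :start + k] = 0.0
    start += k

# `mask` can be passed as `tgt_mask` to an nn.TransformerDecoder so that
# all scales are produced in a single forward pass.
```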
📊 Results
We report strong performance on nuScenes in both open-loop and closed-loop evaluations, with favorable efficiency and robustness. Please see the paper for full tables, metrics, and ablations.
👀 Visualization
Qualitative examples show robust multi-scale trajectory planning under complex multi-agent interactions and safety-critical scenarios.
💬 Contact
If you have questions about the paper, feel free to open an issue or contact:
- Qihang Peng: pqh22@mails.tsinghua.edu.cn
📚 Citation
If you find our work helpful, please cite:
```bibtex
@misc{peng2025colavlaleveragingcognitivelatent,
  title={ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving},
  author={Qihang Peng and Xuesong Chen and Chenye Yang and Shaoshuai Shi and Hongsheng Li},
  year={2025},
  eprint={2512.22939},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.22939},
}
```