# BOLT
May 11, 2026
Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception
Drop a ~0.9M-parameter plugin between your detector and a stranger's detector,
adapt it on the fly with the ego's own predictions, and recover useful cooperation
without any pre-deployment joint training, without any labels, and without any cooperative data.
## News
- 2026-05: Paper submitted to a peer-reviewed venue.
- 2026-05: Code is publicly released.
- 2026-05: The arXiv preprint will be linked here once the ID is assigned.
## TL;DR
Most "heterogeneous cooperative perception" methods assume that all agents pre-train together: shared protocols, joint optimization, or collaborator-specific calibration data. This assumption breaks the moment two agents from different vendors meet on the road.
BOLT assumes nothing of the sort. Each agent ships with its own independently trained single-agent detector. When two agents meet, BOLT inserts a tiny adaptive plugin between the neighbor's feature stream and the ego's frozen fusion module, and adapts the plugin online via ego-as-teacher distillation. No labels. No cooperative training data. No shared protocol.
Across DAIR-V2X and OPV2V, with multiple LiDAR/camera encoder pairs and multiple fusion strategies, BOLT consistently turns degraded preparation-free cooperation into useful cooperation that surpasses ego-only performance.
## Highlights
| # | Feature | Description |
|---|---|---|
| 1 | Preparation-free | Each agent uses its own independently trained detector. No joint training. No shared protocol. |
| 2 | Online adaptation | Plugin parameters are updated at deployment by single-pass test-time distillation. |
| 3 | Lightweight | About 0.9M trainable parameters. Encoders, fusion module, and detection head are all frozen. |
| 4 | Label-free, data-free | Ego predictions act as the teacher: no labels, no cooperative training data. |
| 5 | Plug-and-play | Works across PointPillars / SECOND / Lift-Splat-Shoot encoders and across multiple fusion strategies. |
## Results
Evaluated on DAIR-V2X (real V2I) and OPV2V (simulated V2V).
In the preparation-free setting, vanilla unadapted fusion typically falls below ego-only detection, so cooperation actively hurts. BOLT reverses this:
- Up to +32.3 AP@50 over unadapted fusion in the preparation-free setting.
- Surpasses ego-only on every evaluated encoder pair across DAIR-V2X and OPV2V.
- Trains only ~0.9M parameters per plugin.
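AP@50 counts a prediction as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that matching test under a simplifying assumption: boxes are axis-aligned in the BEV plane and given as `(x_min, y_min, x_max, y_max)`, whereas real 3D detection benchmarks generally work with rotated boxes. `iou_axis_aligned` and `is_true_positive` are hypothetical helpers, not functions from this codebase.

```python
def iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_thresh=0.5):
    """A prediction matches a ground-truth box at AP@50 if IoU >= 0.5."""
    return iou_axis_aligned(pred_box, gt_box) >= iou_thresh
```

For example, a 4 x 2 box shifted by one unit along x overlaps its original position with IoU 0.6 and would still count as a match, while a 2 x 2 box shifted by one unit (IoU 1/3) would not.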
Figure. Left: performance across encoder pairs; vanilla fusion (red) drops below ego-only, while BOLT (blue) consistently surpasses it. Right: AP@50 improves online as more frames stream in.
### Qualitative BEV: with vs. without BOLT
Each pair shares the same scene; the left tile uses vanilla cross-agent fusion, the right tile uses BOLT. Green = ground truth, red = predictions. Without BOLT, fused predictions drift, duplicate, or miss objects in the neighbor's view; with BOLT, predictions snap back onto the true objects.
See the paper for full tables, ablations, and additional qualitative results.
## Installation
```bash
git clone https://github.com/sidiangongyuan/BOLT.git
cd BOLT

# Conda env
conda create -n bolt python=3.8 pytorch==1.12.0 torchvision==0.13.0 cudatoolkit=11.6 \
    -c pytorch -c conda-forge
conda activate bolt

# Python packages
pip install -r requirements.txt
pip install spconv-cu116  # match your CUDA version

# Project-local extensions
python setup.py develop
python opencood/utils/setup.py build_ext --inplace
```
The codebase reuses some infrastructure from HEAL and OpenCOOD; if you run into platform-specific issues with
spconv/cumm, those repos' troubleshooting tips also apply here.
## Data Preparation
We evaluate on the following datasets:
| Dataset | Used | Source |
|---|---|---|
| DAIR-V2X-C | Yes | DAIR-V2X with complemented annotations |
| OPV2V | Yes | OpenCOOD (also additional-001.zip for camera) |
| OPV2V-H | TODO | Hugging Face; not used in this work, planned for future support. |
| V2X-Real | TODO | Official site; not used in this work, planned for future support. |
Each dataset's directory layout is configured through the YAML files under opencood/hypes_yaml/. Update the dataset paths there before training or inference.
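A quick way to catch a forgotten path before launching a run is to scan a config for directory entries that do not exist on disk. The helper below is a stdlib-only sketch: it assumes dataset locations are stored on single lines under keys ending in `_dir` (for example `root_dir`), which is an assumption about the configs rather than something documented here, and `find_missing_dirs` is a hypothetical name.

```python
import re
from pathlib import Path

# Matches lines such as `root_dir: "/data/opv2v/train"  # comment`.
# ASSUMPTION: dataset paths live under keys ending in `_dir`; adjust the
# pattern if your configs use different key names.
DIR_LINE = re.compile(r"^\s*(\w*_dir)\s*:\s*['\"]?([^'\"#\n]+?)['\"]?\s*(?:#.*)?$")


def find_missing_dirs(yaml_path):
    """Return (key, path) pairs from yaml_path whose directory does not exist."""
    missing = []
    for line in Path(yaml_path).read_text().splitlines():
        match = DIR_LINE.match(line)
        if match and not Path(match.group(2)).is_dir():
            missing.append((match.group(1), match.group(2)))
    return missing
```

Calling `find_missing_dirs` on each YAML file you plan to use returns an empty list when every matched path resolves to a directory. It parses lines with a regular expression instead of a YAML parser, so nested or multi-line values are not handled.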
## Quick Start
BOLT's deployment-time contribution is a single command: a frozen detector pair plus an online-adapted plugin. Before running it, you need a trained base model (single-agent detectors and a fused backbone). The minimal reproducible flow is:
### Step 1: Train your single-agent detectors
Train one detector per modality / encoder using your own data. Configs are under opencood/hypes_yaml/:
```bash
python -m opencood.tools.train \
    -y opencood/hypes_yaml/dairv2x/HEAL/lidar_pyramid_local.yaml
```
Repeat for each agent's modality (e.g., LiDAR PointPillars, LiDAR SECOND, camera Lift-Splat-Shoot). These detectors are trained independently, with no cross-agent coordination.
### Step 2: Build the heterogeneous base checkpoint
Assemble the single-agent encoders into a base model that BOLT will plug into. Encoders, the fusion module, and the detection head will be frozen afterwards.
```bash
python -m opencood.tools.train \
    -y opencood/hypes_yaml/dairv2x/HEAL/lidar_pp_second_stage2.yaml \
    --stage1_model_dir <path_to_stage1_checkpoints>
```
### Step 3: Run BOLT online adaptation (this is BOLT)
At deployment, the ~0.9M-parameter plugin is the only trainable component. It is updated online: one gradient step per incoming test sample, supervised by the ego detector's high-confidence predictions (no labels, no cooperative training data).
```bash
python -m opencood.tools.online_adapt \
    --model_dir <path_to_base_checkpoint> \
    --output_dir <output_path> \
    --lr 1e-4 --epochs 1 \
    --teacher_conf_thresh 0.3 \
    --boost_weight 0.1 --boost_lo 0.1 --boost_hi 0.3
```
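The update rule in this step, one gradient step per incoming sample with only the ego's confident predictions used as targets, can be pictured with a toy example. The sketch below is an illustration under loud assumptions and is not the code in `opencood/tools/online_adapt.py`: the plugin is reduced to a scalar affine map (`w`, `c`), the distillation loss to a squared error, the learning rate is chosen for the toy problem, and the names `Frame`, `adapt_online`, and `make_stream` are hypothetical. The `--boost_*` options are not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class Frame:
    neighbor_feat: float   # feature received from the neighbor (foreign convention)
    teacher_target: float  # value predicted by the ego's own frozen detector
    teacher_conf: float    # ego confidence in that prediction


def adapt_online(frames, lr=0.1, teacher_conf_thresh=0.3):
    """Single pass over the stream: one SGD step per confident frame."""
    w, c = 1.0, 0.0  # identity initialisation: the toy plugin starts as a no-op
    for frame in frames:
        if frame.teacher_conf < teacher_conf_thresh:
            continue  # low-confidence teacher outputs are not used as supervision
        error = (w * frame.neighbor_feat + c) - frame.teacher_target
        # Gradient of the squared error with respect to w and c.
        w -= lr * 2.0 * error * frame.neighbor_feat
        c -= lr * 2.0 * error
    return w, c


def make_stream(n, seed=0):
    """Synthetic stream in which the ego convention is 2.0 * neighbor + 0.5."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        frames.append(Frame(x, 2.0 * x + 0.5, rng.random()))
    return frames


w, c = adapt_online(make_stream(500))
print(f"learned map: {w:.3f} * x + {c:.3f}")
```

Only `w` and `c` change during the pass; everything that produces `teacher_target` stays fixed, mirroring the frozen encoders, fusion module, and detection head. Each frame is visited once, in the spirit of `--epochs 1`.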
### Evaluate
```bash
python -m opencood.tools.inference --model_dir <path_to_checkpoint>
```
For ready-made runners, see scripts/inference/inference.sh and the multi-agent demos under scripts/more_agents/.
## Project Structure
```
opencood/
├── models/
│   ├── plugin/                  # BOLT adaptive plugin (AdaIN + residual + gate)
│   ├── heter_pyramid_collab.py  # Heterogeneous pyramid model
│   ├── heter_encoders.py        # Multi-modality encoder registry
│   └── fuse_modules/            # Fusion strategies (pyramid, attention, ...)
├── tools/
│   ├── online_adapt.py          # Online TTT with ego-as-teacher distillation
│   ├── train.py                 # Standard training
│   └── inference.py             # Evaluation
├── hypes_yaml/                  # Configs for DAIR-V2X, OPV2V
└── data_utils/                  # Dataset loaders + pre/post processors
scripts/
├── inference/                   # Off-the-shelf inference
├── train/                       # End-to-end training
└── more_agents/                 # Multi-agent (3+ cars) assembly + adaptation
```
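The tree above describes the plugin as AdaIN + residual + gate. As a reminder of the first ingredient, adaptive instance normalization (AdaIN) re-standardizes a feature channel so that its mean and standard deviation match those of a reference channel. The pure-Python sketch below assumes the neighbor's channel is the content and the ego's channel supplies the target statistics. How BOLT actually wires AdaIN, the residual path, and the gate is defined in `opencood/models/plugin/`; the gated blend here is only a guess at the general shape, and both function names are hypothetical.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Shift and scale `content` so its mean/std match those of `style`."""
    mu_c, sigma_c = fmean(content), pstdev(content)
    mu_s, sigma_s = fmean(style), pstdev(style)
    return [sigma_s * (v - mu_c) / (sigma_c + eps) + mu_s for v in content]


def gated_residual(original, adapted, gate):
    """Blend: gate=0 keeps the original feature, gate=1 uses the adapted one."""
    return [o + gate * (a - o) for o, a in zip(original, adapted)]


neighbor_channel = [10.0, 12.0, 14.0, 16.0]  # foreign scale
ego_channel = [0.1, 0.3, 0.2, 0.4]           # scale the frozen fusion expects
aligned = adain(neighbor_channel, ego_channel)
blended = gated_residual(neighbor_channel, aligned, gate=0.5)
```

After `adain`, the neighbor channel keeps its relative ordering but carries the ego channel's statistics. A gate near zero leaves the incoming feature untouched, which is one plausible way to keep an untrained plugin from disturbing the frozen fusion module.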
## Acknowledgements
This codebase is built on top of [**HEAL**](https://github.com/yifanlu0227/HEAL) and [**OpenCOOD**](https://github.com/DerrickXuNu/OpenCOOD). We thank the authors of DAIR-V2X, OPV2V, OPV2V-H, and V2X-Real for releasing the datasets that made this work possible.
## Citation
If you find BOLT useful, please cite:
```bibtex
@article{bolt2026,
  title   = {BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception},
  author  = {Yang, Kang and Bu, Tianci and Wang, Peng and Li, Deying and Wang, Yongcai},
  journal = {arXiv preprint 2605.00405},
  year    = {2026}
}
```
## License
Released under the MIT License.