Linking Modality Isolation in Heterogeneous Collaborative Perception (CVPR 2026)
June 16, 2026
Official implementation of Linking Modality Isolation in Heterogeneous Collaborative Perception (paper).
This paper studies modality isolation, a heterogeneous collaborative perception setting where agents with different modalities never co-occur in the same training frame.
CodeAlign links isolated modalities through feature-code-feature translation, aligning heterogeneous features without requiring cross-modal co-occurrence.

Environment
This repository follows the environment setup of HEAL. Please refer to the HEAL installation instructions for dependency installation, CUDA/PyTorch setup, and OpenCOOD-style package setup.
Dataset
Create a dataset/ folder under the repository root. The main experiments use OPV2V and DAIR-V2X-C:
CodeAlign/
└── dataset/
    ├── OPV2V/
    │   ├── train/
    │   ├── validate/
    │   └── test/
    └── my_dair_v2x/
        └── v2x_c/
            └── cooperative-vehicle-infrastructure/
                ├── train.json
                └── val.json
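Before training, the layout above can be sanity-checked with a short script. This is a minimal sketch and not part of the repository: `missing_dataset_paths` is a hypothetical helper, and the path list simply mirrors the tree shown above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "dataset/OPV2V/train",
    "dataset/OPV2V/validate",
    "dataset/OPV2V/test",
    "dataset/my_dair_v2x/v2x_c/cooperative-vehicle-infrastructure/train.json",
    "dataset/my_dair_v2x/v2x_c/cooperative-vehicle-infrastructure/val.json",
]


def missing_dataset_paths(repo_root="."):
    """Return the expected dataset paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing dataset paths:")
        for p in missing:
            print("  " + p)
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root; an empty result means every path in the tree was found.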
Dataset download references follow HEAL:
| Dataset | Download |
|---|---|
| OPV2V | Follow OpenCOOD. HEAL additionally notes that additional-001.zip is needed for camera modality data. |
| DAIR-V2X-C | Download from the DAIR-V2X page, and follow the complemented annotation instructions. |
| OPV2V-H | Optional, available on Hugging Face. |
| V2XSet | Optional, follow V2X-ViT. |
| V2X-Sim 2.0 | Optional, download from the V2X-Sim page together with its accompanying pickle files. |
Modality Setting
Modality definitions and scenario assignments are stored in:
opencood/modality_assign/
Important files include:
opencood/modality_assign/10modality_define.yaml
opencood/modality_assign/10modality_define_codebook16.yaml
opencood/modality_assign/10modality_define_dair.yaml
opencood/modality_assign/opv2v_4modality.json
opencood/modality_assign/opv2v_4modality_in_order.json
In the paper setting, m1 is PointPillar LiDAR, m2 is SECOND LiDAR, m3 is VoxelNet LiDAR, m4/m5 are LiDAR variants, and m6/m7 are LSS camera variants.
The opv2v_4modality*.json files assign vehicle IDs to scenario slots. For CodeAlign experiments, the paper-level modality semantics come from 10modality_define*.yaml, and each YAML's heter.mapping_dict maps those scenario slots to the intended paper modalities.
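The two-level lookup described above (vehicle ID to scenario slot, then scenario slot to paper modality) can be illustrated with a small sketch. All data below is invented for illustration. The assumed shapes (a per-scenario dictionary of vehicle IDs to slots, and a flat `mapping_dict` from slot to modality) are assumptions, not the repository's confirmed file format, and `resolve_modalities` is a hypothetical helper.

```python
# Hypothetical excerpt of an assignment file: scenario -> {vehicle ID -> slot}.
assignment = {
    "scenario_a": {"641": "m1", "650": "m2", "659": "m3"},
}

# Hypothetical heter.mapping_dict: scenario slot -> paper modality.
mapping_dict = {"m1": "m1", "m2": "m6", "m3": "m7", "m4": "m2"}


def resolve_modalities(assignment, mapping_dict, scenario):
    """Map each vehicle in a scenario to its paper-level modality."""
    resolved = {}
    for vehicle_id, slot in assignment[scenario].items():
        if slot not in mapping_dict:
            raise KeyError(f"slot {slot!r} has no entry in mapping_dict")
        resolved[vehicle_id] = mapping_dict[slot]
    return resolved


print(resolve_modalities(assignment, mapping_dict, "scenario_a"))
```

Keeping the slot assignment and the slot-to-modality mapping separate means the same assignment file can be reused across experiments by swapping only the definition YAML.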
Training
Training is mainly controlled by YAML configs. Standard single-process training:
python opencood/tools/train.py -y ${CONFIG_FILE}
Distributed training:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 --use_env \
opencood/tools/train_ddp.py \
-y ${CONFIG_FILE}
CodeAlign YAML templates are under:
opencood/hypes_yaml/opv2v/Heter_group/codebook/
opencood/hypes_yaml/dairv2x/Heter/
A CodeAlign YAML generally contains:
root_dir: dataset/OPV2V/train
validate_dir: dataset/OPV2V/validate
test_dir: dataset/OPV2V/test
yaml_parser: load_general_params_heter_group
heter:
  definition_path: opencood/modality_assign/10modality_define.yaml
  assignment_path: opencood/modality_assign/opv2v_4modality.json
  ego_modality: ['m7']
  heter_group: [['m7']]
fusion:
  core_method: intermediateheter
model:
  core_method: heter_codebook_shared_head_c2c
  args:
    only_train_translator: true
    backend_modality: ['m1', 'm6']
    codebook:
      seg_num: 1
      dict_size: 16
      r: 1
    translator:
      core_method: convnext
loss:
  core_method: point_pillar_pyramid_loss
optimizer:
  core_method: Adam
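Several of the fields above are coupled, so a lightweight consistency check can catch mistakes before a long run. The sketch below operates on an already-parsed config dictionary. The nesting (for example `only_train_translator` under `model.args`) and the rule that translator-only training pairs with a `c2c` model variant are assumptions drawn from this README, and `check_stage2_config` is a hypothetical helper, not a repository function.

```python
def check_stage2_config(cfg):
    """Return a list of human-readable problems found in a stage-2 config."""
    problems = []
    model = cfg.get("model", {})
    args = model.get("args", {})

    if args.get("only_train_translator"):
        # Stage 2 in this README uses the *_c2c model variant.
        if "c2c" not in model.get("core_method", ""):
            problems.append(
                "only_train_translator is set but model.core_method is not a c2c variant"
            )
        if not args.get("backend_modality"):
            problems.append("backend_modality is empty; the translator needs a target backend")

    # The codebook block is expected to carry all three structural fields.
    for key in ("seg_num", "dict_size", "r"):
        if key not in args.get("codebook", {}):
            problems.append(f"codebook.{key} is missing")
    return problems


example = {
    "model": {
        "core_method": "heter_codebook_shared_head_c2c",
        "args": {
            "only_train_translator": True,
            "backend_modality": ["m1", "m6"],
            "codebook": {"seg_num": 1, "dict_size": 16, "r": 1},
        },
    }
}
print(check_stage2_config(example))
```

An empty list means no problem was detected by these particular checks; it does not guarantee the config is otherwise valid.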
Key CodeAlign-specific parameters:
| Parameter | Meaning |
|---|---|
| model.core_method | Selects the CodeAlign model, e.g. heter_codebook_shared_head, heter_codebook_shared_head_c2c, or heter_codebook_shared_head_c2c_infer. |
| heter.definition_path | Modality definition file. |
| heter.assignment_path | Vehicle-to-modality assignment for heterogeneous OPV2V evaluation/training. |
| heter.ego_modality / heter.heter_group | Defines source modalities used in the current training/evaluation stage. |
| fix_encoder / fix_backend | Freezes pretrained encoder/backbone or backend modules during codebook/alignment training. |
| aligner / aligner_model | Defines and optionally loads modality-specific aligners. |
| codebook | Controls codebook structure, including seg_num, dict_size, and r. |
| only_train_translator | Enables CodeAlign stage-2 translator-only training. |
| backend_modality / backend | Specifies target backend modalities and pretrained backend checkpoints. |
| translator | Defines the FCF translator architecture and feature/code shape. |
| use_coded_feature, use_codemap, use_d2d | Ablation switches for different translation/representation variants. |
The typical CodeAlign training flow is:
Stage 0: single-modality pretraining
  train each modality-specific detector independently
  use the resulting single-modality checkpoints as frozen or initialized backends
Stage 1: code space / group formation
  heter_codebook_shared_head
  train aligner + codebook while keeping pretrained perception backend fixed
Stage 2: feature-code-feature translation
  heter_codebook_shared_head_c2c
  load source aligner and target backend/codebooks
  train translator only
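The staged flow above amounts to a schedule of which components are updated at each stage. A minimal sketch of that schedule follows. The component names (`detector`, `aligner`, `codebook`, `backend`, `translator`) are illustrative labels rather than module names from the codebase; in PyTorch, the frozen groups would typically have `requires_grad` set to `False`.

```python
# Which component groups are trained or frozen at each stage (labels are illustrative).
STAGES = {
    0: {
        "name": "single-modality pretraining",
        "trainable": {"detector"},
        "frozen": set(),
    },
    1: {
        "name": "code space / group formation",
        "trainable": {"aligner", "codebook"},
        "frozen": {"backend"},
    },
    2: {
        "name": "feature-code-feature translation",
        "trainable": {"translator"},
        "frozen": {"aligner", "backend", "codebook"},
    },
}


def is_trainable(stage, component):
    """Return True if `component` is updated during `stage`."""
    return component in STAGES[stage]["trainable"]


for stage, spec in STAGES.items():
    print(
        f"Stage {stage} ({spec['name']}): "
        f"train {sorted(spec['trainable'])}, freeze {sorted(spec['frozen'])}"
    )
```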
Inference
Fixed-order heterogeneous inference, used for the main OPV2V/DAIR-V2X tables:
CUDA_VISIBLE_DEVICES=0 python opencood/tools/inference_heter_in_order.py \
--model_dir opencood/logs/path_to_checkpoint \
--fusion_method intermediate \
--range 102.4,102.4 \
--use_cav '[2]'
Late fusion:
CUDA_VISIBLE_DEVICES=0 python opencood/tools/inference_heter_in_order.py \
--model_dir opencood/logs/path_to_late_checkpoint \
--fusion_method late \
--range 102.4,102.4 \
--use_cav '[2]'
Pair-average heterogeneous inference:
CUDA_VISIBLE_DEVICES=0 python opencood/tools/inference_heter_task_average.py \
--model_dir opencood/logs/path_to_checkpoint \
--fusion_method intermediate \
--range 102.4,102.4
Standard homogeneous or single-model inference:
CUDA_VISIBLE_DEVICES=0 python opencood/tools/inference.py \
--model_dir opencood/logs/path_to_checkpoint \
--fusion_method intermediate
Noise robustness for fixed-order heterogeneous inference can be evaluated with:
CUDA_VISIBLE_DEVICES=0 python opencood/tools/inference_heter_in_order.py \
--model_dir opencood/logs/path_to_checkpoint \
--fusion_method intermediate \
--range 102.4,102.4 \
--use_cav '[2]' \
--noise 0.2
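To evaluate several noise levels in one pass, the command above can be generated in a loop. The sketch below only builds the argument lists and uses the flags shown in this README. The checkpoint path is the placeholder from the examples, the noise values other than 0.2 are arbitrary choices, and `build_noise_commands` is a hypothetical helper.

```python
import shlex


def build_noise_commands(model_dir, noise_levels, use_cav="[2]"):
    """Build one fixed-order inference command (as an argv list) per noise level."""
    commands = []
    for noise in noise_levels:
        commands.append([
            "python", "opencood/tools/inference_heter_in_order.py",
            "--model_dir", model_dir,
            "--fusion_method", "intermediate",
            "--range", "102.4,102.4",
            "--use_cav", use_cav,
            "--noise", str(noise),
        ])
    return commands


for cmd in build_noise_commands("opencood/logs/path_to_checkpoint", [0.0, 0.2, 0.4]):
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

When executing the commands with `subprocess.run`, `CUDA_VISIBLE_DEVICES` can be supplied through the `env` argument instead of a shell prefix.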
Acknowledgements
This implementation is based on code from several repositories. We especially thank HEAL for its heterogeneous collaborative perception codebase.
Citation
If you find this project useful, please cite:
@inproceedings{liu2026linking,
title={Linking Modality Isolation in Heterogeneous Collaborative Perception},
author={Liu, Changxing and Chao, Zichen and Chen, Siheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}