INSTINCT
May 8, 2026
Instance-Level Interaction Architecture for Query-Based Collaborative Perception
Yunjiang Xu, Lingzhi Li, Jin Wang, Yupeng Ouyang, Benyuan Yang
Overview
INSTINCT is a collaborative 3D object detection framework for autonomous driving. It achieves state-of-the-art accuracy, including the best AP70 on DAIR-V2X, V2XSet, and V2V4Real, while requiring only 1/281 to 1/264 of the communication bandwidth of existing methods. Cooperative agents (vehicles and infrastructure) interact through instance-level queries, which overcomes the limitations of single-vehicle perception in long-range detection and occluded scenes.
News
- [2025.07] INSTINCT is accepted by ICCV 2025.
- [2025.05] Code and configs are released.
Key Contributions
INSTINCT features three core components:
- Quality-Aware Filtering (QAF): a learned filtering mechanism that selects high-quality instance features before transmission, suppressing noisy or redundant queries to reduce bandwidth and improve robustness.
- Dual-Branch Detection Routing (DDR): decouples collaboration-irrelevant instances (detectable by a single agent) from collaboration-relevant instances (requiring cross-agent information) and routes them through separate detection pathways for more effective collaboration.
- Cross-Agent Local Instance Fusion (CALIF): aggregates local hybrid instance features from multiple agents using two sub-modules:
  - Cross-Domain Adaption (CDA): aligns feature distributions between ego and cooperative agents.
  - Gaussian Distance Attention (GDA): models spatial relationships with Gaussian-weighted distance attention for geometry-aware fusion.
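The idea behind QAF can be illustrated with a minimal sketch. It assumes each agent attaches a confidence score to its instance queries and transmits only those at or above a threshold, capped at `max_queries`. `InstanceQuery`, `quality_aware_filter`, the threshold, and the cap are all hypothetical names and values, not taken from the INSTINCT code, where the quality estimate is learned.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstanceQuery:
    """Hypothetical container for one instance query: box center (x, y, z) and confidence."""
    center: Tuple[float, float, float]
    score: float


def quality_aware_filter(queries: List[InstanceQuery],
                         threshold: float = 0.3,
                         max_queries: int = 50) -> List[InstanceQuery]:
    """Keep queries scoring at least `threshold`, best first, capped at `max_queries`.

    In INSTINCT the quality estimate is learned; here the score is simply given.
    """
    kept = [q for q in queries if q.score >= threshold]
    kept.sort(key=lambda q: q.score, reverse=True)
    return kept[:max_queries]


queries = [
    InstanceQuery((10.0, 2.0, 0.0), 0.92),
    InstanceQuery((35.5, -4.0, 0.0), 0.15),  # below threshold: not transmitted
    InstanceQuery((22.1, 7.3, 0.0), 0.64),
]
to_send = quality_aware_filter(queries, threshold=0.3, max_queries=2)
print([q.score for q in to_send])  # [0.92, 0.64]
```

Filtering before transmission is what shrinks the payload: only the surviving queries are serialized and sent.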
Additionally, an enhanced GT Sampling technique is introduced to facilitate training with diverse hybrid instance features.
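To make the Gaussian Distance Attention sub-module listed above more concrete, here is a minimal sketch for a single ego query. It assumes GDA adds a Gaussian distance penalty, -d²/(2σ²), to scaled dot-product attention logits. `gaussian_distance_attention` and `sigma` are hypothetical, and the actual module may parameterize or learn the Gaussian differently.

```python
import math
from typing import List, Sequence


def gaussian_distance_attention(query: Sequence[float],
                                keys: List[Sequence[float]],
                                query_pos: Sequence[float],
                                key_pos: List[Sequence[float]],
                                sigma: float = 2.0) -> List[float]:
    """Attention weights of one ego query over cooperative instance keys.

    Logit = scaled dot product minus d^2 / (2 * sigma^2), where d is the
    distance between instance centers, so far-away instances get less weight.
    This is an assumed formulation, not the exact GDA module.
    """
    d_model = len(query)
    logits = []
    for k, p in zip(keys, key_pos):
        sim = sum(a * b for a, b in zip(query, k)) / math.sqrt(d_model)
        dist_sq = sum((a - b) ** 2 for a, b in zip(query_pos, p))
        logits.append(sim - dist_sq / (2 * sigma ** 2))
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two keys with identical features: the one 1 m away should dominate the one 6 m away.
w = gaussian_distance_attention(
    query=[1.0, 0.0], keys=[[1.0, 0.0], [1.0, 0.0]],
    query_pos=[0.0, 0.0], key_pos=[[1.0, 0.0], [6.0, 0.0]])
print([round(x, 3) for x in w])  # [0.988, 0.012]
```

With identical features, the distance term alone decides the weighting, which is the geometric prior the GDA description refers to.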
Main Results
Comparison with State-of-the-Art
| Method | Fusion Type | DAIR-V2X AP50 / AP70 | V2XSet AP50 / AP70 | V2V4Real AP50 / AP70 | Comm (log2 B) |
|---|---|---|---|---|---|
| No Fusion | — | 63.49 / 49.62 | 65.15 / 52.04 | 39.80 / 22.00 | 0 |
| Late Fusion | — | 60.43 / 37.46 | 78.76 / 68.01 | 55.00 / 26.70 | 8.39 / 9.60 / 9.79 |
| V2VNet | Intermediate | 63.44 / 42.27 | 82.74 / 65.82 | 64.70 / 33.60 | 24.62 |
| V2XViT | Intermediate | 73.49 / 56.75 | 84.31 / 70.18 | 64.90 / 36.90 | 24.62 |
| DiscoNet | Intermediate | 74.69 / 59.20 | 87.78 / 72.07 | 64.12 / 34.51 | 24.62 |
| CoBEVT | Intermediate | 68.25 / 57.87 | 84.54 / 71.19 | 55.56 / 27.08 | 24.62 |
| Where2comm | Intermediate | 79.01 / 66.49 | 92.59 / 84.92 | 70.21 / 38.01 | 21.72 |
| CoAlign | Intermediate | 77.97 / 65.47 | 92.93 / 84.66 | 72.09 / 46.56 | 24.62 |
| INSTINCT (Ours) | Instance | 81.91 / 75.29 | 92.29 / 87.31 | 80.88 / 61.96 | 13.58 |
INSTINCT improves AP70 by 13.23% (relative) over the strongest baseline on DAIR-V2X (Where2comm) and by 33.08% (relative) over CoAlign on V2V4Real. It also transmits about 2^11 (roughly 2,100×) fewer bytes than CoAlign (13.58 vs. 24.62 in log2 bytes) and about 2^8 (roughly 280×) fewer than Where2comm (21.72).
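The relative gains and bandwidth ratios quoted above can be recomputed from the table. The sketch below uses only the table values; because the Comm column is log2 of the transmitted bytes, a difference of k in that column means 2^k times more data.

```python
# Values copied from the comparison table above.
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def byte_ratio(log2_old: float, log2_new: float) -> float:
    """How many times more bytes the old method sends, given log2(bytes) for each."""
    return 2.0 ** (log2_old - log2_new)


gain_dair = relative_gain(75.29, 66.49)  # DAIR-V2X AP70, INSTINCT vs Where2comm
gain_v2v = relative_gain(61.96, 46.56)   # V2V4Real AP70, INSTINCT vs CoAlign
vs_coalign = byte_ratio(24.62, 13.58)
vs_where2comm = byte_ratio(21.72, 13.58)

print(f"DAIR-V2X AP70 gain: {gain_dair:.1f}%")  # 13.2%
print(f"V2V4Real AP70 gain: {gain_v2v:.1f}%")   # 33.1%
print(f"Bytes vs CoAlign:    {vs_coalign:.0f}x fewer")
print(f"Bytes vs Where2comm: {vs_where2comm:.0f}x fewer")
```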
Robustness to Pose Noise (DAIR-V2X)
| Pose Noise (σt, σr) | V2VNet | V2XViT | DiscoNet | Where2comm | CoAlign | INSTINCT |
|---|---|---|---|---|---|---|
| AP@0.5 | ||||||
| (0.0, 0.0) | 65.72 | 73.49 | 74.69 | 79.01 | 77.97 | 81.91 |
| (0.1, 0.1) | 64.76 | 73.13 | 74.36 | 78.47 | 77.58 | 81.60 |
| (0.2, 0.2) | 63.36 | 71.90 | 73.26 | 76.19 | 75.08 | 77.82 |
| (0.3, 0.3) | 61.44 | 70.21 | 72.05 | 73.01 | 72.20 | 73.27 |
| (0.4, 0.4) | 59.74 | 69.14 | 70.71 | 70.40 | 69.89 | 71.44 |
| AP@0.7 | ||||||
| (0.0, 0.0) | 49.82 | 56.75 | 59.20 | 66.49 | 65.47 | 75.29 |
| (0.1, 0.1) | 48.79 | 56.42 | 58.91 | 64.03 | 63.27 | 72.38 |
| (0.2, 0.2) | 47.05 | 55.50 | 58.12 | 60.21 | 59.33 | 64.51 |
| (0.3, 0.3) | 46.24 | 54.45 | 57.44 | 57.18 | 57.27 | 61.05 |
| (0.4, 0.4) | 45.12 | 53.63 | 56.80 | 55.87 | 56.04 | 59.85 |
INSTINCT achieves the highest AP at every pose-noise level on DAIR-V2X. Its lead narrows as noise grows, however: its AP drops more in absolute terms than any baseline (AP@0.5 falls from 81.91 at (0.0, 0.0) to 71.44 at (0.4, 0.4)).
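The distinction between leading in absolute AP and degrading the least can be checked directly from the AP@0.5 rows of the table. This sketch compares each method's clean score with its score at (0.4, 0.4).

```python
# AP@0.5 on DAIR-V2X at pose noise (0.0, 0.0) and (0.4, 0.4), copied from the table above.
ap50 = {
    "V2VNet": (65.72, 59.74),
    "V2XViT": (73.49, 69.14),
    "DiscoNet": (74.69, 70.71),
    "Where2comm": (79.01, 70.40),
    "CoAlign": (77.97, 69.89),
    "INSTINCT": (81.91, 71.44),
}

# Absolute AP lost between the clean and the noisiest setting.
drops = {m: round(clean - noisy, 2) for m, (clean, noisy) in ap50.items()}
best_noisy = max(ap50, key=lambda m: ap50[m][1])
smallest_drop = min(drops, key=drops.get)

print(drops)
print("Highest AP@0.5 at (0.4, 0.4):", best_noisy)   # INSTINCT
print("Smallest degradation:", smallest_drop)        # DiscoNet
```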
Ablation Study (DAIR-V2X)
| QAF | DDR | CDA | GDA | GT Sampling | AP50 / AP70 | Comm (log2 B) |
|---|---|---|---|---|---|---|
|  |  |  |  |  | 73.02 / 59.78 | 17.67 |
| ✓ |  |  |  |  | 69.62 / 60.41 | 13.58 |
| ✓ | ✓ |  |  |  | 71.98 / 63.21 | 13.58 |
| ✓ | ✓ | ✓ |  |  | 78.98 / 71.01 | 13.58 |
| ✓ | ✓ | ✓ | ✓ |  | 81.09 / 73.94 | 13.58 |
| ✓ | ✓ | ✓ | ✓ | ✓ | 81.91 / 75.29 | 13.58 |
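Reading the ablation rows as cumulative (each row enables one more module, as the checkmark pattern suggests), the per-module gains and QAF's bandwidth saving can be computed as follows. The row labels are our own shorthand.

```python
# (row label, AP50, AP70, log2 bytes), copied from the ablation table above.
rows = [
    ("baseline", 73.02, 59.78, 17.67),
    ("+QAF", 69.62, 60.41, 13.58),
    ("+DDR", 71.98, 63.21, 13.58),
    ("+CDA", 78.98, 71.01, 13.58),
    ("+GDA", 81.09, 73.94, 13.58),
    ("+GT Sampling", 81.91, 75.29, 13.58),
]

deltas = []
for prev, cur in zip(rows, rows[1:]):
    d50 = round(cur[1] - prev[1], 2)
    d70 = round(cur[2] - prev[2], 2)
    deltas.append((cur[0], d50, d70))
    print(f"{cur[0]:14s} dAP50={d50:+.2f}  dAP70={d70:+.2f}")

# Comm is log2(bytes), so the QAF row sends 2**(17.67 - 13.58) times less data.
qaf_byte_ratio = 2 ** (rows[0][3] - rows[1][3])
print(f"QAF cuts transmitted bytes by about {qaf_byte_ratio:.0f}x")
```

Under this reading, QAF trades a small AP50 loss for roughly 17× less traffic, and CDA contributes the largest single accuracy gain.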
Installation
Prerequisites
- Python >= 3.7
- PyTorch >= 1.12
- CUDA >= 11.6
- spconv >= 2.x
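A quick way to confirm the Python, PyTorch, and CUDA prerequisites is a small version check. This is a sketch assuming PyTorch is importable as `torch`. `parse_version` and `meets_minimum` are hypothetical helpers, and spconv is not covered here.

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric part, e.g. '1.12.0+cu116' -> (1, 12, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare only as many components as `minimum` specifies."""
    return parse_version(version)[:len(minimum)] >= minimum


if __name__ == "__main__":
    print("Python >= 3.7:", sys.version_info[:2] >= (3, 7))
    try:
        import torch
        print("PyTorch >= 1.12:", meets_minimum(torch.__version__, (1, 12)))
        cuda = torch.version.cuda  # None for CPU-only builds
        print("CUDA >= 11.6:", cuda is not None and meets_minimum(cuda, (11, 6)))
    except ImportError:
        print("PyTorch is not installed")
```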
Step 1: Clone the Repository
```bash
git clone https://github.com/CrazyShout/INSTINCT.git
cd INSTINCT
```
Step 2: Create Conda Environment
```bash
conda create -n instinct python=3.7.16
conda activate instinct
```
Step 3: Install PyTorch and spconv
```bash
# PyTorch 1.12 + CUDA 11.6
pip install torch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 --index-url https://download.pytorch.org/whl/cu116
# spconv
pip install spconv-cu116
```
Step 4: Install Dependencies
```bash
pip install -r requirements.txt
```
Step 5: Build CUDA Extensions
```bash
# Install the opencood package
python setup.py develop
# Build CUDA kernels (GPU required)
python opencood/utils/setup.py build_ext --inplace
python opencood/pcdet_utils/setup.py build_ext --inplace
```
Build Troubleshooting: If you encounter `THC/THC.h` errors during `pcdet_utils` compilation (expected with PyTorch 1.11 and later, including 2.x, which no longer ship the legacy THC headers), comment out the following lines in the `.cpp` files under `opencood/pcdet_utils/pointnet2/`, `roiaware_pool3d/`, and `iou3d_nms/`:

```cpp
// #include <THC/THC.h>
// extern THCState *state;
```
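The manual edit from the troubleshooting note can be scripted. The sketch below assumes it is run from the repository root before re-running the build commands. It prefixes the two THC lines with `// ` in every `.cpp` file under the three directories named above, and it is idempotent: already commented lines no longer match and are skipped. `comment_out_thc` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List

# The two legacy lines named in the troubleshooting note.
THC_LINES = ("#include <THC/THC.h>", "extern THCState *state;")


def comment_out_thc(dirs: Iterable[Path]) -> List[Path]:
    """Prefix the legacy THC lines with '// ' in every .cpp file under `dirs`.

    Returns the files that were modified; missing directories are skipped.
    """
    changed = []
    for d in dirs:
        d = Path(d)
        if not d.is_dir():
            continue
        for path in sorted(d.rglob("*.cpp")):
            lines = path.read_text().splitlines(keepends=True)
            new_lines = ["// " + line if line.strip() in THC_LINES else line
                         for line in lines]
            if new_lines != lines:
                path.write_text("".join(new_lines))
                changed.append(path)
    return changed


if __name__ == "__main__":
    root = Path("opencood/pcdet_utils")
    subdirs = [root / sub for sub in ("pointnet2", "roiaware_pool3d", "iou3d_nms")]
    for p in comment_out_thc(subdirs):
        print("patched", p)
```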
Step 6: Install OpenPCDet
```bash
cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet
python setup.py develop
cd ../INSTINCT
```
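After Steps 1 to 6, a simple check that the key packages are discoverable can save a failed training launch. This sketch assumes the top-level package names `opencood` (this repository, installed by `python setup.py develop`), `pcdet` (OpenPCDet), and `spconv`. `importlib.util.find_spec` only locates a module without importing it, so it does not prove that the compiled CUDA extensions load.

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules Python can locate, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_importable(["torch", "spconv", "opencood", "pcdet"]).items():
        print(f"{name:10s} {'OK' if ok else 'MISSING'}")
```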
Setup for Modern GPUs (RTX 50-series, CUDA 12.x)
For newer GPUs (e.g., RTX 5080 with sm_120), use PyTorch nightly with CUDA 12.8:
```bash
conda create -n instinct python=3.10
conda activate instinct
# PyTorch nightly for sm_120 support
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
# spconv
pip install spconv-cu120
# Use a numba release that supports Python 3.10 (0.48.0 does not)
pip install numba
# --no-deps only skips sub-dependencies; packages listed in requirements.txt are
# still installed, so remove a numba==0.48.0 pin from the file first if present
pip install --no-deps -r requirements.txt
```
Then follow Steps 5–6 as above.
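To confirm that the installed PyTorch build actually ships kernels for your GPU (for example `sm_120` on an RTX 5080), you can compare the device's compute capability with `torch.cuda.get_arch_list()`. This is a sketch: `sm_tag` and `is_arch_supported` are hypothetical helpers, and the exact-match test ignores the case where PTX for an older architecture is JIT-compiled for a newer GPU.

```python
from typing import List, Tuple


def sm_tag(capability: Tuple[int, int]) -> str:
    """(12, 0) -> 'sm_120', (8, 6) -> 'sm_86'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_supported(capability: Tuple[int, int], arch_list: List[str]) -> bool:
    """True if the build lists native kernels for exactly this GPU architecture."""
    return sm_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(sm_tag(cap), "supported:", is_arch_supported(cap, torch.cuda.get_arch_list()))
    else:
        print("No CUDA-enabled PyTorch found")
```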
Data Preparation
The data preparation follows the same procedure as CoAlign and OpenCOOD.
For the DAIR-V2X dataset, please use the supplemented annotations provided by CoAlign.
Getting Started
Training
```bash
python opencood/tools/train_simple.py -y dairv2x opencood/hypes_yaml/dairv2x/lidar_only_with_noise/second_CQCPInstance_onecycle.yaml
```
The -y flag specifies the dataset name. Available options: dairv2x, opv2v, v2v4real, v2xsim, v2xset.
Configs are organized by dataset under opencood/hypes_yaml/<dataset>/.
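Because configs live under `opencood/hypes_yaml/<dataset>/`, a short script can list the YAML files available for each `-y` option. This is a sketch assuming it is run from the repository root. `list_configs` is a hypothetical helper, and it searches subfolders too (such as `lidar_only_with_noise/`).

```python
from pathlib import Path
from typing import Dict, List

# Dataset names accepted by -y, as listed above.
DATASETS = ("dairv2x", "opv2v", "v2v4real", "v2xsim", "v2xset")


def list_configs(root: Path = Path("opencood/hypes_yaml")) -> Dict[str, List[str]]:
    """Map each dataset to the YAML files under root/<dataset>/, searched recursively."""
    configs: Dict[str, List[str]] = {}
    for name in DATASETS:
        folder = root / name
        if folder.is_dir():
            configs[name] = sorted(p.relative_to(root).as_posix() for p in folder.rglob("*.yaml"))
    return configs


if __name__ == "__main__":
    for dataset, files in list_configs().items():
        print(dataset)
        for f in files:
            print("  ", f)
```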
Inference
```bash
python opencood/tools/inference_simple.py --model_dir opencood/logs/<experiment_dir>
```
Checkpoints
Pre-trained model weights will be released soon. Stay tuned!
Code Structure
```text
INSTINCT/
├── opencood/
│   ├── models/
│   │   ├── second_boxattention_cqcp.py        # Main model entry
│   │   ├── comm_modules/
│   │   │   └── CQCP_head_instance.py          # INSTINCT detection head (QAF + DDR + CALIF)
│   │   └── sub_modules/
│   │       ├── CQCP_transformer.py            # Encoder-decoder transformer
│   │       └── matcher.py                     # Hungarian matching
│   ├── data_utils/datasets/
│   │   └── intermediate_v2_fusion_dataset.py  # Data pipeline for INSTINCT
│   ├── hypes_yaml/                            # YAML configs per dataset
│   ├── pcdet_utils/                           # Custom CUDA kernels
│   ├── tools/                                 # Training & inference scripts
│   └── visualization/                         # Visualization utilities
├── images/                                    # Demo and architecture figures
└── setup.py
```
Citation
If you find INSTINCT useful for your research, please consider citing:
```bibtex
@inproceedings{xu2025instinct,
  title={INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception},
  author={Xu, Yunjiang and Li, Lingzhi and Wang, Jin and Ouyang, Yupeng and Yang, Benyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={25464--25473},
  year={2025}
}
```
Acknowledgements
This project is built upon the excellent collaborative perception codebases: