INSTINCT

May 8, 2026

Instance-Level Interaction Architecture for Query-Based Collaborative Perception

ICCV 2025 | arXiv

Yunjiang Xu, Lingzhi Li, Jin Wang, Yupeng Ouyang, Benyuan Yang

Overview

INSTINCT is a collaborative 3D object detection framework for autonomous driving that achieves state-of-the-art accuracy while using only 1/281 to 1/264 of the communication bandwidth required by existing methods. It introduces instance-level, query-based interaction between cooperative agents (vehicles and infrastructure) to overcome the limits of single-vehicle perception at long range and under occlusion.

[Figure: INSTINCT architecture]

News

  • [2025.07] INSTINCT has been accepted to ICCV 2025.
  • [2025.05] Code and configs have been released.

Key Contributions

INSTINCT features three core components:

  1. Quality-Aware Filtering (QAF) — A learned filtering mechanism that selects high-quality instance features before transmission, suppressing noisy or redundant queries to reduce bandwidth and improve robustness.

  2. Dual-Branch Detection Routing (DDR) — Decouples collaboration-irrelevant instances (detectable by a single agent) from collaboration-relevant instances (requiring cross-agent information), routing them through separate detection pathways for more effective collaboration.

  3. Cross-Agent Local Instance Fusion (CALIF) — Aggregates local hybrid instance features from multiple agents using two sub-modules:

    • Cross-Domain Adaption (CDA): Aligns feature distributions between ego and cooperative agents.
    • Gaussian Distance Attention (GDA): Models spatial relationships with Gaussian-weighted distance attention for geometry-aware fusion.

Additionally, an enhanced GT Sampling technique is introduced to facilitate training with diverse hybrid instance features.
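To make the QAF and GDA descriptions more concrete, here is a minimal pure-Python sketch under loudly stated assumptions: QAF is modeled as thresholding a per-instance quality score before transmission, and GDA as scaled dot-product attention whose logits receive a Gaussian distance penalty. `Instance`, `quality_aware_filter`, `gaussian_distance_attention`, `fuse`, `threshold`, `max_keep`, and `sigma` are all hypothetical names; the real modules live in opencood/models/comm_modules/CQCP_head_instance.py and may differ substantially.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Instance:
    """Hypothetical instance record: feature vector, BEV center (x, y) in meters, quality score in [0, 1]."""
    feature: List[float]
    center: Tuple[float, float]
    quality: float


def quality_aware_filter(instances: Sequence[Instance], threshold: float = 0.3,
                         max_keep: int = 50) -> List[Instance]:
    """Stand-in for QAF (assumption): drop low-quality instances, keep at most `max_keep`, best first."""
    kept = [inst for inst in instances if inst.quality >= threshold]
    kept.sort(key=lambda inst: inst.quality, reverse=True)
    return kept[:max_keep]


def gaussian_distance_attention(query: Instance, keys: Sequence[Instance],
                                sigma: float = 2.0) -> List[float]:
    """Softmax over scaled dot-product logits plus a Gaussian distance term (assumed GDA form)."""
    if not keys:
        return []
    scale = math.sqrt(len(query.feature))
    logits = []
    for key in keys:
        dot = sum(a * b for a, b in zip(query.feature, key.feature)) / scale
        dist_sq = (query.center[0] - key.center[0]) ** 2 + (query.center[1] - key.center[1]) ** 2
        # Adding -d^2 / (2 sigma^2) to a logit multiplies its unnormalized weight by
        # exp(-d^2 / (2 sigma^2)), so geometrically distant instances are down-weighted.
        logits.append(dot - dist_sq / (2.0 * sigma ** 2))
    peak = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(query: Instance, keys: Sequence[Instance], sigma: float = 2.0) -> List[float]:
    """Attention-weighted sum of key features; falls back to the query feature if nothing was received."""
    weights = gaussian_distance_attention(query, keys, sigma)
    if not weights:
        return list(query.feature)
    return [sum(w * key.feature[i] for w, key in zip(weights, keys))
            for i in range(len(query.feature))]


ego = Instance([1.0, 0.0], (0.0, 0.0), 0.9)
received = quality_aware_filter([
    Instance([1.0, 0.0], (0.5, 0.0), 0.8),   # close to the ego instance
    Instance([1.0, 0.0], (10.0, 0.0), 0.7),  # same appearance, 10 m away
    Instance([0.0, 1.0], (1.0, 1.0), 0.1),   # below threshold, never transmitted
])
print(len(received), [round(w, 4) for w in gaussian_distance_attention(ego, received)])
```

With identical features, the distance term alone decides the weighting, so nearly all attention goes to the nearby instance. Whether INSTINCT uses a fixed or learned `sigma` is not stated here.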

Main Results

Comparison with State-of-the-Art

| Method | Fusion Type | DAIR-V2X AP50 / AP70 | V2XSet AP50 / AP70 | V2V4Real AP50 / AP70 | Comm (log2 B) |
|---|---|---|---|---|---|
| No Fusion | - | 63.49 / 49.62 | 65.15 / 52.04 | 39.80 / 22.00 | 0 |
| Late Fusion | - | 60.43 / 37.46 | 78.76 / 68.01 | 55.00 / 26.70 | 8.39 / 9.60 / 9.79 |
| V2VNet | Intermediate | 63.44 / 42.27 | 82.74 / 65.82 | 64.70 / 33.60 | 24.62 |
| V2XViT | Intermediate | 73.49 / 56.75 | 84.31 / 70.18 | 64.90 / 36.90 | 24.62 |
| DiscoNet | Intermediate | 74.69 / 59.20 | 87.78 / 72.07 | 64.12 / 34.51 | 24.62 |
| CoBEVT | Intermediate | 68.25 / 57.87 | 84.54 / 71.19 | 55.56 / 27.08 | 24.62 |
| Where2comm | Intermediate | 79.01 / 66.49 | 92.59 / 84.92 | 70.21 / 38.01 | 21.72 |
| CoAlign | Intermediate | 77.97 / 65.47 | 92.93 / 84.66 | 72.09 / 46.56 | 24.62 |
| INSTINCT (Ours) | Instance | 81.91 / 75.29 | 92.29 / 87.31 | 80.88 / 61.96 | 13.58 |

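INSTINCT improves AP70 by 13.23% relative to the strongest baseline on DAIR-V2X (Where2comm, 66.49 → 75.29) and by 33.08% relative to the strongest baseline on V2V4Real (CoAlign, 46.56 → 61.96). Because Comm is reported on a log2 scale, its 13.58 versus 21.72 (Where2comm) and 24.62 (the other intermediate fusion methods) corresponds to transmitting roughly 2^8.14 ≈ 282× and 2^11.04 ≈ 2,100× less data.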
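Since the Comm column is a log2 quantity, a difference of d in the table means a 2^d ratio in transmitted data. The short sketch below checks the bandwidth and relative-gain arithmetic using numbers copied from the comparison table; `COMM_LOG2`, `bandwidth_ratio`, and `relative_gain` are illustrative names, not part of the repository.

```python
# Comm values (log2 scale) copied from the comparison table.
COMM_LOG2 = {"INSTINCT": 13.58, "Where2comm": 21.72, "CoAlign": 24.62}


def bandwidth_ratio(baseline: str, method: str = "INSTINCT") -> float:
    """How many times more data `baseline` transmits than `method` (2 ** difference of log2 values)."""
    return 2 ** (COMM_LOG2[baseline] - COMM_LOG2[method])


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


print(f"vs Where2comm: {bandwidth_ratio('Where2comm'):.0f}x less data")
print(f"vs CoAlign:    {bandwidth_ratio('CoAlign'):.0f}x less data")
print(f"DAIR-V2X AP70 gain over Where2comm: {relative_gain(75.29, 66.49):.2f}%")
print(f"V2V4Real AP70 gain over CoAlign:    {relative_gain(61.96, 46.56):.2f}%")
```

Reading the log2 gap as a plain difference (for example, 24.62 - 13.58 ≈ 11) would badly understate the savings; the actual ratio is about 2,100×.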

Robustness to Pose Noise (DAIR-V2X)
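AP@0.5

| Pose Noise (σt, σr) | V2VNet | V2XViT | DiscoNet | Where2comm | CoAlign | INSTINCT |
|---|---|---|---|---|---|---|
| (0.0, 0.0) | 65.72 | 73.49 | 74.69 | 79.01 | 77.97 | 81.91 |
| (0.1, 0.1) | 64.76 | 73.13 | 74.36 | 78.47 | 77.58 | 81.60 |
| (0.2, 0.2) | 63.36 | 71.90 | 73.26 | 76.19 | 75.08 | 77.82 |
| (0.3, 0.3) | 61.44 | 70.21 | 72.05 | 73.01 | 72.20 | 73.27 |
| (0.4, 0.4) | 59.74 | 69.14 | 70.71 | 70.40 | 69.89 | 71.44 |

AP@0.7

| Pose Noise (σt, σr) | V2VNet | V2XViT | DiscoNet | Where2comm | CoAlign | INSTINCT |
|---|---|---|---|---|---|---|
| (0.0, 0.0) | 49.82 | 56.75 | 59.20 | 66.49 | 65.47 | 75.29 |
| (0.1, 0.1) | 48.79 | 56.42 | 58.91 | 64.03 | 63.27 | 72.38 |
| (0.2, 0.2) | 47.05 | 55.50 | 58.12 | 60.21 | 59.33 | 64.51 |
| (0.3, 0.3) | 46.24 | 54.45 | 57.44 | 57.18 | 57.27 | 61.05 |
| (0.4, 0.4) | 45.12 | 53.63 | 56.80 | 55.87 | 56.04 | 59.85 |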

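INSTINCT achieves the highest AP of all methods at every pose-noise level on DAIR-V2X. Its margin does shrink as noise grows: at AP@0.5, its lead over the strongest competitor falls from 2.90 points (Where2comm) at (0.0, 0.0) to 0.73 points (DiscoNet) at (0.4, 0.4), while at AP@0.7 it still leads DiscoNet by 3.05 points at (0.4, 0.4).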
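The noise protocol is not spelled out in this README. A common convention in OpenCOOD-style codebases, assumed in this sketch rather than taken from the INSTINCT code, adds zero-mean Gaussian noise with standard deviation σt (meters) to a cooperative agent's x and y, and σr (degrees) to its yaw. The sketch also shows why rotation noise hurts long-range detections: a yaw error displaces a point roughly in proportion to its distance. `Pose`, `add_pose_noise`, and `to_world` are hypothetical helpers.

```python
import math
import random
from typing import Tuple

# Assumed convention: (x [m], y [m], yaw [deg]) in a shared world frame.
Pose = Tuple[float, float, float]


def add_pose_noise(pose: Pose, sigma_t: float, sigma_r: float, rng: random.Random) -> Pose:
    """Return `pose` with zero-mean Gaussian noise on x, y (sigma_t, meters) and yaw (sigma_r, degrees)."""
    x, y, yaw = pose
    return (x + rng.gauss(0.0, sigma_t), y + rng.gauss(0.0, sigma_t), yaw + rng.gauss(0.0, sigma_r))


def to_world(point: Tuple[float, float], pose: Pose) -> Tuple[float, float]:
    """Transform a 2D point from an agent's local frame into the world frame."""
    px, py = point
    x, y, yaw = pose
    c, s = math.cos(math.radians(yaw)), math.sin(math.radians(yaw))
    return (c * px - s * py + x, s * px + c * py + y)


rng = random.Random(0)
true_pose = (10.0, 5.0, 30.0)
noisy_pose = add_pose_noise(true_pose, sigma_t=0.2, sigma_r=0.2, rng=rng)
# The same detection, 40 m ahead of the cooperative agent, placed with the true vs. the noisy pose.
detection = (40.0, 0.0)
error = math.dist(to_world(detection, true_pose), to_world(detection, noisy_pose))
print(f"misalignment of a 40 m detection: {error:.2f} m")
```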

Ablation Study (DAIR-V2X)
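| QAF | DDR | CDA | GDA | GT Sampling | AP50 / AP70 | Comm (log2 B) |
|---|---|---|---|---|---|---|
|  |  |  |  |  | 73.02 / 59.78 | 17.67 |
|  |  |  |  |  | 69.62 / 60.41 | 13.58 |
|  |  |  |  |  | 71.98 / 63.21 | 13.58 |
|  |  |  |  |  | 78.98 / 71.01 | 13.58 |
|  |  |  |  |  | 81.09 / 73.94 | 13.58 |
|  |  |  |  |  | 81.91 / 75.29 | 13.58 |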

Installation

Prerequisites

  • Python >= 3.7
  • PyTorch >= 1.12
  • CUDA >= 11.6
  • spconv >= 2.x
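Before building, it can help to confirm that the active environment meets these minimums. The following is a hedged sketch using the standard library plus an optional `import torch`; `parse_version` and `check_environment` are hypothetical helpers, and the spconv check only verifies that the module is importable, not its version.

```python
import importlib.util
import re
import sys
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.12.0+cu116' -> (1, 12, 0); non-numeric suffixes such as '+cu116' or 'rc1' are ignored."""
    core = text.split("+", 1)[0]
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_environment() -> Dict[str, bool]:
    """Compare the active environment against the minimum versions listed in Prerequisites."""
    report = {"python": sys.version_info >= (3, 7)}
    try:
        import torch
    except ImportError:
        torch = None
    report["torch"] = torch is not None and parse_version(torch.__version__) >= (1, 12)
    # torch.version.cuda is None for CPU-only builds.
    cuda = torch.version.cuda if torch is not None else None
    report["cuda"] = cuda is not None and parse_version(cuda) >= (11, 6)
    # Presence check only: spconv wheels ship as spconv-cuXXX, so the version is not inspected.
    report["spconv"] = importlib.util.find_spec("spconv") is not None
    return report


for name, ok in check_environment().items():
    print(f"{name:7s} {'ok' if ok else 'MISSING / TOO OLD'}")
```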

Step 1: Clone the Repository

git clone https://github.com/CrazyShout/INSTINCT.git
cd INSTINCT

Step 2: Create Conda Environment

conda create -n instinct python=3.7.16
conda activate instinct

Step 3: Install PyTorch and spconv

# PyTorch 1.12 + CUDA 11.6
pip install torch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 --index-url https://download.pytorch.org/whl/cu116

# spconv
pip install spconv-cu116

Step 4: Install Dependencies

pip install -r requirements.txt

Step 5: Build CUDA Extensions

# Install the opencood package
python setup.py develop

# Build CUDA kernels (GPU required)
python opencood/utils/setup.py build_ext --inplace
python opencood/pcdet_utils/setup.py build_ext --inplace

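Build Troubleshooting: If compiling pcdet_utils fails with THC/THC.h errors (the legacy THC headers were removed in PyTorch 1.11, so this can occur with the PyTorch 1.12 setup above as well as with 2.x), comment out the following lines in the .cpp files under opencood/pcdet_utils/pointnet2/, opencood/pcdet_utils/roiaware_pool3d/, and opencood/pcdet_utils/iou3d_nms/: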

// #include <THC/THC.h>
// extern THCState *state;
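Rather than editing each file by hand, the same change can be scripted. This sketch assumes it is run from the repository root and that the two lines appear verbatim (indentation aside); `comment_out_thc` is a hypothetical helper, it skips lines that are already commented out, and any other THC calls remaining in those files would still need manual fixes.

```python
from pathlib import Path
from typing import List, Sequence

# Exact (whitespace-stripped) lines to disable.
THC_LINES = ("#include <THC/THC.h>", "extern THCState *state;")


def comment_out_thc(root: Path,
                    subdirs: Sequence[str] = ("pointnet2", "roiaware_pool3d", "iou3d_nms")) -> List[Path]:
    """Prefix the legacy THC lines with '// ' in every .cpp file under root/<subdir>; return changed files."""
    changed = []
    for sub in subdirs:
        directory = root / sub
        if not directory.is_dir():
            continue
        for path in sorted(directory.rglob("*.cpp")):
            lines = path.read_text().splitlines(keepends=True)
            new_lines = []
            for line in lines:
                stripped = line.strip()
                if stripped in THC_LINES:
                    indent = line[: len(line) - len(line.lstrip())]
                    line = f"{indent}// {stripped}\n"
                new_lines.append(line)
            if new_lines != lines:
                path.write_text("".join(new_lines))
                changed.append(path)
    return changed


if __name__ == "__main__":
    for patched in comment_out_thc(Path("opencood/pcdet_utils")):
        print("patched", patched)
```

Running it twice is harmless: already-commented lines no longer match, so the second run reports no changes.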

Step 6: Install OpenPCDet

cd ..
git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet
python setup.py develop
cd ../INSTINCT

Setup for Modern GPUs (RTX 50-series, CUDA 12.x)

For newer GPUs (e.g., RTX 5080 with sm_120), use PyTorch nightly with CUDA 12.8:

conda create -n instinct python=3.10
conda activate instinct

# PyTorch nightly for sm_120 support
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

# spconv
pip install spconv-cu120

# Use compatible numba (0.48.0 is incompatible with Python 3.10)
pip install numba
pip install --no-deps -r requirements.txt

Then follow Steps 5–6 as above.
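To confirm that the installed build actually ships kernels for the GPU, compare the device's compute capability with the architectures compiled into PyTorch. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are standard PyTorch calls; `arch_supported` and `report` are hypothetical helpers, and the check only looks for native `sm_XY` entries (it ignores PTX fallback).

```python
from typing import Sequence, Tuple


def arch_supported(capability: Tuple[int, int], arch_list: Sequence[str]) -> bool:
    """True if `arch_list` has native kernels for this compute capability, e.g. (12, 0) -> 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report() -> str:
    """Summarize whether the installed PyTorch build can run natively on GPU 0."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA is not available"
    capability = torch.cuda.get_device_capability(0)
    ok = arch_supported(capability, torch.cuda.get_arch_list())
    status = "supported" if ok else "NOT in this build"
    return (f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}), "
            f"GPU sm_{capability[0]}{capability[1]}: {status}")


print(report())
```

On an RTX 5080, a build without `sm_120` in its arch list is the usual cause of "no kernel image is available" errors, which is why the cu128 wheels are needed.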

Data Preparation

The data preparation follows the same procedure as CoAlign and OpenCOOD.

For the DAIR-V2X dataset, please use the supplemented annotations provided by CoAlign.

Getting Started

Training

python opencood/tools/train_simple.py -y dairv2x opencood/hypes_yaml/dairv2x/lidar_only_with_noise/second_CQCPInstance_onecycle.yaml

The -y flag specifies the dataset name. Available options: dairv2x, opv2v, v2v4real, v2xsim, v2xset.

Configs are organized by dataset under opencood/hypes_yaml/<dataset>/.
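Because configs follow the opencood/hypes_yaml/<dataset>/ layout, a small helper can list candidate configs for a dataset and assemble the training invocation from the Training section. This is only a sketch: `list_configs` and `train_command` are hypothetical, and the argv simply mirrors the documented command rather than adding any new flags.

```python
from pathlib import Path
from typing import List

# Dataset names accepted by the -y flag, as listed above.
DATASETS = ("dairv2x", "opv2v", "v2v4real", "v2xsim", "v2xset")


def list_configs(dataset: str, root: Path = Path("opencood/hypes_yaml")) -> List[Path]:
    """Return every YAML config for one dataset, including nested folders such as lidar_only_with_noise/."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    directory = root / dataset
    if not directory.is_dir():
        return []
    return sorted(directory.rglob("*.yaml"))


def train_command(dataset: str, config: Path) -> List[str]:
    """Build the documented training command as an argv list (e.g. for subprocess.run)."""
    return ["python", "opencood/tools/train_simple.py", "-y", dataset, str(config)]


if __name__ == "__main__":
    for cfg in list_configs("dairv2x"):
        print(" ".join(train_command("dairv2x", cfg)))
```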

Inference

python opencood/tools/inference_simple.py --model_dir opencood/logs/<experiment_dir>

Checkpoints

Pre-trained model weights will be released soon. Stay tuned!

Code Structure

INSTINCT/
├── opencood/
│   ├── models/
│   │   ├── second_boxattention_cqcp.py      # Main model entry
│   │   ├── comm_modules/
│   │   │   └── CQCP_head_instance.py        # INSTINCT detection head (QAF + DDR + CALIF)
│   │   └── sub_modules/
│   │       ├── CQCP_transformer.py          # Encoder-decoder transformer
│   │       └── matcher.py                   # Hungarian matching
│   ├── data_utils/datasets/
│   │   └── intermediate_v2_fusion_dataset.py # Data pipeline for INSTINCT
│   ├── hypes_yaml/                          # YAML configs per dataset
│   ├── pcdet_utils/                         # Custom CUDA kernels
│   ├── tools/                               # Training & inference scripts
│   └── visualization/                       # Visualization utilities
├── images/                                  # Demo and architecture figures
└── setup.py

Citation

If you find INSTINCT useful for your research, please consider citing:

@inproceedings{xu2025instinct,
  title={INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception},
  author={Xu, Yunjiang and Li, Lingzhi and Wang, Jin and Ouyang, Yupeng and Yang, Benyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={25464--25473},
  year={2025}
}

Acknowledgements

This project is built upon the excellent collaborative perception codebases: