ChronoTrack

April 29, 2026

Temporally Consistent Long-Term Memory for 3D Single Object Tracking [arXiv]

Jaejoon Yoo, SuBeen Lee, Yerim Jeon, Miso Lee, Jae-Pil Heo

CVPR 2026 Findings


Overview

ChronoTrack is the first long-term memory framework for 3D single object tracking (3D-SOT).

Problem. Recent memory-based 3D-SOT methods store point-level features from a few past frames (typically 2–3) as reference templates. Naïvely extending this memory to longer horizons degrades tracking accuracy due to temporal feature inconsistency — cosine similarity between a target's LiDAR features decreases as temporal distance grows, causing mismatches when attending to distant frames. Furthermore, point-level storage grows proportionally with memory length, making long-term extension impractical.
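
To make the storage argument concrete, the sketch below compares how the two memory designs scale with history length. The function names and any sizes passed to them are hypothetical and are not taken from the paper or the codebase.

```python
def point_memory_size(num_frames, points_per_frame, channels):
    """Floats stored when every remembered frame keeps all of its point features."""
    return num_frames * points_per_frame * channels


def token_memory_size(num_tokens, channels):
    """Floats stored by a fixed set of memory tokens, independent of history length."""
    return num_tokens * channels


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the trend.
    for frames in (3, 30, 300):
        print(frames, point_memory_size(frames, 128, 64), token_memory_size(16, 64))
```

Point-level storage scales linearly with the number of remembered frames, while the token budget stays constant however long the track runs.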

Solution. ChronoTrack replaces per-point feature storage with a compact set of learnable foreground memory tokens that are recurrently updated at each timestep. These tokens serve as a compressed, temporally persistent summary of the target's appearance. Short-term background context (single previous frame) is maintained separately to provide local scene contrast. Two novel training objectives ensure the tokens remain informative over long horizons:

  • Temporal Consistency Loss (L_TC). Foreground points from different frames are transformed into a shared canonical coordinate system (center subtracted, rotation applied). Nearest-neighbor pairs are found across frames and filtered by a distance threshold; SmoothL1 loss is applied to the corresponding feature pairs. This encourages part-level feature consistency across time, directly combating temporal feature drift.

  • Memory Cycle Consistency Loss (L_MCC). A two-step cyclic walk — memory tokens → scene points → memory tokens — is performed via softmax cosine similarity. A cycle-consistency cross-entropy term penalizes tokens that fail to return to themselves, while a foreground affinity term encourages tokens to specialize in target-relevant regions. Together these losses push the tokens to encode diverse, target-specific semantics.
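
The two objectives above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's PyTorch implementation: the function names, the distance threshold max_dist, the temperature tau, and the exact form of the foreground affinity term (negative log of the attention mass each token places on foreground points) are hypothetical choices made for the example.

```python
import numpy as np


def smooth_l1(x, beta=1.0):
    """Elementwise SmoothL1 penalty."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)


def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def to_canonical(points, center, yaw):
    """Subtract the box center, then undo the box heading (rotation about z)."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (points - center) @ rot.T


def temporal_consistency_loss(pts_a, feat_a, pts_b, feat_b, max_dist=0.3):
    """pts_*: (N, 3) canonical foreground points; feat_*: (N, C) point features.

    max_dist is an assumed threshold, not a value from the paper.
    """
    dist = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    nn = dist.argmin(axis=1)                      # nearest neighbor in frame b
    keep = dist[np.arange(len(pts_a)), nn] < max_dist
    if not keep.any():
        return 0.0
    return float(smooth_l1(feat_a[keep] - feat_b[nn[keep]]).mean())


def memory_cycle_consistency_loss(tokens, point_feats, fg_mask, tau=0.1):
    """tokens: (K, C); point_feats: (N, C); fg_mask: (N,) boolean."""
    fg_mask = np.asarray(fg_mask, dtype=bool)
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    sim = t @ p.T / tau                           # cosine similarity, (K, N)
    t2p = softmax(sim, axis=1)                    # step 1: tokens -> points
    p2t = softmax(sim.T, axis=1)                  # step 2: points -> tokens
    cycle = t2p @ p2t                             # (K, K) round-trip probabilities
    # Cross-entropy against the identity: each token should return to itself.
    cycle_ce = float(-np.log(np.diag(cycle) + 1e-8).mean())
    # Assumed affinity form: reward attention mass placed on foreground points.
    fg_affinity = float(-np.log(t2p[:, fg_mask].sum(axis=1) + 1e-8).mean())
    return cycle_ce, fg_affinity
```

In training, these terms would be computed on network features and combined with the tracking loss; the loss weights are not specified here.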


Installation

Option 1: Docker

We provide a pre-built Docker image with all dependencies installed.

docker pull jaejoonyoo/3dsot:chronotrack

Option 2: Manual Setup

The codebase is tested with the following environment:

Package       Version
Python        3.10.14
PyTorch       2.3.1
Torchvision   0.18.1
CUDA          11.8
PyTorch3D     0.7.8
Lightning     2.4.0

# 1. Clone repository
git clone https://github.com/ujaejoon/ChronoTrack.git
cd ChronoTrack

# 2. Create conda environment
conda create -n chronotrack python=3.10
conda activate chronotrack

# 3. Install PyTorch (with CUDA 11.8)
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118

# 4. Install NumPy (must be pinned before other packages to avoid numpy 2.x conflicts)
pip install numpy==1.24.0

# 5. Install PyTorch3D (built from source; requires a C++ compiler and CUDA toolkit)
pip install "git+https://github.com/facebookresearch/pytorch3d.git@V0.7.8"

# 6. Install remaining dependencies
pip install lightning==2.4.0 addict==2.4.0 pyquaternion==0.9.9
pip install nuscenes-devkit==1.1.11 tensorboard==2.12.3
pip install pyyaml==6.0.1 pandas==1.5.3 tqdm==4.66.4

Data Preparation

KITTI

  • Download the data for velodyne, calib and label_02 from KITTI Tracking.

  • Unzip the downloaded files and organize as follows:

    [Parent Folder]
    --> [calib]
        --> {0000-0020}.txt
    --> [label_02]
        --> {0000-0020}.txt
    --> [velodyne]
        --> [0000-0020] folders with velodyne .bin files
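
Before training, it can help to confirm that the folder matches the layout above. The following is a small standard-library sketch; check_kitti_layout is a hypothetical helper, not part of the codebase, and it assumes the 21 sequences 0000 to 0020 shown in the tree.

```python
from pathlib import Path


def check_kitti_layout(root):
    """Return the relative paths that are missing under the KITTI parent folder."""
    root = Path(root)
    missing = []
    for i in range(21):  # sequences 0000-0020
        seq = f"{i:04d}"
        for rel in (f"calib/{seq}.txt", f"label_02/{seq}.txt"):
            if not (root / rel).is_file():
                missing.append(rel)
        if not (root / "velodyne" / seq).is_dir():
            missing.append(f"velodyne/{seq}")
    return missing
```

An empty list means every expected file and folder was found; otherwise the returned entries show which downloads still need to be unpacked.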
    

NuScenes

  • Download the dataset from the download page.

  • Extract the downloaded files and make sure you have the following structure:

    [Parent Folder]
      samples   -   Sensor data for keyframes.
      sweeps    -   Sensor data for intermediate frames.
      maps      -   Folder for all map files: rasterized .png images and vectorized .json files.
      v1.0-*    -   JSON tables that include all the meta data and annotations. Each split (trainval, test, mini) is provided in a separate folder.
    

Note: We train on the train_track split and test on the val split; both splits are officially provided by NuScenes. During testing, we skip sequences whose first given bounding box contains no points.
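
The empty-box filter can be illustrated with the NumPy sketch below. It is not the repository's code: the function names are hypothetical, and the box is assumed to be given as a center, a (length, width, height) size with length along the heading, and a yaw angle about the z-axis.

```python
import numpy as np


def count_points_in_box(points, center, size, yaw):
    """Count LiDAR points that fall inside a yaw-rotated 3D box.

    points: (N, 3) array-like; center: (3,); size: (length, width, height);
    yaw: heading in radians about the z-axis (assumed convention).
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 3)
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = (pts - np.asarray(center, dtype=float)) @ rot.T  # box-aligned frame
    inside = np.all(np.abs(local) <= np.asarray(size, dtype=float) / 2.0, axis=1)
    return int(inside.sum())


def keep_sequence(first_frame_points, center, size, yaw):
    """A sequence is evaluated only if its first box contains at least one point."""
    return count_points_in_box(first_frame_points, center, size, yaw) > 0
```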

Waymo

  • We follow the benchmark created by LiDAR-SOT based on the Waymo Open Dataset. Download and process the Waymo dataset by following the LiDAR_SOT instructions, then use our code to evaluate on this benchmark.

  • The following processed outputs are required:

    [waymo_sot]
        [benchmark]
            [validation]
                [vehicle]
                    bench_list.json
                    easy.json
                    medium.json
                    hard.json
                [pedestrian]
                    bench_list.json
                    easy.json
                    medium.json
                    hard.json
        [pc]
            [raw_pc]
                segment.npz files containing raw point cloud data
        [gt_info]
            segment.npz files containing tracklet and bbox data
    

Update data_root_dir in the corresponding config files to point to your data paths.
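
If you prefer to script this step, the sketch below rewrites the value in place using only the standard library. It assumes that data_root_dir appears in the YAML file as a single-line "data_root_dir: <path>" entry; set_data_root is a hypothetical helper, not part of the codebase, and any trailing comment on that line is discarded.

```python
import re
from pathlib import Path

# Matches one "data_root_dir: value" line; group 1 keeps indentation and key.
_PATTERN = re.compile(r"^([ \t]*data_root_dir[ \t]*:[ \t]*).*$", flags=re.MULTILINE)


def set_data_root(config_path, new_root):
    """Replace every data_root_dir value in a config file; return the count."""
    path = Path(config_path)
    text = path.read_text()
    new_text, count = _PATTERN.subn(lambda m: m.group(1) + str(new_root), text)
    if count == 0:
        raise KeyError(f"data_root_dir not found in {path}")
    path.write_text(new_text)
    return count
```

For example, set_data_root("configs/kitti/chronotrack_kitti_car.yaml", "/path/to/kitti") would update the KITTI Car config, provided the assumption about the file layout holds.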


Training

All models were trained using 2x NVIDIA RTX 4090 or RTX 3090 GPUs.

# KITTI - Car
python main.py configs/kitti/chronotrack_kitti_car.yaml --gpus 0 1 --phase train

# NuScenes - Car
python main.py configs/nuscenes/chronotrack_nuscenes_car.yaml --gpus 0 1 --phase train

Arguments:

Argument             Description                                                      Default
config               Path to config file                                              required
--gpus               GPU device indices                                               required
--phase              train or test                                                    train
--workspace PATH     Directory to save checkpoints and logs                           ./workspace
--run_name NAME      Name for this run (used as subdirectory under workspace)         config filename
--resume_from PATH   Path to checkpoint file to resume from                           None
--seed SEED          Random seed                                                      None
--debug              Debug mode (10 epochs, batch size 2/GPU, no checkpoint saved)    False
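
For readers who want to see how these arguments fit together, the following argparse sketch mirrors the table. It is a hypothetical reconstruction for illustration, not the parser in main.py; in particular, deriving the default run name from the config file's stem is an assumption about what "config filename" means.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser that mirrors the documented arguments."""
    parser = argparse.ArgumentParser(description="ChronoTrack CLI sketch")
    parser.add_argument("config", help="Path to config file")
    parser.add_argument("--gpus", type=int, nargs="+", required=True,
                        help="GPU device indices")
    parser.add_argument("--phase", choices=["train", "test"], default="train")
    parser.add_argument("--workspace", default="./workspace",
                        help="Directory to save checkpoints and logs")
    parser.add_argument("--run_name", default=None,
                        help="Subdirectory under workspace for this run")
    parser.add_argument("--resume_from", default=None,
                        help="Path to checkpoint file to resume from")
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument("--debug", action="store_true", help="Debug mode")
    return parser


def resolve_run_name(args):
    """Fall back to the config file's stem when --run_name is not given (assumed)."""
    return args.run_name or Path(args.config).stem
```

Parsing the KITTI training command shown above with this sketch yields gpus set to [0, 1] and phase set to train, with the remaining options at their documented defaults.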

Evaluation

# KITTI - Car
python main.py configs/kitti/chronotrack_kitti_car.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

# NuScenes - Car
python main.py configs/nuscenes/chronotrack_nuscenes_car.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

# Waymo - Vehicle
python main.py configs/waymo/chronotrack_waymo_vehicle.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

Model Zoo

TODO: Upload checkpoint files and add download links.

Dataset    Category     Config                               Checkpoint
KITTI      Car          chronotrack_kitti_car.yaml           TBD
KITTI      Pedestrian   chronotrack_kitti_ped.yaml           TBD
KITTI      Van          chronotrack_kitti_van.yaml           TBD
KITTI      Cyclist      chronotrack_kitti_cyc.yaml           TBD
NuScenes   Car          chronotrack_nuscenes_car.yaml        TBD
NuScenes   Pedestrian   chronotrack_nuscenes_ped.yaml        TBD
NuScenes   Truck        chronotrack_nuscenes_truck.yaml      TBD
NuScenes   Bus          chronotrack_nuscenes_bus.yaml        TBD
NuScenes   Trailer      chronotrack_nuscenes_trailer.yaml    TBD
Waymo      Vehicle      chronotrack_waymo_vehicle.yaml       TBD
Waymo      Pedestrian   chronotrack_waymo_pedestrian.yaml    TBD

Citation

@misc{yoo2026temporallyconsistentlongtermmemory,
      title={Temporally Consistent Long-Term Memory for 3D Single Object Tracking},
      author={Jaejoon Yoo and SuBeen Lee and Yerim Jeon and Miso Lee and Jae-Pil Heo},
      year={2026},
      eprint={2604.13789},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.13789},
}

Acknowledgements

This codebase is built upon MBPTrack. We thank the authors for their open-source contribution.