ChronoTrack

April 29, 2026

Temporally Consistent Long-Term Memory for 3D Single Object Tracking [arXiv]

Jaejoon Yoo, SuBeen Lee, Yerim Jeon, Miso Lee, Jae-Pil Heo

CVPR 2026 Findings

Overview

ChronoTrack is the first long-term memory framework for 3D single object tracking (3D-SOT).

Problem. Recent memory-based 3D-SOT methods store point-level features from a few past frames (typically 2–3) as reference templates. Naïvely extending this memory to longer horizons degrades tracking accuracy due to temporal feature inconsistency — cosine similarity between a target's LiDAR features decreases as temporal distance grows, causing mismatches when attending to distant frames. Furthermore, point-level storage grows proportionally with memory length, making long-term extension impractical.

Solution. ChronoTrack replaces per-point feature storage with a compact set of learnable foreground memory tokens that are recurrently updated at each timestep. These tokens serve as a compressed, temporally persistent summary of the target's appearance. Short-term background context (single previous frame) is maintained separately to provide local scene contrast. Two novel training objectives ensure the tokens remain informative over long horizons:

Temporal Consistency Loss (L_TC). Foreground points from different frames are transformed into a shared canonical coordinate system (center subtracted, rotation applied). Nearest-neighbor pairs are found across frames and filtered by a distance threshold; SmoothL1 loss is applied to the corresponding feature pairs. This encourages part-level feature consistency across time, directly combating temporal feature drift.
Memory Cycle Consistency Loss (L_MCC). A two-step cyclic walk — memory tokens → scene points → memory tokens — is performed via softmax cosine similarity. A cycle-consistency cross-entropy term penalizes tokens that fail to return to themselves, while a foreground affinity term encourages tokens to specialize in target-relevant regions. Together these losses push the tokens to encode diverse, target-specific semantics.
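The two objectives above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names, the (center, yaw) box format, the 0.3 distance threshold, the temperature value, and in particular the negative-log form of the foreground affinity term are all hypothetical choices that the description above does not specify.

```python
import numpy as np


def canonicalize(points, center, yaw):
    """Move points into the box frame: subtract the center, then undo the yaw."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Right-multiplying row vectors by R(yaw) applies the inverse rotation R(-yaw).
    return (points - center) @ rot_z


def smooth_l1(diff, beta=1.0):
    """Elementwise SmoothL1: quadratic below beta, linear above."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)


def temporal_consistency_loss(pts_a, feat_a, box_a, pts_b, feat_b, box_b, max_dist=0.3):
    """L_TC sketch between two frames; each box is an assumed (center, yaw) pair."""
    can_a = canonicalize(pts_a, *box_a)
    can_b = canonicalize(pts_b, *box_b)
    # Brute-force nearest neighbour in frame b for every point of frame a.
    dist = np.linalg.norm(can_a[:, None, :] - can_b[None, :, :], axis=-1)
    nn = dist.argmin(axis=1)
    keep = dist[np.arange(len(can_a)), nn] < max_dist  # distance-threshold filter
    if not keep.any():
        return 0.0
    return float(smooth_l1(feat_a[keep] - feat_b[nn[keep]]).mean())


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_cycle_losses(tokens, point_feats, fg_mask, tau=0.1, eps=1e-8):
    """L_MCC sketch: returns (cycle term, foreground affinity term)."""
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + eps)
    f = point_feats / (np.linalg.norm(point_feats, axis=1, keepdims=True) + eps)
    sim = t @ f.T / tau                      # (K, N) scaled cosine similarity
    tok_to_pt = softmax(sim, axis=1)         # step 1: tokens -> scene points
    pt_to_tok = softmax(sim.T, axis=1)       # step 2: scene points -> tokens
    round_trip = tok_to_pt @ pt_to_tok       # (K, K); each row sums to 1
    # Cross-entropy with the identity as target: each token should return to itself.
    cycle = -np.log(np.diag(round_trip) + eps).mean()
    # Assumed form: reward probability mass that tokens place on foreground points.
    fg_affinity = -np.log(tok_to_pt @ fg_mask + eps).mean()
    return float(cycle), float(fg_affinity)
```

In this sketch, identical tokens are penalized by the cycle term because a walk that starts at one of them is equally likely to end at any of its duplicates, which is one way to read the claim that the loss encourages diverse tokens.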

Installation

Option 1: Docker (Recommended)

We provide a pre-built Docker image with all dependencies installed.

docker pull jaejoonyoo/3dsot:chronotrack

Option 2: Manual Setup

The codebase is tested with the following environment:

Package	Version
Python	3.10.14
PyTorch	2.3.1
Torchvision	0.18.1
CUDA	11.8
PyTorch3D	0.7.8
Lightning	2.4.0

# 1. Clone repository
git clone https://github.com/ujaejoon/ChronoTrack.git
cd ChronoTrack

# 2. Create conda environment
conda create -n chronotrack python=3.10
conda activate chronotrack

# 3. Install PyTorch (with CUDA 11.8)
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118

# 4. Install NumPy (must be pinned before other packages to avoid numpy 2.x conflicts)
pip install numpy==1.24.0

# 5. Install PyTorch3D (built from source; requires a C++ compiler and CUDA toolkit)
pip install "git+https://github.com/facebookresearch/pytorch3d.git@V0.7.8"

# 6. Install remaining dependencies
pip install lightning==2.4.0 addict==2.4.0 pyquaternion==0.9.9
pip install nuscenes-devkit==1.1.11 tensorboard==2.12.3
pip install pyyaml==6.0.1 pandas==1.5.3 tqdm==4.66.4
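
After a manual setup, it can help to confirm that the installed packages match the tested versions listed above. The script below is a small, hypothetical helper (it is not part of the repository) that uses only the standard library. It assumes the distribution names torch, torchvision, pytorch3d, lightning, and numpy, and treats a local build suffix such as +cu118 as a match.

```python
from importlib import metadata

# Versions copied from the tested-environment table and the install commands above.
EXPECTED = {
    "torch": "2.3.1",
    "torchvision": "0.18.1",
    "pytorch3d": "0.7.8",
    "lightning": "2.4.0",
    "numpy": "1.24.0",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return a list of human-readable mismatches; an empty list means all match."""
    problems = []
    for name, want in expected.items():
        try:
            got = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {want})")
            continue
        # Ignore local build tags, e.g. "2.3.1+cu118" counts as "2.3.1".
        if got.split("+")[0] != want:
            problems.append(f"{name}: found {got}, expected {want}")
    return problems


if __name__ == "__main__":
    for line in check_versions() or ["environment matches the tested versions"]:
        print(line)
```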

Data Preparation

KITTI

Download the data for velodyne, calib and label_02 from KITTI Tracking.

Unzip the downloaded files and organize as follows:

[Parent Folder]
--> [calib]
    --> {0000-0020}.txt
--> [label_02]
    --> {0000-0020}.txt
--> [velodyne]
    --> [0000-0020] folders with velodyne .bin files
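
A quick way to confirm the layout above before training is to list whatever is missing. The helper below is a hypothetical sketch, not a script shipped with the repository; it assumes sequences 0000 through 0020 and only checks that the files and folders exist, not that their contents are valid.

```python
from pathlib import Path


def missing_kitti_entries(root, sequence_ids=range(21)):
    """Return the expected KITTI Tracking paths that do not exist under root."""
    root = Path(root)
    missing = []
    for seq in sequence_ids:
        name = f"{seq:04d}"
        # One calibration file and one label file per sequence.
        for path in (root / "calib" / f"{name}.txt", root / "label_02" / f"{name}.txt"):
            if not path.is_file():
                missing.append(str(path))
        # One folder of velodyne .bin files per sequence.
        scan_dir = root / "velodyne" / name
        if not scan_dir.is_dir():
            missing.append(str(scan_dir))
    return missing
```

Calling missing_kitti_entries("/path/to/kitti") on a complete download should return an empty list.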

NuScenes

Download the dataset from the NuScenes download page.

Extract the downloaded files and make sure you have the following structure:

[Parent Folder]
  samples   -   Sensor data for keyframes.
  sweeps    -   Sensor data for intermediate frames.
  maps      -   Folder for all map files: rasterized .png images and vectorized .json files.
  v1.0-*    -   JSON tables that include all the meta data and annotations. Each split (trainval, test, mini) is provided in a separate folder.

Note: We train on the train_track split and evaluate on the val split; both are officially provided by NuScenes. During testing, we skip sequences whose first-frame bounding box contains no points.
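
The filtering rule in the note can be sketched as below. The box parameterization (geometric center, size as length/width/height, yaw about the z-axis) and the per-sequence dictionary layout are assumptions made for illustration; the repository's data loader may represent tracklets differently.

```python
import numpy as np


def points_in_box(points, center, size, yaw):
    """Boolean mask of the points that fall inside an oriented 3D box."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Express points in the box frame, where the box is axis-aligned.
    local = (np.asarray(points)[:, :3] - center) @ rot_z
    return np.all(np.abs(local) <= np.asarray(size) / 2.0, axis=1)


def drop_empty_first_frames(sequences):
    """Keep only sequences whose first box contains at least one LiDAR point.

    Each sequence is assumed to be a dict with "points" (list of (N, 3) arrays)
    and "boxes" (list of (center, size, yaw) tuples), one entry per frame.
    """
    kept = []
    for seq in sequences:
        center, size, yaw = seq["boxes"][0]
        if points_in_box(seq["points"][0], center, size, yaw).any():
            kept.append(seq)
    return kept
```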

Waymo

We follow the benchmark that LiDAR-SOT built on the Waymo Open Dataset. Download and process the Waymo dataset by following the instructions in the LiDAR_SOT repository, then use our code to evaluate on this benchmark.

The following processing results are necessary:

[waymo_sot]
    [benchmark]
        [validation]
            [vehicle]
                bench_list.json
                easy.json
                medium.json
                hard.json
            [pedestrian]
                bench_list.json
                easy.json
                medium.json
                hard.json
    [pc]
        [raw_pc]
            segment.npz files containing raw point cloud data
    [gt_info]
        segment.npz files containing tracklet and bbox data

Update data_root_dir in the corresponding config files to point to your data paths.
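
If several configs need the same change, the update can be scripted. The function below is a hypothetical helper that overwrites every data_root_dir key in an already-loaded config, wherever it sits in the nesting; where that key actually lives inside the repository's YAML files is not stated here, which is why the search is recursive.

```python
def set_data_root(node, new_root, key="data_root_dir"):
    """Overwrite every `key` entry in nested dicts and lists; return the count changed."""
    changed = 0
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                node[name] = new_root
                changed += 1
            else:
                changed += set_data_root(value, new_root, key)
    elif isinstance(node, list):
        for item in node:
            changed += set_data_root(item, new_root, key)
    return changed
```

A config could be loaded with yaml.safe_load from the pyyaml package installed above, passed through this function, and written back with yaml.safe_dump. Note that round-tripping through pyyaml drops comments, so editing the file by hand may be preferable when comments matter.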

Training

All models were trained using 2x NVIDIA RTX 4090 or RTX 3090 GPUs.

# KITTI - Car
python main.py configs/kitti/chronotrack_kitti_car.yaml --gpus 0 1 --phase train

# NuScenes - Car
python main.py configs/nuscenes/chronotrack_nuscenes_car.yaml --gpus 0 1 --phase train

Arguments:

Argument	Description	Default
`config`	Path to config file	required
`--gpus`	GPU device indices	required
`--phase`	`train` or `test`	`train`
`--workspace PATH`	Directory to save checkpoints and logs	`./workspace`
`--run_name NAME`	Name for this run (used as subdirectory under workspace)	config filename
`--resume_from PATH`	Path to checkpoint file to resume from	`None`
`--seed SEED`	Random seed	`None`
`--debug`	Debug mode (10 epochs, batch size 2/GPU, no checkpoint saved)	`False`
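
For readers who want to see how the table maps onto a command-line parser, here is a minimal argparse sketch. It mirrors the arguments and defaults listed above but is not taken from main.py; in particular, deriving the run name from the config file name without its extension is an assumption.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser mirroring the argument table (illustrative, not the repo's code)."""
    parser = argparse.ArgumentParser(description="ChronoTrack CLI sketch")
    parser.add_argument("config", help="Path to config file")
    parser.add_argument("--gpus", type=int, nargs="+", required=True,
                        help="GPU device indices")
    parser.add_argument("--phase", choices=["train", "test"], default="train")
    parser.add_argument("--workspace", default="./workspace",
                        help="Directory to save checkpoints and logs")
    parser.add_argument("--run_name", default=None,
                        help="Subdirectory name under the workspace")
    parser.add_argument("--resume_from", default=None,
                        help="Checkpoint file to resume from")
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument("--debug", action="store_true", help="Debug mode")
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.run_name is None:
        # Assumption: "config filename" means the file name without its extension.
        args.run_name = Path(args.config).stem
    return args
```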

Evaluation

# KITTI - Car
python main.py configs/kitti/chronotrack_kitti_car.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

# NuScenes - Car
python main.py configs/nuscenes/chronotrack_nuscenes_car.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

# Waymo - Vehicle
python main.py configs/waymo/chronotrack_waymo_vehicle.yaml --phase test --gpus 0 \
    --resume_from /path/to/checkpoint.ckpt

Model Zoo

TODO: Upload checkpoint files and add download links.

Dataset	Category	Config	Checkpoint
KITTI	Car	chronotrack_kitti_car.yaml	TBD
KITTI	Pedestrian	chronotrack_kitti_ped.yaml	TBD
KITTI	Van	chronotrack_kitti_van.yaml	TBD
KITTI	Cyclist	chronotrack_kitti_cyc.yaml	TBD
NuScenes	Car	chronotrack_nuscenes_car.yaml	TBD
NuScenes	Pedestrian	chronotrack_nuscenes_ped.yaml	TBD
NuScenes	Truck	chronotrack_nuscenes_truck.yaml	TBD
NuScenes	Bus	chronotrack_nuscenes_bus.yaml	TBD
NuScenes	Trailer	chronotrack_nuscenes_trailer.yaml	TBD
Waymo	Vehicle	chronotrack_waymo_vehicle.yaml	TBD
Waymo	Pedestrian	chronotrack_waymo_pedestrian.yaml	TBD

Citation

@misc{yoo2026temporallyconsistentlongtermmemory,
      title={Temporally Consistent Long-Term Memory for 3D Single Object Tracking},
      author={Jaejoon Yoo and SuBeen Lee and Yerim Jeon and Miso Lee and Jae-Pil Heo},
      year={2026},
      eprint={2604.13789},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.13789},
}

Acknowledgements

This codebase is built upon MBPTrack. We thank the authors for their open-source contribution.