
March 2, 2026

# tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

arXiv Project Page Model

## 📦 Installation

```bash
python3.10 -m venv tttlrm
source tttlrm/bin/activate
# CAUTION: change this to your CUDA version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 xformers
pip install -U setuptools wheel packaging ninja
# Install flash-attn (prebuilt wheels are available at: https://github.com/mjun0812/flash-attention-prebuild-wheels)
pip install flash_attn==2.5.9.post1 --no-build-isolation
pip install -r requirements.txt
```

## 🤖 Pretrained Models

```bash
bash script/download_ckpts.sh
```

## ⚡ Inference

We use `sp_size` for sequence parallelism; it denotes the number of GPUs used for one sequence. Input views and generated Gaussians are distributed evenly across the `sp_size` GPUs. The total number of GPUs you use must be divisible by `sp_size`.

```bash
# For Full model
bash script/inference_dl3dv.sh
# For AR model (4 views per chunk, set by '-s model.miniupdate_views')
bash script/inference_dl3dv_ar.sh
```
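As a rough illustration of how the AR model consumes its input (4 views per chunk, controlled by `model.miniupdate_views`), here is a minimal sketch; `chunk_views` is an illustrative name, not a function from this codebase:

```python
def chunk_views(view_ids, miniupdate_views=4):
    """Split the input view list into fixed-size chunks,
    consumed one chunk per autoregressive step."""
    return [view_ids[i:i + miniupdate_views]
            for i in range(0, len(view_ids), miniupdate_views)]

chunks = chunk_views(list(range(10)))
print(chunks)  # the last chunk may be shorter than miniupdate_views
```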

We do not provide training code at the moment, but it can be reproduced by combining LongLRM with our inference code (the sequence-parallel part).
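The even distribution of views across `sp_size` GPUs described above can be sketched as follows; the function name and the contiguous-slice assignment are assumptions for illustration, not the repository's actual implementation:

```python
def shard_views(num_views: int, num_gpus: int, sp_size: int):
    """Assign view indices to the sp_size ranks that share one sequence,
    as equal contiguous slices (illustrative sharding scheme)."""
    assert num_gpus % sp_size == 0, "total GPU count must be divisible by sp_size"
    assert num_views % sp_size == 0, "views must split evenly across ranks"
    per_rank = num_views // sp_size
    return [list(range(r * per_rank, (r + 1) * per_rank)) for r in range(sp_size)]

# 32 input views on 8 GPUs with sp_size=4: each rank of a sequence holds 8 views
shards = shard_views(num_views=32, num_gpus=8, sp_size=4)
print(len(shards), len(shards[0]))
```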

## 📂 Dataset

### DL3DV Benchmark

Download the DL3DV benchmark data (i.e., the test split; not used in training) from https://huggingface.co/datasets/DL3DV/DL3DV-Benchmark/tree/main using the following command:

```bash
python data/dl3dv_eval_download.py --odir ./data_example/dl3dv_benchmark --subset hash --only_level4 --hash 032dee9fb0a8bc1b90871dc5fe950080d0bcd3caf166447f44e60ca50ac04ec7
```

Use the option `--subset full` to download all testing scenes. After downloading, run

```bash
python data/dl3dv_format_converter.py
```

to convert to our dataset format (OpenCV camera).
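To give a feel for what such a converted scene contains, here is a hypothetical reader sketch: the actual schema of `opencv_cameras.json` is defined by the converter scripts, and the field names used below (`frames`, `fx`, `fy`, `cx`, `cy`, `w2c`) are assumptions for illustration only.

```python
import json

def load_cameras(text):
    """Parse a cameras JSON string (hypothetical schema) and sanity-check
    that each frame has a 4x4 world-to-camera matrix and valid intrinsics."""
    data = json.loads(text)
    for frame in data["frames"]:
        assert len(frame["w2c"]) == 4 and all(len(row) == 4 for row in frame["w2c"])
        assert frame["fx"] > 0 and frame["fy"] > 0
    return data["frames"]

# A one-frame example entry with identity pose and pinhole intrinsics in pixels
example = json.dumps({"frames": [{
    "image_name": "frame_0001.png",
    "fx": 500.0, "fy": 500.0, "cx": 320.0, "cy": 240.0,
    "w2c": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
}]})
frames = load_cameras(example)
print(frames[0]["image_name"])
```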

### Custom COLMAP Data

We also support running inference on your own COLMAP reconstructions. First, convert your COLMAP output to our format:

```bash
python data/colmap_format_convert.py --source_dir /path/to/colmap_scene --output_dir ./data_example/colmap_processed/scene_name
```

The script auto-detects the `sparse/0` and `images` directories, handles lens undistortion, and generates `opencv_cameras.json`. It supports both binary and text COLMAP formats. Then run inference:

```bash
bash script/inference_colmap.sh
```
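For reference, the core pose math a COLMAP-to-OpenCV converter performs can be sketched as below. COLMAP stores each image's world-to-camera rotation as a quaternion (qw, qx, qy, qz) plus a translation, and OpenCV uses the same axis convention (+x right, +y down, +z forward), so the conversion is just assembling [R | t]. Function names here are illustrative, not from the repository.

```python
import math

def quat_to_rotmat(qw, qx, qy, qz):
    """Convert a quaternion to a 3x3 rotation matrix (row-major),
    normalizing first so non-unit input is handled gracefully."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz),     2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz),     1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy),     2 * (qy * qz + qw * qx),     1 - 2 * (qx * qx + qy * qy)],
    ]

def colmap_pose_to_w2c(qw, qx, qy, qz, tx, ty, tz):
    """Assemble the 4x4 OpenCV-style world-to-camera matrix [R | t]."""
    R = quat_to_rotmat(qw, qx, qy, qz)
    return [R[0] + [tx], R[1] + [ty], R[2] + [tz], [0.0, 0.0, 0.0, 1.0]]

# Identity quaternion and zero translation -> identity pose
print(colmap_pose_to_w2c(1, 0, 0, 0, 0, 0, 0)[0])  # first row of an identity pose
```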

Since the provided model was not trained with multiple resolutions and intrinsics, it might not work well on custom data.

๐Ÿค Acknowledgements

Our codebase is a reimplementation of the internal version; performance matches under the same model weights. The code is largely built upon open-source projects including LongLRM and LaCT. We thank the authors for their helpful code.

โš–๏ธ License

The checkpoints are licensed under the Adobe Research License.

## 📜 Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@article{wang2026tttlrm,
    title   = {tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction},
    author  = {Chen Wang and Hao Tan and Wang Yifan and Zhiqin Chen and Yuheng Liu and Kalyan Sunkavalli and Sai Bi and Lingjie Liu and Yiwei Hu},
    journal = {CVPR},
    year    = {2026}
}
```