March 2, 2026
# tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
## 📦 Installation
```bash
python3.10 -m venv tttlrm
source tttlrm/bin/activate
# CAUTION: change the index URL below to match your CUDA version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu118
pip install -U setuptools wheel packaging ninja
# Install flash-attn (prebuilt wheels are available at https://github.com/mjun0812/flash-attention-prebuild-wheels)
pip install flash_attn==2.5.9.post1 --no-build-isolation
pip install -r requirements.txt
```
## 🤗 Pretrained Models

```bash
bash script/download_ckpts.sh
```
## ⚡ Inference
We use `sp_size` for sequence parallelism; it denotes the number of GPUs used for one sequence. Input views and generated Gaussians are distributed evenly across the `sp_size` GPUs. The total number of GPUs you use must be divisible by `sp_size`.
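As a rough illustration (not the repo's actual implementation), distributing input views evenly across `sp_size` ranks might look like the following sketch, where earlier ranks absorb the remainder when the view count is not an exact multiple:

```python
# Hypothetical sketch of sequence-parallel view sharding (not the repo's code).
def shard_views(num_views: int, sp_size: int) -> list:
    """Split num_views contiguous view indices evenly across sp_size ranks."""
    base, rem = divmod(num_views, sp_size)
    shards, start = [], 0
    for rank in range(sp_size):
        # Ranks with index < rem get one extra view.
        count = base + (1 if rank < rem else 0)
        shards.append(range(start, start + count))
        start += count
    return shards

# Example: 10 views on 4 GPUs -> shard sizes [3, 3, 2, 2]
print([len(s) for s in shard_views(10, 4)])
```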
```bash
# For the full model
bash script/inference_dl3dv.sh
# For the AR model (4 views per chunk, set by '-s model.miniupdate_views')
bash script/inference_dl3dv_ar.sh
```
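For intuition, the AR model's chunked processing (4 views per chunk, as set by `model.miniupdate_views`) amounts to iterating over fixed-size view chunks while carrying state forward. This is an illustrative sketch only; the real model updates fast weights via test-time training, which the stub counter below merely stands in for:

```python
# Illustrative sketch of autoregressive, chunked reconstruction (hypothetical).
def chunk_views(view_ids, chunk_size=4):
    """Group view indices into consecutive chunks of `chunk_size`."""
    return [view_ids[i:i + chunk_size] for i in range(0, len(view_ids), chunk_size)]

def run_ar(view_ids, chunk_size=4):
    """Process chunks in order, carrying state forward between chunks."""
    state, outputs = 0, []
    for chunk in chunk_views(view_ids, chunk_size):
        state += len(chunk)  # stand-in for a test-time weight update
        outputs.append((tuple(chunk), state))
    return outputs

# 10 views with 4 per chunk -> chunks of sizes 4, 4, 2
print(run_ar(list(range(10))))
```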
We do not provide training code at the moment, but it can be reproduced straightforwardly by combining LongLRM with our inference code (the sequence-parallel part).
## 📂 Dataset

### DL3DV Benchmark
Download the DL3DV benchmark data (i.e., the test split; not used in training) from https://huggingface.co/datasets/DL3DV/DL3DV-Benchmark/tree/main using the following command:
```bash
python data/dl3dv_eval_download.py --odir ./data_example/dl3dv_benchmark --subset hash --only_level4 --hash 032dee9fb0a8bc1b90871dc5fe950080d0bcd3caf166447f44e60ca50ac04ec7
```
Use the option `--subset full` to download all test scenes. After downloading, run

```bash
python data/dl3dv_format_converter.py
```

to convert the data to our dataset format (OpenCV camera convention).
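As background on the conversion target: OpenCV uses an x-right, y-down, z-forward camera frame, whereas formats such as OpenGL/Blender use x-right, y-up, z-backward, so conversion flips the camera's y and z axes. The sketch below (an illustration, not the converter script itself) shows one common way to turn an OpenGL-convention camera-to-world pose into an OpenCV-convention world-to-camera extrinsic:

```python
import numpy as np

# Hypothetical sketch of an OpenGL -> OpenCV pose conversion (not the repo's
# data/dl3dv_format_converter.py).
def opengl_c2w_to_opencv_w2c(c2w_gl: np.ndarray) -> np.ndarray:
    """Convert a 4x4 OpenGL camera-to-world pose to an OpenCV world-to-camera."""
    flip = np.diag([1.0, -1.0, -1.0, 1.0])  # negate the camera y and z axes
    c2w_cv = c2w_gl @ flip                   # camera axes now follow OpenCV
    return np.linalg.inv(c2w_cv)             # world-to-camera extrinsic

identity_pose = np.eye(4)
w2c = opengl_c2w_to_opencv_w2c(identity_pose)
```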
### Custom COLMAP Data
We also support running inference on your own COLMAP reconstructions. First, convert your COLMAP output to our format:
```bash
python data/colmap_format_convert.py --source_dir /path/to/colmap_scene --output_dir ./data_example/colmap_processed/scene_name
```
The script auto-detects the `sparse/0` and `images` directories, handles lens undistortion, and generates `opencv_cameras.json`. It supports both binary and text COLMAP formats. Then run inference:

```bash
bash script/inference_colmap.sh
```
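For reference, COLMAP's text-format `cameras.txt` stores one camera per line as `CAMERA_ID MODEL WIDTH HEIGHT PARAMS[]`. A minimal parse of the pinhole models might look like the sketch below; the JSON field names are illustrative, not the exact schema of the repo's `opencv_cameras.json`:

```python
import json

# Hypothetical sketch of parsing COLMAP cameras.txt intrinsics (illustrative
# field names; not the repo's colmap_format_convert.py).
def parse_cameras_txt(path: str) -> dict:
    cameras = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments and blank lines
            parts = line.split()
            cam_id, model = parts[0], parts[1]
            width, height = int(parts[2]), int(parts[3])
            params = [float(p) for p in parts[4:]]
            if model == "PINHOLE":          # fx fy cx cy
                fx, fy, cx, cy = params
            elif model == "SIMPLE_PINHOLE":  # f cx cy
                fx = fy = params[0]
                cx, cy = params[1], params[2]
            else:
                continue  # distorted models would need undistortion first
            cameras[cam_id] = {"w": width, "h": height,
                               "fx": fx, "fy": fy, "cx": cx, "cy": cy}
    return cameras

def write_cameras_json(cameras: dict, out_path: str) -> None:
    with open(out_path, "w") as f:
        json.dump(cameras, f, indent=2)
```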
Note that the provided model was not trained with multiple resolutions and intrinsics, so it might not work well on custom data.
## 🤝 Acknowledgements
Our codebase is a reimplementation of the internal version; performance is matched under the same model weights. The code is largely built upon open-source projects including LongLRM and LaCT. We thank the authors for their helpful code.
## ⚖️ License
The checkpoints are licensed under the Adobe Research License.
## 📝 Citation
If you find this work helpful, please consider citing our paper:
```bibtex
@article{wang2026tttlrm,
  title   = {tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction},
  author  = {Wang, Chen and Tan, Hao and Yifan, Wang and Chen, Zhiqin and Liu, Yuheng and Sunkavalli, Kalyan and Bi, Sai and Liu, Lingjie and Hu, Yiwei},
  journal = {CVPR},
  year    = {2026}
}
```