
March 2, 2026

# tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

arXiv Project Page Model

## 📦 Installation

```bash
python3.10 -m venv tttlrm
source tttlrm/bin/activate
# CAUTION: change this to your CUDA version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 xformers
pip install -U setuptools wheel packaging ninja
# Install flash-attn (prebuilt wheels are available at: https://github.com/mjun0812/flash-attention-prebuild-wheels)
pip install flash_attn==2.5.9.post1 --no-build-isolation
pip install -r requirements.txt
```

## 🤖 Pretrained Models

```bash
bash script/download_ckpts.sh
```

## ⚡ Inference

We use `sp_size` for sequence parallelism; it denotes the number of GPUs used for one sequence. Input views and generated Gaussians are distributed evenly across the `sp_size` GPUs. The total number of GPUs you use must be divisible by `sp_size`.

```bash
# For Full model
bash script/inference_dl3dv.sh
# For AR model (4 views per chunk, set by '-s model.miniupdate_views')
bash script/inference_dl3dv_ar.sh
```
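As a rough illustration of how the AR model consumes its input (4 views per chunk, controlled by `model.miniupdate_views`), here is a minimal sketch; `chunk_views` is an illustrative name, not a function from this codebase:

```python
def chunk_views(view_ids, miniupdate_views=4):
    """Split the input view list into fixed-size chunks,
    consumed one chunk per autoregressive step."""
    return [view_ids[i:i + miniupdate_views]
            for i in range(0, len(view_ids), miniupdate_views)]

chunks = chunk_views(list(range(10)))
print(chunks)  # the last chunk may be shorter than miniupdate_views
```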

We do not provide training code at the moment, but it can be reproduced by combining LongLRM with our inference code (the sequence-parallel part).
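The even distribution of views across `sp_size` GPUs described above can be sketched as follows; the function name and the contiguous-slice assignment are assumptions for illustration, not the repository's actual implementation:

```python
def shard_views(num_views: int, num_gpus: int, sp_size: int):
    """Assign view indices to the sp_size ranks that share one sequence,
    as equal contiguous slices (illustrative sharding scheme)."""
    assert num_gpus % sp_size == 0, "total GPU count must be divisible by sp_size"
    assert num_views % sp_size == 0, "views must split evenly across ranks"
    per_rank = num_views // sp_size
    return [list(range(r * per_rank, (r + 1) * per_rank)) for r in range(sp_size)]

# 32 input views on 8 GPUs with sp_size=4: each rank of a sequence holds 8 views
shards = shard_views(num_views=32, num_gpus=8, sp_size=4)
print(len(shards), len(shards[0]))
```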

## 📂 Dataset

### DL3DV Benchmark

Download the DL3DV benchmark data (i.e., the test split; not used in training) from https://huggingface.co/datasets/DL3DV/DL3DV-Benchmark/tree/main using the following command:

```bash
python data/dl3dv_eval_download.py --odir ./data_example/dl3dv_benchmark --subset hash --only_level4 --hash 032dee9fb0a8bc1b90871dc5fe950080d0bcd3caf166447f44e60ca50ac04ec7
```

Use the option `--subset full` to download all testing scenes. After downloading, run

```bash
python data/dl3dv_format_converter.py
```

to convert to our dataset format (OpenCV camera).
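To give a feel for what such a converted scene contains, here is a hypothetical reader sketch: the actual schema of `opencv_cameras.json` is defined by the converter scripts, and the field names used below (`frames`, `fx`, `fy`, `cx`, `cy`, `w2c`) are assumptions for illustration only.

```python
import json

def load_cameras(text):
    """Parse a cameras JSON string (hypothetical schema) and sanity-check
    that each frame has a 4x4 world-to-camera matrix and valid intrinsics."""
    data = json.loads(text)
    for frame in data["frames"]:
        assert len(frame["w2c"]) == 4 and all(len(row) == 4 for row in frame["w2c"])
        assert frame["fx"] > 0 and frame["fy"] > 0
    return data["frames"]

# A one-frame example entry with identity pose and pinhole intrinsics in pixels
example = json.dumps({"frames": [{
    "image_name": "frame_0001.png",
    "fx": 500.0, "fy": 500.0, "cx": 320.0, "cy": 240.0,
    "w2c": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
}]})
frames = load_cameras(example)
print(frames[0]["image_name"])
```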

### Custom COLMAP Data

We also support running inference on your own COLMAP reconstructions. First, convert your COLMAP output to our format:

```bash
python data/colmap_format_convert.py --source_dir /path/to/colmap_scene --output_dir ./data_example/colmap_processed/scene_name
```

The script auto-detects the `sparse/0` and `images` directories, handles lens undistortion, and generates `opencv_cameras.json`. It supports both binary and text COLMAP formats. Then run inference:

```bash
bash script/inference_colmap.sh
```
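For reference, the core pose math a COLMAP-to-OpenCV converter performs can be sketched as below. COLMAP stores each image's world-to-camera rotation as a quaternion (qw, qx, qy, qz) plus a translation, and OpenCV uses the same axis convention (+x right, +y down, +z forward), so the conversion is just assembling [R | t]. Function names here are illustrative, not from the repository.

```python
import math

def quat_to_rotmat(qw, qx, qy, qz):
    """Convert a quaternion to a 3x3 rotation matrix (row-major),
    normalizing first so non-unit input is handled gracefully."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz),     2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz),     1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy),     2 * (qy * qz + qw * qx),     1 - 2 * (qx * qx + qy * qy)],
    ]

def colmap_pose_to_w2c(qw, qx, qy, qz, tx, ty, tz):
    """Assemble the 4x4 OpenCV-style world-to-camera matrix [R | t]."""
    R = quat_to_rotmat(qw, qx, qy, qz)
    return [R[0] + [tx], R[1] + [ty], R[2] + [tz], [0.0, 0.0, 0.0, 1.0]]

# Identity quaternion and zero translation -> identity pose
print(colmap_pose_to_w2c(1, 0, 0, 0, 0, 0, 0)[0])  # first row of an identity pose
```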

Since the provided model was not trained with multiple resolutions and intrinsics, it might not work well on custom data.

๐Ÿค Acknowledgements

Our codebase is a reimplementation of the internal version; performance matches under the same model weights. The code is largely built upon open-source projects including LongLRM and LaCT. We thank the authors for their helpful code.

โš–๏ธ License

The checkpoints are licensed under the Adobe Research License.

## 📜 Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@article{wang2026tttlrm,
    title   = {tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction},
    author  = {Chen Wang and Hao Tan and Wang Yifan and Zhiqin Chen and Yuheng Liu and Kalyan Sunkavalli and Sai Bi and Lingjie Liu and Yiwei Hu},
    journal = {CVPR},
    year    = {2026}
}
```