March 24, 2026

SpatioTemporal Difference Network for Video Depth Super-Resolution (AAAI 2026 Oral)

arXiv Paper

Zhengxue Wang1, Yuan Wu1, Xiang Li2, Zhiqiang Yan✉3, Jian Yang✉1

✉ Corresponding author   
1Nanjing University of Science and Technology   
2Nankai University    3National University of Singapore   

🎬 Video demo


Panels shown in the demo: LR, C2PD, DORNet, RGB, Ours, GT.

:mega: Pipeline

Overview of STDNet. Given $\boldsymbol D_{LR}$, we first predict its spatial difference representation $\boldsymbol \sigma$. Then, $\boldsymbol D_{LR}$, $\boldsymbol I$, and $\boldsymbol \sigma$ are jointly fed into the spatial difference to enhance non-smooth regions, producing $\boldsymbol F_{sd}$. Next, we estimate the temporal difference representations for consecutive frames and cross frames, generating $\boldsymbol \varphi$ and $\widehat{\boldsymbol \varphi}$. These difference representations are used to propagate adjacent RGB and depth frames to the current depth frame, generating the HR depth video $\boldsymbol D_{HR}$. Finally, a degradation regularization takes $\boldsymbol D_{HR}$, $\boldsymbol D_{GT}$, $\boldsymbol \sigma$, $\boldsymbol \varphi$, and $\widehat{\boldsymbol \varphi}$ as inputs to optimize the learning of spatiotemporal difference representations.
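For intuition only, the toy sketch below uses hand-crafted finite differences as stand-ins for the learned representations: a spatial difference map that is large at depth discontinuities, and a temporal difference map that is large where depth changes between two consecutive frames. The function names and both formulas are assumptions made for illustration. STDNet predicts its difference representations with a network, and this is not its implementation.

```python
# Toy illustration only: hand-crafted finite differences standing in for the
# learned difference representations. This is NOT the STDNet implementation.

def spatial_difference(depth):
    """Per-pixel max absolute difference to the right and lower neighbours."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(depth[y][x + 1] - depth[y][x]) if x + 1 < w else 0.0
            dy = abs(depth[y + 1][x] - depth[y][x]) if y + 1 < h else 0.0
            out[y][x] = max(dx, dy)
    return out


def temporal_difference(frame_a, frame_b):
    """Per-pixel absolute change between two depth frames of equal size."""
    return [[abs(b - a) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# A 3x3 depth frame with a vertical edge between columns 1 and 2.
frame_t = [[1.0, 1.0, 5.0],
           [1.0, 1.0, 5.0],
           [1.0, 1.0, 5.0]]
# The next frame: the edge has moved one pixel to the left.
frame_t1 = [[1.0, 5.0, 5.0],
            [1.0, 5.0, 5.0],
            [1.0, 5.0, 5.0]]

sigma = spatial_difference(frame_t)            # highlights the depth edge
phi = temporal_difference(frame_t, frame_t1)   # highlights the moving region
```

In this toy case both maps are non-zero only in the middle column, which is where the edge sits in the current frame and also where depth changes between the two frames.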

:hammer: Dependencies

Please refer to 'env.yaml'.

💾 Models

All pretrained models can be found here.

📥Datasets

All datasets can be downloaded from the following links:

TarTanAir

DyDToF

DynamicReplica

Additionally, we provide a DyDToF test subset in the 'dataset' folder for quick testing. The corresponding index file is 'data/dydtof_list/school_shot8_subset.txt'.
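To check which samples the subset covers before running a test, a small helper like the one below may be enough. It is a sketch that assumes the index file is plain text with one entry per line. The file format is not documented here, and `read_index_file` is a hypothetical helper, not part of the repository.

```python
import tempfile
from pathlib import Path


def read_index_file(path):
    """Return the non-empty, whitespace-stripped lines of a text index file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo on a throwaway file; the entry names below are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "subset.txt"
    demo_file.write_text("seq_a/0001\n\n  seq_a/0002  \n", encoding="utf-8")
    demo_entries = read_index_file(demo_file)

print(len(demo_entries), "entries")
```

With the repository checked out, the same call can be pointed at 'data/dydtof_list/school_shot8_subset.txt'.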

🏋️ Training

### Replace $scale$ with the scale factor passed to --scale (4 in this example)
cd STDNet
mkdir -p experiment/SRDNet_$scale$/MAE_best

python -m torch.distributed.launch --nproc_per_node 2 train.py --scale 4 --result_root 'experiment/SRDNet_$scale$' --result_root_MAE 'experiment/SRDNet_$scale$/MAE_best'
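The `$scale$` token in the paths above reads as a placeholder rather than a shell variable. The sketch below shows one way to fill it in consistently for a given scale. It assumes that `$scale$` stands for the same value as `--scale`. `build_train_command` is a hypothetical helper, and it only builds the strings without launching anything.

```python
from pathlib import PurePosixPath


def build_train_command(scale, nproc_per_node=2, root="experiment"):
    """Return (argv, mae_dir) for one training run at the given scale."""
    # Assumption: "$scale$" in the README paths stands for the --scale value.
    result_root = PurePosixPath(root) / f"SRDNet_{scale}"
    mae_dir = result_root / "MAE_best"
    argv = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "train.py",
        "--scale", str(scale),
        "--result_root", str(result_root),
        "--result_root_MAE", str(mae_dir),
    ]
    return argv, str(mae_dir)


argv, mae_dir = build_train_command(4)
print("mkdir -p", mae_dir)   # directory to create before training
print(" ".join(argv))        # command to run from inside the STDNet folder
```

The returned `argv` list can be handed to `subprocess.run` from inside the `STDNet` folder once the directory has been created.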

⚡Testing

### TarTanAir dataset
python test_TarTanAir.py --scale 4
### DyDToF dataset
python test_DyDToF.py --scale 4
### DynamicReplica dataset
python test_DynamicReplica.py --scale 4

📊Experiments


Quantitative comparisons between our STDNet and previous state-of-the-art methods on the TarTanAir dataset.

📝 Citation

If our method proves to be of any assistance, please consider citing:

@inproceedings{wang2026spatiotemporal,
  title={Spatiotemporal difference network for video depth super-resolution},
  author={Wang, Zhengxue and Wu, Yuan and Li, Xiang and Yan, Zhiqiang and Yang, Jian},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={12},
  pages={10403--10411},
  year={2026}
}