
March 24, 2026

SpatioTemporal Difference Network for Video Depth Super-Resolution (AAAI 2026 Oral)

Zhengxue Wang¹, Yuan Wu¹, Xiang Li², Zhiqiang Yan✉³, Jian Yang✉¹

✉ Corresponding author
¹Nanjing University of Science and Technology
²Nankai University
³National University of Singapore

🎬 Video demo

Panels, top row: LR, C2PD, DORNet. Bottom row: RGB, Ours, GT.

Overview of STDNet. Given the LR depth video $\boldsymbol D_{LR}$, we first predict its spatial difference representation $\boldsymbol \sigma$. Then, $\boldsymbol D_{LR}$, the RGB input $\boldsymbol I$, and $\boldsymbol \sigma$ are jointly fed into the spatial difference branch to enhance non-smooth regions, producing $\boldsymbol F_{sd}$. Next, we estimate the temporal difference representations for consecutive frames and cross frames, generating $\boldsymbol \varphi$ and $\widehat{\boldsymbol \varphi}$. These difference representations are used to propagate adjacent RGB and depth frames to the current depth frame, generating the HR depth video $\boldsymbol D_{HR}$. Finally, a degradation regularization takes $\boldsymbol D_{HR}$, $\boldsymbol D_{GT}$, $\boldsymbol \sigma$, $\boldsymbol \varphi$, and $\widehat{\boldsymbol \varphi}$ as inputs to optimize the learning of the spatiotemporal difference representations.
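For intuition only, the sketch below computes hand-crafted analogues of the two kinds of difference mentioned in the overview: a spatial difference within one depth frame and a temporal difference between two consecutive frames. It is not the STDNet implementation. In STDNet these representations are predicted by the network, and the helper names `spatial_difference` and `temporal_difference` are hypothetical.

```python
# Illustrative only: hand-crafted differences on tiny depth frames (nested lists).
# STDNet learns its difference representations; these helpers are hypothetical.

def spatial_difference(frame):
    """Per-pixel sum of absolute differences to the right and lower neighbours."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(frame[y][x + 1] - frame[y][x]) if x + 1 < w else 0.0
            dy = abs(frame[y + 1][x] - frame[y][x]) if y + 1 < h else 0.0
            out[y][x] = dx + dy
    return out


def temporal_difference(prev_frame, cur_frame):
    """Per-pixel absolute change between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)]


# A depth edge sits between columns 1 and 2 at t0 and moves one pixel left at t1.
frame_t0 = [[1.0, 1.0, 5.0],
            [1.0, 1.0, 5.0]]
frame_t1 = [[1.0, 5.0, 5.0],
            [1.0, 5.0, 5.0]]

sigma_like = spatial_difference(frame_t0)           # non-zero only next to the edge
phi_like = temporal_difference(frame_t0, frame_t1)  # non-zero only where depth changed
```

Both maps are zero in smooth, static regions and non-zero only around the depth edge, which is the kind of non-smooth, changing region the overview says the difference representations target.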

🔨 Dependencies

Please refer to 'env.yaml'.

💾 Models

All pretrained models can be found here.

📥 Datasets

All datasets can be downloaded from the following links:

TarTanAir

DyDToF

DynamicReplica

Additionally, we provide a DyDToF test subset in the 'dataset' folder for a quick start; the corresponding index file is 'data/dydtof_list/school_shot8_subset.txt'.
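A minimal sketch for inspecting the subset index file, assuming it is a plain-text file with one sample entry per line (the actual format is not documented here). The helper name `read_index` is hypothetical and not part of the repository.

```python
from pathlib import Path


def read_index(path):
    """Return the non-empty, whitespace-stripped lines of a text index file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Path taken from the README; run from the repository root.
    index_path = Path("data/dydtof_list/school_shot8_subset.txt")
    if index_path.exists():
        entries = read_index(index_path)
        print(f"{len(entries)} entries; first: {entries[0] if entries else None}")
    else:
        print(f"{index_path} not found; run this from the repository root")
```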

🏋️ Training

cd STDNet
mkdir -p experiment/SRDNet_$scale$/MAE_best

python -m torch.distributed.launch --nproc_per_node 2 train.py --scale 4 --result_root 'experiment/SRDNet_$scale$' --result_root_MAE 'experiment/SRDNet_$scale$/MAE_best'
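The `$scale$` in the paths above appears to be a placeholder for the value passed to `--scale`. Under that assumption, the sketch below fills in the placeholder, creates the output directories, and assembles the same launch command as an argument list. The helper name `build_train_command` is hypothetical; the function only builds the command and does not start training.

```python
import os


def build_train_command(scale, nproc_per_node=2, root="experiment"):
    """Create the result folders for `scale` and return the launch command as a list."""
    # Assumption: "$scale$" in the README paths stands for the --scale value.
    result_root = f"{root}/SRDNet_{scale}"
    result_root_mae = f"{result_root}/MAE_best"
    os.makedirs(result_root_mae, exist_ok=True)  # same effect as `mkdir -p`
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "train.py",
        "--scale", str(scale),
        "--result_root", result_root,
        "--result_root_MAE", result_root_mae,
    ]

# Example (from inside the STDNet folder): cmd = build_train_command(4)
```

The returned list can be passed to `subprocess.run` from inside the `STDNet` folder to launch training with the same arguments as the command above.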

⚡ Testing

### TarTanAir dataset
python test_TarTanAir.py --scale 4
### DyDToF dataset
python test_DyDToF.py --scale 4
### DynamicReplica dataset
python test_DynamicReplica.py --scale 4
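To evaluate all three benchmarks in one go, the test commands above can be wrapped in a small loop. This is a convenience sketch, not part of the repository: `eval_commands` and `run_all` are hypothetical names, and only the script names and the `--scale` flag come from the commands above.

```python
import subprocess
import sys

# Script names as listed in the testing commands.
EVAL_SCRIPTS = {
    "TarTanAir": "test_TarTanAir.py",
    "DyDToF": "test_DyDToF.py",
    "DynamicReplica": "test_DynamicReplica.py",
}


def eval_commands(scale=4):
    """Map each dataset name to the command that evaluates it at `scale`."""
    return {name: [sys.executable, script, "--scale", str(scale)]
            for name, script in EVAL_SCRIPTS.items()}


def run_all(scale=4):
    """Run the test scripts one after another, stopping at the first failure."""
    for name, cmd in eval_commands(scale).items():
        print(f"Evaluating {name}: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_all(4)` from the repository root would run the three scripts in the order listed.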

📊 Experiments

Quantitative comparisons between our STDNet and previous state-of-the-art methods on the TarTanAir dataset.

📝 Citation

If our method proves to be of any assistance, please consider citing:

@inproceedings{wang2026spatiotemporal,
  title={Spatiotemporal difference network for video depth super-resolution},
  author={Wang, Zhengxue and Wu, Yuan and Li, Xiang and Yan, Zhiqiang and Yang, Jian},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={12},
  pages={10403--10411},
  year={2026}
}