# Trans-SVSR: Transformer-based Stereo Video Super-Resolution

## About
This repository contains the implementation of Trans-SVSR, a CVPR-published stereo video super-resolution framework that reconstructs temporally consistent high-resolution frames from stereo video pairs.
## Key Features
- **CVPR-proven architecture:** Transformer-based stereo SR with multi-frame fusion and cross-view attention
- **Edge-ready deployment:** Export to ONNX and TensorRT (FP16/INT8) for Jetson and RTX devices
- **Optimized for performance:** Low-latency inference, GPU benchmarking, and memory profiling
- **Industrial focus:** Suitable for robotics, 3D perception, and real-time vision enhancement
## Research Background
This work extends state-of-the-art research in stereo image/video super-resolution, incorporating:
- Transformer-based spatio-temporal modeling
- Optical-flow-guided feature alignment
- Multi-stage refinement and perceptual quality enhancement
Originally developed as part of a CVPR publication, the project pairs the published research code with a practical edge-AI deployment pipeline.
## Edge Deployment Pipeline
The repository now includes a full edge pipeline:

- `export_onnx.py`: exports the trained PyTorch model to ONNX
- `benchmark_ort.py`: benchmarks ONNX GPU inference (latency, FPS, VRAM)
See the Edge Deployment section below for detailed usage.
Keywords: Stereo Vision · Super-Resolution · Transformers · Edge AI · ONNX · TensorRT
## Table of Contents
- 1. Environment
- 2. Data Preparation
- 3. Training
- 4. Testing / Evaluation
- 5. Edge Deployment (ONNX / TensorRT)
- 6. Citation

## 1. Environment
```bash
# conda (recommended)
conda create -n transsvsr python=3.10 -y
conda activate transsvsr
pip install -r requirements.txt

# (Optional) TensorRT / ONNX Runtime for edge export is described in Section 5
```
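Before training, it can help to confirm that the environment actually sees the GPU. A quick check, not part of the repository:

```python
# Quick sanity check (not part of the repo): confirm PyTorch sees the GPU.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```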
## 2. Data Preparation
- Download the training videos: http://shorturl.at/mpwGX
- Put the raw videos in:

```
data/raw_train/
├── vid_0001.mp4
├── vid_0002.mp4
├── vid_0003.mp4
└── ...
```
- Create the training patches (×4):

```bash
python3 create_train_dataset.py
```
After this, patches are generated at:

```
data/train/patches_x4/
├── sample_000001/
│   ├── l_000.png l_001.png l_002.png l_003.png l_004.png
│   └── r_000.png r_001.png r_002.png r_003.png r_004.png
├── sample_000002/
└── ...
```
Each patch folder contains 5 left and 5 right patches (a temporal clip). Adjust the clip length / stride in `create_train_dataset.py` if needed.
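For reference, here is a minimal sketch of how one such patch folder could be loaded into left/right clip tensors. The file layout follows the listing above; the repository's own dataset class may crop, normalize, or augment differently:

```python
# Minimal sketch: load one training sample into (T, C, H, W) float tensors.
# Uses the l_*.png / r_*.png layout shown above; the repository's own
# dataset class may crop, normalize, or augment differently.
from pathlib import Path

import numpy as np
import torch
from PIL import Image

def load_clip(sample_dir: str, prefix: str, n_frames: int = 5) -> torch.Tensor:
    frames = []
    for t in range(n_frames):
        img = Image.open(Path(sample_dir) / f"{prefix}_{t:03d}.png").convert("RGB")
        frames.append(torch.from_numpy(np.array(img)).permute(2, 0, 1))  # (C, H, W)
    return torch.stack(frames).float() / 255.0  # (T, C, H, W) in [0, 1]

left = load_clip("data/train/patches_x4/sample_000001", "l")
right = load_clip("data/train/patches_x4/sample_000001", "r")
print(left.shape, right.shape)  # two tensors of shape (5, 3, H, W)
```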
## 3. Training

```bash
python3 train.py \
  --scale_factor 4 --device cuda:0 \
  --batch_size 7 --lr 2e-3 --gamma 0.5 \
  --start_epoch 0 --n_epochs 30 --n_steps 30 \
  --trainset_dir ./data/train/ \
  --model_name TransSVSR \
  --load_pretrain False --model_path log/TransSVSR.pth.tar
```
## 4. Testing / Evaluation
First, create the test set.

Put the downloaded test videos in:

```
data/raw_test/
```

For the SVSR-Set dataset, run:

```bash
python3 create_test_dataset_SVSRset.py
```

For the NAMA3D and LFO3D datasets, run:

```bash
python3 create_test_dataset_nama_lfo.py
```

Change the paths according to the NAMA3D or LFO3D dataset. NAMA3D [1] and LFO3D [2] need to be downloaded from their references and placed in the data/raw_test/ directory first.
```bash
# Single stereo sequence
python3 test.py \
  --model_name TransSVSR_4xSR \
  --testset_dir ./data/test/
```
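To compare super-resolved frames against ground truth yourself, a simple PSNR check looks like the sketch below. Both paths are hypothetical; point them at `test.py`'s actual outputs:

```python
# Illustrative PSNR check between a super-resolved frame and its ground truth.
# Both paths are hypothetical; point them at test.py's actual outputs.
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

sr = np.asarray(Image.open("results/l_000_sr.png"))    # hypothetical SR output
gt = np.asarray(Image.open("data/test/gt/l_000.png"))  # hypothetical ground truth
print(f"PSNR: {psnr(sr, gt):.2f} dB")
```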
## 5. Edge Deployment (ONNX / TensorRT)
This repository supports exporting the Trans-SVSR model for efficient deployment on edge devices (e.g., Jetson, RTX laptops, or other embedded GPUs).
Folder layout (added):

```
root/
├── export_onnx.py     # PyTorch → ONNX
├── benchmark_ort.py   # ONNX Runtime GPU benchmark (latency, FPS, VRAM)
└── build_trt.py       # ONNX → TensorRT (FP16/INT8)
```
### 5.1 Export to ONNX

The model can be exported from its PyTorch checkpoint (`.pth.tar`) to ONNX format:
```bash
python3 export_onnx.py \
  --ckpt log/TransSVSR_4xSR.pth.tar \
  --onnx outputs/transsvsr_x4/model_static.onnx \
  --height 540 --width 960 --frames 5 --channels 3 \
  --scale 4 --opset 14 --device cuda
```
Output: `outputs/transsvsr_x4/model_static.onnx`
This file can be used for inference with ONNX Runtime or for conversion to TensorRT.
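As a quick smoke test, the exported model can be loaded and timed with ONNX Runtime. This is a sketch, not the repo's `benchmark_ort.py`: it assumes the default static shape noted in the Notes below, and it iterates over all graph inputs, so it also covers the case where the export takes separate left/right tensors:

```python
# Smoke test for the exported ONNX model (illustrative, not benchmark_ort.py).
# Iterates over all graph inputs, so it works whether the export takes one
# combined tensor or separate left/right tensors; dynamic dims fall back to 1.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "outputs/transsvsr_x4/model_static.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
feed = {
    inp.name: np.random.rand(
        *[d if isinstance(d, int) else 1 for d in inp.shape]
    ).astype(np.float32)
    for inp in sess.get_inputs()
}

sess.run(None, feed)  # warm-up
n, t0 = 10, time.perf_counter()
for _ in range(n):
    out = sess.run(None, feed)
dt = (time.perf_counter() - t0) / n
print(f"avg latency: {dt * 1000:.1f} ms ({1.0 / dt:.1f} FPS), output: {out[0].shape}")
```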
### 5.2 Build TensorRT Engine (FP16 or INT8)
Once the ONNX file is created, build a TensorRT engine for deployment:
```bash
# FP16 engine
python3 build_trt.py \
  --onnx outputs/transsvsr_x4/model_static.onnx \
  --engine outputs/transsvsr_x4/model_fp16.engine \
  --fp16 \
  --min_T 5 --opt_T 5 --max_T 5 \
  --min_H 540 --opt_H 540 --max_H 540 \
  --min_W 960 --opt_W 960 --max_W 960
```
Output: `outputs/transsvsr_x4/model_fp16.engine`
To build an INT8 engine, create a folder `calib_samples/` containing `.npy` batches:

```
calib_samples/
  left_000.npy
  right_000.npy
  left_001.npy
  right_001.npy
  ...
```
Then run:
```bash
python3 build_trt.py \
  --onnx outputs/transsvsr_x4/model_static.onnx \
  --engine outputs/transsvsr_x4/model_int8.engine \
  --int8 --calib_dir calib_samples/ \
  --opt_T 5 --opt_H 540 --opt_W 960
```
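The calibration batches can be generated from a few test clips, for example as below. This is a sketch: the `data/test` sample layout and the `l_*.png` / `r_*.png` naming are assumptions carried over from the training patches, and the array layout follows the default model input shape noted in the Notes:

```python
# Sketch: dump calibration batches as .npy files for INT8 engine building.
# The data/test sample layout and l_*.png / r_*.png naming are assumptions
# carried over from the training patches; arrays follow the default model
# input layout (1, C=3, T=5, H=540, W=960).
from pathlib import Path

import numpy as np
from PIL import Image

def clip_to_array(frame_paths, size=(960, 540)):  # size is (W, H) for PIL
    frames = [np.asarray(Image.open(p).convert("RGB").resize(size), np.float32) / 255.0
              for p in frame_paths]
    clip = np.stack(frames)                   # (T, H, W, C)
    return clip.transpose(3, 0, 1, 2)[None]   # (1, C, T, H, W)

out = Path("calib_samples")
out.mkdir(exist_ok=True)
for i, sample in enumerate(sorted(Path("data/test").glob("sample_*"))[:8]):
    np.save(out / f"left_{i:03d}.npy", clip_to_array(sorted(sample.glob("l_*.png"))))
    np.save(out / f"right_{i:03d}.npy", clip_to_array(sorted(sample.glob("r_*.png"))))
```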
### 5.3 Jetson Notes (power & clocks)

```bash
# Set performance mode
sudo nvpmodel -m 0
sudo jetson_clocks

# Power logging (1 Hz)
tegrastats --interval 1000 > tegrastats.log
```
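To summarize power draw from the log afterwards, a small parser like the sketch below can help. The exact tegrastats field names vary across Jetson models (e.g. VDD_IN on Orin, POM_5V_IN on Nano/Xavier NX), so adjust the regex to match your board's output:

```python
# Summarize module power draw from tegrastats.log.
# Field names differ across Jetson models (VDD_IN on Orin, POM_5V_IN on
# Nano/Xavier NX); adjust the regex to match your board's output.
import re

PAT = re.compile(r"(?:VDD_IN|POM_5V_IN)\s+(\d+)(?:mW)?/(\d+)")

readings = []
with open("tegrastats.log") as f:
    for line in f:
        m = PAT.search(line)
        if m:
            readings.append(int(m.group(1)))  # instantaneous draw in mW

if readings:
    print(f"samples: {len(readings)}, "
          f"avg: {sum(readings) / len(readings):.0f} mW, peak: {max(readings)} mW")
```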
### Notes

- The exported ONNX is compatible with TensorRT 8.x–9.x and ONNX Runtime GPU.
- Default input shape: `(B=1, C=3, T=5, H=540, W=960)`.
- Dynamic axes can be enabled with `--dynamic`, but static shapes are faster and more stable on edge GPUs.
- FP16 mode offers the best trade-off between accuracy and speed on Jetson/RTX devices.
## 6. Citation

If you use this repository, please cite our paper:
```bibtex
@inproceedings{imani2022new,
  title={A new dataset and transformer for stereoscopic video super-resolution},
  author={Imani, Hassan and Islam, Md Baharul and Wong, Lai-Kuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={706--715},
  year={2022}
}
```