IGEV++ (TPAMI 2025)
June 3, 2025
IGEV++, a significantly extended version of IGEV, is now available: Paper, Code
IGEV-Stereo (CVPR 2023)
This repository contains the source code for our paper:
Iterative Geometry Encoding Volume for Stereo Matching
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang
📢 News
2024-12-30: We added bfloat16 training to prevent potential NaN issues during training.
Demos
Pretrained models can be downloaded from Google Drive.
We assume the downloaded pretrained weights are located under the pretrained_models directory.
You can demo a trained model on pairs of images. To predict disparity for Middlebury image pairs, run
python demo_imgs.py \
--restore_ckpt pretrained_models/sceneflow/sceneflow.pth \
-l=path/to/your/left_imgs \
-r=path/to/your/right_imgs
Alternatively, to demo a trained model on image pairs from a video, run
python demo_video.py \
--restore_ckpt pretrained_models/sceneflow/sceneflow.pth \
-l=path/to/your/left_imgs \
-r=path/to/your/right_imgs
To save the disparity values as .npy files, run any of the demos with the --save_numpy flag.
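The saved arrays can be inspected with NumPy. The sketch below assumes each .npy file holds a single array of per-pixel disparities in pixels; the helper name `summarize_disparity` is hypothetical, and where the demo writes its files depends on the script's own arguments.

```python
import numpy as np

def summarize_disparity(path):
    """Load a saved disparity map and return basic statistics.

    Assumption: the file holds one array of per-pixel disparities (pixels)
    with at least one finite value.
    """
    disp = np.load(path)
    finite = disp[np.isfinite(disp)]  # ignore NaN/inf pixels, if any
    return {
        "shape": disp.shape,
        "min": float(finite.min()),
        "max": float(finite.max()),
        "mean": float(finite.mean()),
    }
```

Calling `summarize_disparity("path/to/your/output.npy")` returns a small dictionary, which gives a quick check that the predicted disparity range is plausible for the scene.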
Comparison with RAFT-Stereo
| Method | KITTI 2012 (3-noc) | KITTI 2015 (D1-all) | Memory (G) | Runtime (s) |
|---|---|---|---|---|
| RAFT-Stereo | 1.30 % | 1.82 % | 1.02 | 0.38 |
| IGEV-Stereo | 1.12 % | 1.59 % | 0.66 | 0.18 |
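To read the table as relative changes, the figures can be converted into percentage reductions. This is plain arithmetic on the numbers in the table; the dictionary keys are labels chosen here for illustration.

```python
# Figures copied from the comparison table.
raft = {"kitti12_3noc": 1.30, "kitti15_d1all": 1.82, "memory_g": 1.02, "runtime_s": 0.38}
igev = {"kitti12_3noc": 1.12, "kitti15_d1all": 1.59, "memory_g": 0.66, "runtime_s": 0.18}

def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

# One rounded percentage per table column.
reductions = {k: round(relative_reduction(raft[k], igev[k]), 1) for k in raft}
for name, pct in reductions.items():
    print(f"{name}: {pct}% lower than RAFT-Stereo")
```

This works out to about 13.8% and 12.6% lower error on KITTI 2012 and 2015, 35.3% less memory, and 52.6% shorter runtime.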
Environment
- NVIDIA RTX 3090
- Python 3.8
Create a virtual environment and activate it.
conda create -n IGEV python=3.8
conda activate IGEV
Dependencies
bash env.sh
Alternatively, you can install a newer version of PyTorch that supports bfloat16 training.
bash env_bfloat16.sh
Required Data
To evaluate/train IGEV-Stereo, you will need to download the required datasets.
- Scene Flow (Includes FlyingThings3D, Driving & Monkaa)
- KITTI
- Middlebury
- ETH3D
By default core/stereo_datasets.py will search for the datasets in these locations.
├── /data
    ├── sceneflow
        ├── frames_finalpass
            ├── TRAIN
                ├── A
                ├── ...
                ├── 15mm_focallength
                ├── ...
                ├── funnyworld_augmented0_x2
                ├── ...
            ├── TEST
        ├── disparity
    ├── KITTI
        ├── KITTI_2012
            ├── training
            ├── testing
            ├── vkitti
        ├── KITTI_2015
            ├── training
            ├── testing
            ├── vkitti
    ├── Middlebury
        ├── trainingH
        ├── trainingH_GT
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
    ├── DTU_data
        ├── dtu_train
        ├── dtu_test
You should replace the default path with your own.
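Before training or evaluating, it can help to confirm that the expected folders are in place. The sketch below is a hypothetical helper, not part of the repository; the nesting of the sub-directories is an assumption read off the listing above, and `/data` is only the default root mentioned there.

```python
from pathlib import Path

# Sub-directories relative to the dataset root.
# Assumption: nesting follows the directory listing in this README.
EXPECTED_DIRS = [
    "sceneflow/frames_finalpass",
    "sceneflow/disparity",
    "KITTI/KITTI_2012",
    "KITTI/KITTI_2015",
    "Middlebury/trainingH",
    "Middlebury/trainingH_GT",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "DTU_data/dtu_train",
    "DTU_data/dtu_test",
]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

For example, `missing_dirs("/data")` returns an empty list when every folder is present, and otherwise names the ones still to be downloaded or relocated.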
Evaluation
To evaluate on Scene Flow, Middlebury, or ETH3D, run
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset sceneflow
or
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset middlebury_H
or
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset eth3d
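The three evaluations can also be driven from one small script. The sketch below only reuses the flags and dataset names from the commands above; the helper names and the `dry_run` switch are hypothetical conveniences, and it assumes the script is launched from the repository root.

```python
import subprocess
import sys

CKPT = "./pretrained_models/sceneflow/sceneflow.pth"
DATASETS = ["sceneflow", "middlebury_H", "eth3d"]  # names from the commands above

def build_eval_command(dataset, ckpt=CKPT):
    """Build the argument list for one evaluate_stereo.py run."""
    return [sys.executable, "evaluate_stereo.py",
            "--restore_ckpt", ckpt, "--dataset", dataset]

def run_all(datasets=DATASETS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name in datasets:
        cmd = build_eval_command(name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
```

Calling `run_all()` prints the commands for review; `run_all(dry_run=False)` executes them one after another.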
Training
To train on Scene Flow, run
python train_stereo.py --logdir ./checkpoints/sceneflow
To train on KITTI, run
python train_stereo.py --logdir ./checkpoints/kitti --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --train_datasets kitti
Bfloat16 Training
NaN values during training: If you encounter NaN values during training, the likely cause is overflow in float16, which happens when large gradients or activations exceed the range float16 can represent. To fix this:
- Try switching to bfloat16 by using --precision_dtype bfloat16.
- Alternatively, use float32 precision by setting --precision_dtype float32.
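The range difference behind this advice can be shown with the standard library alone. In the sketch below, float16 is checked with the `struct` module's half-precision format, and bfloat16 is emulated by keeping the top 16 bits of the float32 encoding. Both helpers are illustrative assumptions, not code from this repository, and real hardware rounds to nearest rather than truncating.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def fits_float16(x):
    """Return True if x can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", x)  # "e" is the IEEE 754 binary16 format
        return True
    except OverflowError:
        return False

def to_bfloat16(x):
    """Emulate bfloat16 by truncating float32 to its top 16 bits.

    bfloat16 keeps float32's 8 exponent bits, so the range is preserved
    and only mantissa precision is lost.
    """
    hi = struct.pack(">f", x)[:2]
    return struct.unpack(">f", hi + b"\x00\x00")[0]

value = 100000.0  # e.g. a large activation
print("fits in float16:", fits_float16(value))
print("as bfloat16:", to_bfloat16(value))
```

A value of 100000 does not fit in float16 but stays finite in the emulated bfloat16, at the cost of precision.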
Training with bfloat16
- Before you start training, make sure you have hardware that supports bfloat16 and the right environment set up for mixed precision training.
bash env_bfloat16.sh
- Then you can train the model with bfloat16 precision:
python train_stereo.py --mixed_precision --precision_dtype bfloat16
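Whether the current GPU supports bfloat16 can be checked from Python before launching a long run. This is a minimal sketch assuming a PyTorch version that provides `torch.cuda.is_bf16_supported()`; the wrapper function is hypothetical and returns False when PyTorch or CUDA is unavailable.

```python
def bfloat16_supported():
    """Return True if PyTorch reports bfloat16 support on the current CUDA device."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return False  # no CUDA device to query
    return bool(torch.cuda.is_bf16_supported())

print("bfloat16 supported:", bfloat16_supported())
```

If this prints False, the float32 option described above is the safer choice.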
Submission
For submission to the KITTI benchmark, run
python save_disp.py
MVS training and evaluation
To train on DTU, run
python train_mvs.py
To evaluate on DTU, run
python evaluate_mvs.py
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{xu2023iterative,
title={Iterative Geometry Encoding Volume for Stereo Matching},
author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21919--21928},
year={2023}
}
@article{xu2025igev++,
title={{IGEV++}: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching},
author={Xu, Gangwei and Wang, Xianqi and Zhang, Zhaoxing and Cheng, Junda and Liao, Chunyuan and Yang, Xin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2025},
publisher={IEEE}
}
Acknowledgements
This project is based on RAFT-Stereo and CoEx. We thank the original authors for their excellent work.