install DEVO
August 11, 2024
Deep Event Visual Odometry
Simon Klenk1,2* Marvin Motzet1,2* Lukas Koestler1,2 Daniel Cremers1,2
*equal contribution
1Technical University of Munich (TUM) 2Munich Center for Machine Learning (MCML)
International Conference on 3D Vision (3DV) 2024, Davos, CH
Paper (arXiv) | Video | Poster | BibTeX
Abstract
Event cameras offer the exciting possibility of tracking the camera's pose during high-speed motion and in adverse lighting conditions. Despite this promise, existing event-based monocular visual odometry (VO) approaches demonstrate limited performance on recent benchmarks. To address this limitation, some methods resort to additional sensors such as IMUs, stereo event cameras, or frame-based cameras. Nonetheless, these additional sensors limit the application of event cameras in real-world devices since they increase cost and complicate system requirements. Moreover, relying on a frame-based camera makes the system susceptible to motion blur and HDR. To remove the dependency on additional sensors and to push the limits of using only a single event camera, we present Deep Event VO (DEVO), the first monocular event-only system with strong performance on a large number of real-world benchmarks. DEVO sparsely tracks selected event patches over time. A key component of DEVO is a novel deep patch selection mechanism tailored to event data. We significantly decrease the pose tracking error on seven real-world benchmarks by up to 97% compared to event-only methods and often surpass or are close to stereo or inertial methods.
Overview
During training, DEVO takes event voxel grids, inverse depths, and camera poses of a sequence as input. DEVO estimates the poses and depths of the sequence. Our novel patch selection network predicts a score map to highlight optimal 2D coordinates for optical flow and pose estimation. A recurrent update operator iteratively refines the sparse patch-based optical flow between event grids by predicting flow revisions, and updates poses and depths through a differentiable bundle adjustment (DBA) layer, with a predicted weight for each revision. Ground truth optical flow for supervision is computed using poses and depth maps. At inference, DEVO samples patch coordinates from a multinomial distribution based on the pooled score map.
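The inference-time sampling step can be illustrated with a minimal, dependency-free sketch. It is not the DEVO implementation: the average pooling, the window size `k`, sampling with replacement, and the function names `avg_pool` and `sample_patch_coords` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def avg_pool(score_map: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Average-pool a 2D score map with a k x k window and stride k (assumed pooling)."""
    rows, cols = len(score_map) // k, len(score_map[0]) // k
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [score_map[r * k + i][c * k + j] for i in range(k) for j in range(k)]
            row.append(sum(window) / (k * k))
        pooled.append(row)
    return pooled


def sample_patch_coords(
    pooled: Sequence[Sequence[float]], num_patches: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Draw (row, col) cells with probability proportional to their pooled score.

    Sampling is done with replacement here; that is an assumption of this sketch.
    """
    cells = [(r, c) for r in range(len(pooled)) for c in range(len(pooled[0]))]
    weights = [pooled[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=num_patches)
```

Cells with a higher pooled score are drawn more often, while cells with zero score are never selected. Because the draw is random rather than a hard top-k, patch locations can vary between runs.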
Setup
The code was tested on Ubuntu 22.04 and CUDA Toolkit 11.x. We use Anaconda to manage our Python environment.
First, clone the repo
git clone https://github.com/tum-vision/DEVO.git --recursive
cd DEVO
Then, create and activate the Anaconda environment
conda env create -f environment.yml
conda activate devo
Next, install the DEVO package
# download and unzip Eigen source code
wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
unzip eigen-3.4.0.zip -d thirdparty
# install DEVO
pip install .
Only for Training
The following steps are only needed if you intend to (re)train DEVO. Note that the training data take up about 1.1TB (rgb: 300GB, evs: 370GB).
Otherwise, skip this section and continue with the evaluation-only steps.
First, download all RGB images and depth maps of TartanAir from the left camera (~500GB) to <TARTANPATH>
python thirdparty/tartanair_tools/download_training.py --output-dir <TARTANPATH> --rgb --depth --only-left
Next, generate event voxel grids using vid2e.
python scripts/convert_tartan.py --dirsfile <path to .txt file>
dirsfile expects a .txt file containing line-separated paths to dirs with .png images (to generate events for these images).
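If you do not have such a .txt file yet, one way to produce it is sketched below. This helper is not part of the repository: the function names `find_png_dirs` and `write_dirsfile` are hypothetical, and the sketch assumes that every directory under the given root that directly contains .png files should be converted.

```python
from pathlib import Path
from typing import List


def find_png_dirs(root: Path) -> List[Path]:
    """Return every directory under root that directly contains at least one .png file."""
    return sorted({p.parent for p in root.rglob("*.png")})


def write_dirsfile(root: Path, out_file: Path) -> int:
    """Write one directory path per line to out_file and return the number of lines."""
    dirs = find_png_dirs(root)
    out_file.write_text("".join(f"{d}\n" for d in dirs))
    return len(dirs)
```

For example, `write_dirsfile(Path("<TARTANPATH>"), Path("dirs.txt"))` would write the list to `dirs.txt`, which can then be passed to `--dirsfile`. Filter the list first if only a subset of the sequences should be converted.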
Only for Evaluation
We provide a pretrained model for our simulated event data.
# download model (~40MB)
./download_model.sh
Data Preprocessing
We evaluate DEVO on seven real-world event-based datasets (FPV, VECtor, HKU, EDS, RPG, MVSEC, TUM-VIE). We provide scripts for data preprocessing (undistortion, ...).
Check scripts/pp_DATASETNAME.py for the way to preprocess the original datasets. This will create the necessary files for you, e.g. rectify_map.h5, calib_undist.json and t_offset_us.txt.
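To confirm that preprocessing finished for a sequence, a small check like the following may help. It is only a sketch: the helper name `missing_outputs` is hypothetical, the file list is taken from the examples above and may not be complete for every dataset, and the files are searched for recursively because their exact location inside a sequence directory is not assumed.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the examples in the text; other datasets may differ.
EXPECTED_FILES = ("rectify_map.h5", "calib_undist.json", "t_offset_us.txt")


def missing_outputs(seq_dir: Path, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not found anywhere under seq_dir."""
    return [name for name in expected if not any(seq_dir.rglob(name))]
```

An empty list means all expected files were found under the given directory.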
Training
Make sure you have completed the steps under "Only for Training". Your dataset directory structure should look as follows:
├── <TARTANPATH>
    ├── abandonedfactory
    ├── abandonedfactory_night
    ├── ...
    ├── westerndesert
To train DEVO with the default configuration, run
python train.py -c="config/DEVO_base.conf" --name=<your name>
The log files will be written to runs/<your name>. Please check train.py for more options.
Evaluation
Make sure you have completed the following steps first: downloading the pretrained model, downloading the data, and preprocessing the data.
python evals/eval_evs/eval_DATASETNAME_evs.py --datapath=<DATASETPATH> --weights="DEVO.pth" --stride=1 --trials=1 --expname=<your name>
The qualitative and quantitative results will be written to results/DATASETNAME/<your name>. Check eval_rpg_evs.py for more options.
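When evaluating several datasets in a row, the command above can be assembled programmatically. The sketch below only builds the argument list from the pattern shown above. The helper name `build_eval_command` is hypothetical, and the dataset token in each script name (only `rpg` is confirmed by `eval_rpg_evs.py`) should be checked against the files in evals/eval_evs.

```python
import sys
from typing import List


def build_eval_command(
    dataset: str,
    datapath: str,
    expname: str,
    weights: str = "DEVO.pth",
    stride: int = 1,
    trials: int = 1,
) -> List[str]:
    """Assemble the evaluation command for one dataset as an argument list."""
    # Script path follows the eval_DATASETNAME_evs.py pattern from the text.
    script = f"evals/eval_evs/eval_{dataset}_evs.py"
    return [
        sys.executable,
        script,
        f"--datapath={datapath}",
        f"--weights={weights}",
        f"--stride={stride}",
        f"--trials={trials}",
        f"--expname={expname}",
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the evaluation and raise an error if the script exits with a non-zero status.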
News
- Code and model are released.
- Code for simulation is released.
Citation
If you find our work useful, please cite our paper:
@inproceedings{klenk2023devo,
title = {Deep Event Visual Odometry},
author = {Klenk, Simon and Motzet, Marvin and Koestler, Lukas and Cremers, Daniel},
booktitle = {International Conference on 3D Vision, 3DV 2024, Davos, Switzerland,
March 18-21, 2024},
pages = {739--749},
publisher = {{IEEE}},
year = {2024},
}
Acknowledgments
We thank the authors of the following repositories for publicly releasing their work:
- DPVO
- TartanAir
- vid2e
- E2Calib
- rpg_trajectory_evaluation
- Event-based Vision for VO/VIO/SLAM in Robotics
This work was supported by the ERC Advanced Grant SIMULACRON.