Dynamic View Synthesis from Dynamic Monocular Video

April 22, 2022 ยท View on GitHub

arXiv

Project Website | Video | Paper

Dynamic View Synthesis from Dynamic Monocular Video
Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
in ICCV 2021

Setup

The code is test with

  • Linux (tested on CentOS Linux release 7.4.1708)
  • Anaconda 3
  • Python 3.7.11
  • CUDA 10.1
  • 1 V100 GPU

To get started, please create the conda environment dnerf by running

conda create --name dnerf python=3.7
conda activate dnerf
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install imageio scikit-image configargparse timm lpips

and install COLMAP manually. Then download MiDaS and RAFT weights

ROOT_PATH=/path/to/the/DynamicNeRF/folder
cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/weights.zip
unzip weights.zip
rm weights.zip

Dynamic Scene Dataset

The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please download the pre-processed data by running:

cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip

Training

You can train a model from scratch by running:

cd $ROOT_PATH/
python run_nerf.py --config configs/config_Balloon2.txt

Every 100k iterations, you should get videos like the following examples

The novel view-time synthesis results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/novelviewtime. novelviewtime

The reconstruction results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset. testset

The fix-view-change-time results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_view000. testset_view000

The fix-time-change-view results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_time000. testset_time000

Rendering from pre-trained models

We also provide pre-trained models. You can download them by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/logs.zip
unzip logs.zip
rm logs.zip

Then you can render the results directly by running:

python run_nerf.py --config configs/config_Balloon2.txt --render_only --ft_path $ROOT_PATH/logs/Balloon2_H270_DyNeRF_pretrain/300000.tar

Evaluating our method and others

Our goal is to make the evaluation as simple as possible for you. We have collected the fix-view-change-time results of the following methods:

NeRF
NeRF + t
Yoon et al.
Non-Rigid NeRF
NSFF
DynamicNeRF (ours)

Please download the results by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/results.zip
unzip results.zip
rm results.zip

Then you can calculate the PSNR/SSIM/LPIPS by running:

cd $ROOT_PATH/utils
python evaluation.py
PSNR / LPIPSJumpingSkatingTruckUmbrellaBalloon1Balloon2PlaygroundAverage
NeRF20.99 / 0.30523.67 / 0.31122.73 / 0.22921.29 / 0.44019.82 / 0.20524.37 / 0.09821.07 / 0.16521.99 / 0.250
NeRF + t18.04 / 0.45520.32 / 0.51218.33 / 0.38217.69 / 0.72818.54 / 0.27520.69 / 0.21614.68 / 0.42118.33 / 0.427
NR NeRF20.09 / 0.28723.95 / 0.22719.33 / 0.44619.63 / 0.42117.39 / 0.34822.41 / 0.21315.06 / 0.31719.69 / 0.323
NSFF24.65 / 0.15129.29 / 0.12925.96 / 0.16722.97 / 0.29521.96 / 0.21524.27 / 0.22221.22 / 0.21224.33 / 0.199
Ours24.68 / 0.09032.66 / 0.03528.56 / 0.08223.26 / 0.13722.36 / 0.10427.06 / 0.04924.15 / 0.08026.10 / 0.082

Please note:

  1. The numbers reported in the paper are calculated using TF code. The numbers here are calculated using this improved Pytorch version.
  2. In Yoon's results, the first frame and the last frame are missing. To compare with Yoon's results, we have to omit the first frame and the last frame. To do so, please uncomment line 72 and comment line 73 in evaluation.py.
  3. We obtain the results of NSFF and NR NeRF using the official implementation with default parameters.

Train a model on your sequence

  1. Set some paths
ROOT_PATH=/path/to/the/DynamicNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME
  1. Prepare training images and background masks from a video.
cd $ROOT_PATH/utils
python generate_data.py --videopath /path/to/the/video
  1. Use COLMAP to obtain camera poses.
colmap feature_extractor \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--ImageReader.mask_path $DATASET_PATH/background_mask \
--ImageReader.single_camera 1

colmap exhaustive_matcher \
--database_path $DATASET_PATH/database.db

mkdir $DATASET_PATH/sparse
colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 \
    --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 \
    --Mapper.extract_colors 0
  1. Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/utils
python generate_pose.py --dataset_path $DATASET_PATH
  1. Estimate monocular depth.
cd $ROOT_PATH/utils
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
  1. Predict optical flows.
cd $ROOT_PATH/utils
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
  1. Obtain motion mask (code adapted from NSFF).
cd $ROOT_PATH/utils
python generate_motion_mask.py --dataset_path $DATASET_PATH
  1. Train a model. Please change expname and datadir in configs/config.txt.
cd $ROOT_PATH/
python run_nerf.py --config configs/config.txt

Explanation of each parameter:

  • expname: experiment name
  • basedir: where to store ckpts and logs
  • datadir: input data directory
  • factor: downsample factor for the input images
  • N_rand: number of random rays per gradient step
  • N_samples: number of samples per ray
  • netwidth: channels per layer
  • use_viewdirs: whether enable view-dependency for StaticNeRF
  • use_viewdirsDyn: whether enable view-dependency for DynamicNeRF
  • raw_noise_std: std dev of noise added to regularize sigma_a output
  • no_ndc: do not use normalized device coordinates
  • lindisp: sampling linearly in disparity rather than depth
  • i_video: frequency of novel view-time synthesis video saving
  • i_testset: frequency of testset video saving
  • N_iters: number of training iterations
  • i_img: frequency of tensorboard image logging
  • DyNeRF_blending: whether use DynamicNeRF to predict blending weight
  • pretrain: whether pre-train StaticNeRF

License

This work is licensed under MIT License. See LICENSE for details.

If you find this code useful for your research, please consider citing the following paper:

@inproceedings{Gao-ICCV-DynNeRF,
    author    = {Gao, Chen and Saraf, Ayush and Kopf, Johannes and Huang, Jia-Bin},
    title     = {Dynamic View Synthesis from Dynamic Monocular Video},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
    year      = {2021}
}

Acknowledgments

Our training code is build upon NeRF, NeRF-pytorch, and NSFF. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.