December 19, 2022

SCARF: Capturing and Animation of Body and Clothing from Monocular Video

cropped image, subject segmentation, clothing segmentation, SMPL-X estimation

This is the script to process video data for SCARF training.

Getting Started

Environment

SCARF needs an input image, a subject mask, a clothing mask, and an initial SMPL-X estimate for training. Specifically, we use

FasterRCNN to detect the subject and crop image
RobustVideoMatting to remove background
cloth-segmentation to segment clothing
PIXIE to estimate SMPL-X parameters

When using the processing script, you must agree to the license terms of these projects and cite them properly in your work.

Clone submodule repositories:

git submodule update --init --recursive

Download the data required by these repositories:

bash fetch_asset_data.sh

If the script fails, check each project's website and download the models manually.

Process Video Data

Put your data list into ./lists/subject_list.txt. Each entry can be a video path or an image folder.
Then run

python process_video.py --crop --ignore_existing
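The data list mentioned above can also be generated rather than typed by hand. The following is a minimal sketch, not part of the SCARF scripts: the helper names `collect_subjects` and `write_subject_list`, the set of video extensions, and the one-entry-per-line file format are all assumptions, so check them against what process_video.py actually expects before relying on it.

```python
from pathlib import Path

# Assumption: videos are recognised by these extensions; adjust for your data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def collect_subjects(data_root):
    """Return video files and image folders directly under data_root, sorted by path."""
    root = Path(data_root)
    entries = []
    for path in sorted(root.iterdir()):
        # Keep every sub-folder (treated as an image folder) and every video file.
        if path.is_dir() or path.suffix.lower() in VIDEO_EXTENSIONS:
            entries.append(str(path))
    return entries


def write_subject_list(entries, list_path="./lists/subject_list.txt"):
    """Write one entry per line (assumed format) and return how many were written."""
    out = Path(list_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(entry + "\n" for entry in entries))
    return len(entries)


# Example usage (hypothetical data folder):
#   write_subject_list(collect_subjects("./my_videos"))
```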

Processing time depends on the number of frames and the size of the video. For an mpiis-scarf video (400 frames at a resolution of 1028x1920), processing takes around 12 minutes.
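For a rough idea of what the reference timing implies for other videos, the sketch below scales the quoted figure (400 frames in about 12 minutes) by frame count. It assumes the time grows linearly with the number of frames and that the resolution and hardware are comparable to the reference run; neither assumption is stated by the authors, so treat the result as a ballpark only. The function name `estimate_minutes` is hypothetical.

```python
# Reference point quoted in the text: about 12 minutes for 400 frames.
REF_FRAMES = 400
REF_MINUTES = 12.0


def estimate_minutes(num_frames):
    """Scale the reference timing linearly with frame count (an assumption)."""
    return REF_MINUTES * num_frames / REF_FRAMES


# Implied per-frame cost of the reference run, in seconds.
seconds_per_frame = REF_MINUTES * 60 / REF_FRAMES
```

Under these assumptions the reference run works out to roughly 1.8 seconds per frame.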

Video Data

The script has been verified to work on the following datasets:

a. mpiis-scarf (recorded video for this paper)
b. People Snapshot Dataset (https://graphics.tu-bs.de/people-snapshot)
c. SelfRecon dataset (https://jby1993.github.io/SelfRecon/)
d. iPER dataset (https://svip-lab.github.io/dataset/iPER_dataset.html)

To get the best results for your own video, we recommend capturing it with settings similar to those of the datasets above.

This means keeping the camera static, recording the subject from many views, and using uniform lighting. It is also better to use fewer than 1000 frames for training. For more information, please refer to the limitations section of SCARF.
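If a recording has more frames than the suggested limit, one option is to keep an evenly spaced subset. The sketch below is an illustration only and is not part of the SCARF processing script; the function name `subsample_indices` and the default cap of 999 (chosen to stay under 1000) are assumptions.

```python
def subsample_indices(num_frames, max_frames=999):
    """Return evenly spaced frame indices, keeping at most max_frames of them."""
    if num_frames <= max_frames:
        # Short videos are kept in full.
        return list(range(num_frames))
    # A step larger than 1 guarantees that the truncated indices are distinct.
    step = num_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


# Example usage: pick which frames of a 2500-frame video to keep.
kept = subsample_indices(2500)
```

Even spacing keeps the subject's full motion range covered, which matters more for training than keeping consecutive frames.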