ASRF : Video Action Segmentation Model

February 22, 2022



Introduction

ASRF improves on the video action segmentation model MS-TCN and was published at WACV 2021. We reproduced the official PyTorch implementation in PaddleVideo and obtained comparable results.


MS-TCN Overview

Data

ASRF can be trained on the 50salads, breakfast, or gtea dataset. For dataset download and preparation, refer to the Video Action Segmentation dataset doc.

Unlike MS-TCN, the ASRF model requires an additional data construction step. Run the following script:

python data/50salads/prepare_asrf_data.py --dataset_dir data/
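The source does not spell out what the extra data is. ASRF adds an action boundary regression branch on top of frame-wise classification, so a reasonable assumption is that the script derives per-frame boundary labels from the existing frame-wise annotations. The sketch below shows only that idea. The function name `boundary_labels` and the convention of marking the first frame are illustrative assumptions, not the contents of prepare_asrf_data.py.

```python
def boundary_labels(frame_labels):
    """Return 1 for the first frame and for every frame whose label
    differs from the previous frame, 0 elsewhere (assumed convention)."""
    boundary = [0] * len(frame_labels)
    for i, label in enumerate(frame_labels):
        if i == 0 or label != frame_labels[i - 1]:
            boundary[i] = 1
    return boundary


# Hypothetical frame-wise annotation for a short clip.
example = ["background", "background", "take", "take", "pour", "pour", "pour"]
print(boundary_labels(example))
```

Each 1 in the result marks the start of a new action segment, which is the kind of target a boundary regression branch could be trained on.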

Train

After preparing the dataset, run the following script.

# gtea dataset
export CUDA_VISIBLE_DEVICES=3
python3.7 main.py  --validate -c configs/segmentation/asrf/asrf_gtea.yaml
  • Start training with the command above. No pretrained model is required. Video action segmentation models are usually fully convolutional networks. Because videos differ in length, DATASET.batch_size is usually set to 1, so samples are not batched. Currently only single-sample training is supported.
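A toy example can show why a fully convolutional model handles videos of any length while batching them is awkward. This is a minimal sketch in plain Python, not PaddleVideo code; `conv1d_same` is a hypothetical helper that applies one zero-padded 1D convolution over the time axis.

```python
def conv1d_same(sequence, kernel):
    """Slide `kernel` over `sequence` with zero padding so that the
    output has exactly as many time steps as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(sequence) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(sequence))
    ]


kernel = [1, 1, 1]
short_video = [1, 2, 3, 4]        # 4 frames of a 1-D feature
long_video = [1, 0, 2, 0, 3, 0]   # 6 frames of a 1-D feature

# The same kernel works for both lengths: one output per input frame.
print(len(conv1d_same(short_video, kernel)), len(conv1d_same(long_video, kernel)))
```

Because the two outputs have different lengths, they cannot be stacked into one rectangular batch without padding and masking, which is why a batch size of 1 is the simplest choice.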

Test

Test ASRF on the dataset with the following script:

python main.py  --test -c configs/segmentation/asrf/asrf_gtea.yaml --weights=./output/ASRF/ASRF_split_1.pdparams
  • Acc, Edit, and F1 scores are computed following the test script eval.py provided by the authors of MS-TCN.
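The three metrics can be sketched as follows. This is a simplified, per-video illustration in plain Python, not the eval.py script itself: it ignores any special handling of a background class and does not aggregate counts over a whole split. All function names are chosen here for illustration.

```python
def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """Normalized Levenshtein similarity between the two segment label sequences."""
    p = [s[0] for s in segments(pred)]
    y = [s[0] for s in segments(gt)]
    dist = [[0] * (len(y) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p) + 1):
        dist[i][0] = i
    for j in range(len(y) + 1):
        dist[0][j] = j
    for i in range(1, len(p) + 1):
        for j in range(1, len(y) + 1):
            cost = 0 if p[i - 1] == y[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    return 100.0 * (1 - dist[-1][-1] / max(len(p), len(y)))


def f1_at_iou(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive when its best
    same-label ground-truth segment overlaps with IoU >= threshold and
    that ground-truth segment has not been matched already."""
    p_segs, y_segs = segments(pred), segments(gt)
    matched = [False] * len(y_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (y_label, ys, ye) in enumerate(y_segs):
            if y_label != label:
                continue
            iou = (min(pe, ye) - max(ps, ys)) / (max(pe, ye) - min(ps, ys))
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
    fp, fn = len(p_segs) - tp, len(y_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = ["a"] * 4 + ["b"] * 4 + ["c"] * 2
pred = ["a", "a", "a", "b", "a", "b", "b", "b", "c", "c"]  # two spurious short segments
print(round(frame_accuracy(pred, gt), 1),
      round(edit_score(pred, gt), 1),
      round(f1_at_iou(pred, gt, 0.5), 1))
```

In this toy prediction only two of ten frames are wrong, yet the two spurious segments lower Edit and F1@0.5 more than Acc. That illustrates why segment-level metrics are reported alongside frame-wise accuracy.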

The PyTorch results were reproduced with the official code base.

  • Evaluation uses the k-fold cross-validation protocol of the MS-TCN paper, with the same splits as in the paper.

Accuracy on the Breakfast dataset (4-fold cross-validation):

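| Model | Acc | Edit | F1@0.1 | F1@0.25 | F1@0.5 |
| --- | --- | --- | --- | --- | --- |
| paper | 67.6% | 72.4% | 74.3% | 68.9% | 56.1% |
| pytorch | 65.8% | 71.0% | 72.3% | 66.5% | 54.9% |
| paddle | 66.1% | 71.9% | 73.3% | 67.9% | 55.7% |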

Accuracy on the 50salads dataset (5-fold cross-validation):

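| Model | Acc | Edit | F1@0.1 | F1@0.25 | F1@0.5 |
| --- | --- | --- | --- | --- | --- |
| paper | 84.5% | 79.3% | 82.9% | 83.5% | 77.3% |
| pytorch | 81.4% | 75.6% | 82.7% | 81.2% | 77.2% |
| paddle | 81.6% | 75.8% | 83.0% | 81.5% | 74.8% |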

Accuracy on the gtea dataset (4-fold cross-validation):

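| Model | Acc | Edit | F1@0.1 | F1@0.25 | F1@0.5 |
| --- | --- | --- | --- | --- | --- |
| paper | 77.3% | 83.7% | 89.4% | 87.8% | 79.8% |
| pytorch | 76.3% | 79.6% | 87.3% | 85.8% | 74.9% |
| paddle | 77.1% | 83.3% | 88.9% | 87.5% | 79.1% |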

Model weights for gtea

| Test_Data | F1@0.5 | checkpoints |
| --- | --- | --- |
| gtea_split1 | 72.4409 | ASRF_gtea_split_1.pdparams |
| gtea_split2 | 76.6666 | ASRF_gtea_split_2.pdparams |
| gtea_split3 | 84.5528 | ASRF_gtea_split_3.pdparams |
| gtea_split4 | 82.6771 | ASRF_gtea_split_4.pdparams |
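The per-split scores can be related to the cross-validation figure with a short calculation. The sketch assumes that the reported number is the unweighted mean over splits; the source does not state the aggregation explicitly.

```python
# F1@0.5 per gtea split, copied from the table of model weights.
split_f1 = {
    "gtea_split1": 72.4409,
    "gtea_split2": 76.6666,
    "gtea_split3": 84.5528,
    "gtea_split4": 82.6771,
}

# Assumption: the cross-validation score is the plain average over splits.
mean_f1 = sum(split_f1.values()) / len(split_f1)
print(f"mean F1@0.5 over {len(split_f1)} splits: {mean_f1:.1f}%")
```

Under this assumption the mean is about 79.1%, which agrees with the F1@0.5 value in the paddle row of the gtea accuracy table.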

Infer

export inference model

python3.7 tools/export_model.py -c configs/segmentation/asrf/asrf_gtea.yaml \
                                -p data/ASRF_gtea_split_1.pdparams \
                                -o inference/ASRF

The command above generates the model architecture file ASRF.pdmodel and the parameters file ASRF.pdiparams in the output directory inference/ASRF.

infer

The input file is a list of the feature files to run inference on, for example:

S1_Cheese_C1.npy
S1_CofHoney_C1.npy
S1_Coffee_C1.npy
S1_Hotdog_C1.npy
...

Run inference on the listed files:
python3.7 tools/predict.py --input_file data/gtea/splits/test.split1.bundle \
                           --config configs/segmentation/asrf/asrf_gtea.yaml \
                           --model_file inference/ASRF/ASRF.pdmodel \
                           --params_file inference/ASRF/ASRF.pdiparams \
                           --use_gpu=True \
                           --use_tensorrt=False

Example log output:

result write in : ./inference/infer_results/S1_Cheese_C1.txt
result write in : ./inference/infer_results/S1_CofHoney_C1.txt
result write in : ./inference/infer_results/S1_Coffee_C1.txt
result write in : ./inference/infer_results/S1_Hotdog_C1.txt
result write in : ./inference/infer_results/S1_Pealate_C1.txt
result write in : ./inference/infer_results/S1_Peanut_C1.txt
result write in : ./inference/infer_results/S1_Tea_C1.txt
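Judging from the log lines above, each listed feature file yields one text file with the same base name under ./inference/infer_results. The sketch below encodes that naming pattern to help locate results after a run. Both helper names are hypothetical, and the pattern is inferred from the logs rather than taken from tools/predict.py.

```python
from pathlib import PurePosixPath


def read_file_list(text):
    """Return the non-empty lines of a file list, one feature file per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def expected_result_path(feature_file, result_dir="./inference/infer_results"):
    """Map e.g. S1_Cheese_C1.npy to <result_dir>/S1_Cheese_C1.txt (inferred pattern)."""
    stem = PurePosixPath(feature_file).stem
    return f"{result_dir}/{stem}.txt"


file_list = """S1_Cheese_C1.npy
S1_CofHoney_C1.npy
"""
for name in read_file_list(file_list):
    print(expected_result_path(name))
```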

Reference