PowerVQE: An Open Framework for Quality Enhancement of Compressed Videos

January 15, 2024

Note: Please refer to PowerQE for improved functionality. This repository is archived.

1. Introduction

We implement some widely used quality enhancement approaches for compressed videos based on the powerful MMEditing project. The following approaches are commonly used for comparison in this field:

  • STDF (AAAI 2020): Enhancing compressed videos with feature-wise deformable convolutions, instead of frame-wise motion estimation and compensation.
  • MFQEv2 (TPAMI 2019): Enhancing frames in compressed videos by taking advantage of neighboring good-quality frames.
  • DCAD (DCC 2017): The first approach to post-process the decoded videos with a deep-learning-based method. It also sets up good experimental settings.
  • DnCNN (TIP 2017): Widely used approach for decompression and denoising.

We also implement some SR baseline models for quality enhancement, i.e., EDVR and BasicVSR++, whose results are reported in Section 2.

Furthermore, we incorporate some image-oriented models into PowerVQE, which are detailed in Section 7.3.

2. Performance

[TensorBoard] [Pre-trained models]

RGB-PSNR results on the test set of the LDVv2 dataset for the NTIRE 2022 video quality enhancement challenge are as follows:

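| Index      | Video name | Width | Height | Frames | Frame rate | LQ     | DCAD   | DnCNN  | STDF   | MFQEv2 | EDVR   | BasicVSR++ |
|------------|------------|-------|--------|--------|------------|--------|--------|--------|--------|--------|--------|------------|
| 1          | 001        | 960   | 536    | 250    | 25         | 32.433 | 32.872 | 32.901 | 32.971 | 33.039 | 33.163 | 33.342     |
| 2          | 002        | 960   | 536    | 300    | 30         | 30.182 | 30.587 | 30.624 | 30.719 | 30.824 | 30.978 | 31.686     |
| 3          | 003        | 960   | 536    | 300    | 30         | 29.047 | 29.670 | 29.700 | 29.678 | 29.981 | 29.890 | 30.488     |
| 4          | 004        | 960   | 536    | 250    | 25         | 34.313 | 35.105 | 35.209 | 35.324 | 35.448 | 35.774 | 36.196     |
| 5          | 005        | 960   | 536    | 600    | 60         | 30.492 | 30.919 | 30.953 | 31.048 | 31.108 | 31.335 | 31.693     |
| 6          | 006        | 960   | 536    | 300    | 30         | 28.994 | 29.450 | 29.503 | 29.484 | 29.588 | 29.785 | 30.117     |
| 7          | 007        | 960   | 536    | 240    | 24         | 28.845 | 29.359 | 29.373 | 29.472 | 29.581 | 29.696 | 29.865     |
| 8          | 008        | 960   | 504    | 240    | 24         | 31.798 | 32.640 | 32.706 | 32.831 | 32.835 | 33.119 | 33.389     |
| 9          | 009        | 960   | 536    | 250    | 25         | 29.857 | 30.628 | 30.656 | 30.751 | 30.842 | 30.999 | 31.140     |
| 10         | 010        | 960   | 504    | 600    | 60         | 34.646 | 35.531 | 35.598 | 35.621 | 35.591 | 35.666 | 35.634     |
| 11         | 011        | 960   | 536    | 600    | 60         | 26.300 | 26.662 | 26.735 | 26.718 | 26.666 | 26.804 | 26.926     |
| 12         | 012        | 960   | 536    | 600    | 60         | 22.867 | 23.334 | 23.324 | 23.467 | 23.437 | 23.319 | 23.391     |
| 13         | 013        | 960   | 536    | 300    | 30         | 28.424 | 28.787 | 28.793 | 28.886 | 29.222 | 29.190 | 29.805     |
| 14         | 014        | 960   | 536    | 300    | 30         | 31.214 | 31.820 | 31.853 | 31.909 | 32.022 | 32.209 | 32.451     |
| 15         | 015        | 960   | 536    | 300    | 30         | 29.032 | 29.495 | 29.527 | 29.497 | 29.601 | 29.681 | 30.461     |
| Ave.       |            |       |        |        |            | 29.896 | 30.457 | 30.497 | 30.558 | 30.652 | 30.774 | 31.106     |
| Delta PSNR |            |       |        |        |            |        | 0.561  | 0.601  | 0.662  | 0.756  | 0.878  | 1.209      |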

Y-PSNR results on the test set (QP=37) of the MFQEv2 dataset are as follows:

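| Index      | Video name       | Width | Height | Frames | Frame rate | LQ     | DCAD   | DnCNN  | STDF   | STDF-Y | MFQEv2 | EDVR   | BasicVSR++ |
|------------|------------------|-------|--------|--------|------------|--------|--------|--------|--------|--------|--------|--------|------------|
| 1          | Kimono           | 1920  | 1080   | 240    | 24         | 34.397 | 34.661 | 34.686 | 34.737 | 34.908 | 35.149 | 35.299 | 35.849     |
| 2          | Park Scene       | 1920  | 1080   | 240    | 24         | 31.629 | 31.800 | 31.809 | 31.836 | 32.002 | 32.146 | 32.137 | 32.606     |
| 3          | Cactus           | 1920  | 1080   | 500    | 50         | 32.486 | 32.765 | 32.779 | 32.835 | 33.023 | 33.069 | 33.172 | 33.479     |
| 4          | BQ Terrace       | 1920  | 1080   | 600    | 60         | 31.289 | 31.653 | 31.656 | 31.676 | 31.764 | 31.766 | 31.812 | 32.055     |
| 5          | Basketball Drive | 1920  | 1080   | 500    | 50         | 33.382 | 33.723 | 33.751 | 33.796 | 33.929 | 34.004 | 34.143 | 34.541     |
| 6          | Race Horses      | 832   | 480    | 300    | 30         | 30.161 | 30.475 | 30.496 | 30.495 | 30.574 | 30.718 | 30.723 | 31.060     |
| 7          | BQ Mall          | 832   | 480    | 600    | 60         | 31.353 | 31.777 | 31.803 | 31.842 | 32.100 | 32.143 | 32.225 | 32.704     |
| 8          | Party Scene      | 832   | 480    | 500    | 50         | 27.925 | 28.259 | 28.271 | 28.387 | 28.552 | 28.480 | 28.498 | 28.761     |
| 9          | Basketball Drill | 832   | 480    | 500    | 50         | 31.646 | 32.140 | 32.171 | 32.231 | 32.384 | 32.340 | 32.482 | 32.694     |
| 10         | Race Horses      | 416   | 240    | 300    | 30         | 29.375 | 29.750 | 29.767 | 29.792 | 29.941 | 30.130 | 30.138 | 30.494     |
| 11         | BQ Square        | 416   | 240    | 600    | 60         | 28.365 | 28.947 | 28.957 | 29.154 | 29.338 | 29.239 | 29.198 | 29.474     |
| 12         | Blowing Bubbles  | 416   | 240    | 500    | 50         | 27.876 | 28.188 | 28.202 | 28.342 | 28.509 | 28.486 | 28.479 | 28.815     |
| 13         | Basketball Pass  | 416   | 240    | 500    | 50         | 30.551 | 30.960 | 30.986 | 31.065 | 31.299 | 31.367 | 31.496 | 31.860     |
| 14         | Four People      | 1280  | 720    | 600    | 60         | 34.673 | 35.231 | 35.265 | 35.297 | 35.516 | 35.387 | 35.573 | 35.834     |
| 15         | Johnny           | 1280  | 720    | 600    | 60         | 36.412 | 36.917 | 36.960 | 36.983 | 37.210 | 36.990 | 37.228 | 37.390     |
| 16         | Kristen And Sara | 1280  | 720    | 600    | 60         | 35.949 | 36.530 | 36.553 | 36.620 | 36.875 | 36.695 | 36.864 | 37.185     |
| Ave.       |                  |       |        |        |            | 31.717 | 32.111 | 32.132 | 32.193 | 32.370 | 32.382 | 32.467 | 32.800     |
| Delta PSNR |                  |       |        |        |            |        | 0.394  | 0.415  | 0.476  | 0.653  | 0.665  | 0.750  | 1.083      |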

Note:

  1. For simplicity, all models except for the STDF-Y model are trained with RGB data. For these models, the Y-PSNR results are computed from the RGB outputs.

  2. The STDF model trained on RGB data performs worse than the STDF-Y model. A likely reason is that the larger channel number (Y -> RGB) makes the DCN harder to train, since it learns an offset and a mask for each channel separately.
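
Note 1 says that Y-PSNR is obtained from RGB data. The following is a minimal sketch of one way to do that, assuming 8-bit pixels and the BT.601 luma conversion; the repository's scripts may use a different conversion or value range. The function names rgb_to_y, psnr and y_psnr are hypothetical and are not taken from the toolbox.

```python
import math


def rgb_to_y(r, g, b):
    """Convert one 8-bit RGB pixel to BT.601 luma (assumed conversion, Y in [16, 235])."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr(ref, dist, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    # Identical inputs give an infinite PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def y_psnr(ref_rgb, dist_rgb):
    """Y-PSNR between two sequences of (r, g, b) pixels."""
    ref_y = [rgb_to_y(*p) for p in ref_rgb]
    dist_y = [rgb_to_y(*p) for p in dist_rgb]
    return psnr(ref_y, dist_y)
```

The Delta PSNR rows in the tables are then the difference between the average PSNR of an approach and the average PSNR of the LQ input.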

3. Environment

PowerVQE depends on PyTorch, MMCV and some other packages. The suggested installation commands are as follows:

git clone https://github.com/ryanxingql/powervqe.git --depth=1
cd powervqe/mmediting/

conda create -n powervqe python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y && \
conda activate powervqe

pip3 install openmim && mim install mmcv-full
pip3 install -e .

pip3 install scipy tqdm lmdb
#pip3 install setuptools==59.5.0

Layout:

powervqe/
|-- mmediting/
`-- toolbox_data/

4. Datasets

We provide a prebuilt FFmpeg 5.0.1 for converting MKV to PNG. You can download it from Baidu Pan.

cd <your-ffmpeg-dir>

# download ffmpeg-release-amd64-static.tar.xz

tar -xvf ffmpeg-release-amd64-static.tar.xz

You can also use your own FFmpeg.

4.1 LDVv2 Dataset

We use the LDVv2 dataset for training, validation, and testing. This dataset is also used for the NTIRE 2022 challenge on video quality enhancement.

The LDVv2 dataset includes 240 videos for training, 15 videos for validation, and 15 videos for testing. Each raw video is compressed by HM 16.20 with the LDP, QP=37 setting.

First, download this dataset from Dropbox or Baidu Pan.

To convert MKV to PNG, the suggested commands are as follows:

cd <your-data-dir>

# download ldv_v2/

cd ldv_v2/
chmod +x ./run.sh

# suppose the ffmpeg is located at ldv_v2/../ffmpeg-5.0.1-amd64-static/ffmpeg
# then you should run:
#
#./run.sh ../
#
./run.sh <your-ffmpeg-dir>

If you want to train DCAD or DnCNN, you should select some frames from each video to form the training and validation sets. Here are the suggested commands:

cd toolbox_data/
python generate_data_dir_for_dcad.py -label train -src-dir <your-data-dir>/ldv_v2/
python generate_data_dir_for_dcad.py -label valid -src-dir <your-data-dir>/ldv_v2/

4.2 MFQEv2 Dataset

We use the MFQEv2 dataset for testing in addition to the LDVv2 test set.

The test set includes 16 videos. Each raw video is compressed by HM 16.20 with the LDP, QP=37 setting.

First, download this dataset from Dropbox or Baidu Pan.

To convert MKV to PNG, the suggested commands are as follows:

cd <your-data-dir>

# download mfqe_v2/

cd mfqe_v2/
chmod +x ./run.sh

# suppose the ffmpeg is located at mfqe_v2/../ffmpeg-5.0.1-amd64-static/ffmpeg
# then you should run:
#./run.sh ../
./run.sh <your-ffmpeg-dir>

Note: The MFQEv2 dataset originally has 18 test videos. The two 2K-resolution videos, i.e., PeopleOnStreet and Traffic, are excluded, since most approaches cannot process them on a GPU with 16 GB of memory.

Finally, create a symbolic link to your data dir under mmediting/:

cd mmediting/

# suppose your data dir is /mnt/usr/data
# then you should run:
#
#ln -s /mnt/usr/data ./
#
ln -s <your-data-dir> ./

Note that <your-data-dir> should be an absolute path.

Layout:

powervqe/
`-- mmediting/
    `-- data/
        |-- ldv_v2/
        |   |-- train_gt/
        |   |   |-- 001/
        |   |   |   |-- f001.png
        |   |   |   `-- ...
        |   |   `-- ...
        |   |-- train_lq/
        |   |-- valid_gt/
        |   |-- valid_lq/
        |   |-- test_gt/
        |   `-- test_lq/
        `-- mfqe_v2
            |-- test_gt/
            |   |-- BasketballDrill_832x480_500/
            |   |   |-- f001.png
            |   |   `-- ...
            |   `-- ...
            `-- test_lq/
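
Before training or testing, it can help to check that the converted frames match this layout. The sketch below pairs LQ and GT frames by path; it assumes that each *_lq/ tree mirrors the corresponding *_gt/ tree (the layout above only shows the GT side in detail), and pair_frames is a hypothetical helper, not part of the toolbox.

```python
from pathlib import Path


def pair_frames(data_root, dataset="ldv_v2", split="test"):
    """Return sorted (lq_path, gt_path) pairs for frames present in both trees."""
    root = Path(data_root) / dataset
    lq_dir = root / f"{split}_lq"
    gt_dir = root / f"{split}_gt"
    pairs = []
    # Each video is one sub-folder holding PNG frames such as f001.png.
    for gt_path in sorted(gt_dir.glob("*/*.png")):
        lq_path = lq_dir / gt_path.parent.name / gt_path.name
        if lq_path.is_file():
            pairs.append((lq_path, gt_path))
    return pairs
```

For example, pair_frames("./data", "mfqe_v2", "test") run from mmediting/ lists the frame pairs of the MFQEv2 test set; a count lower than expected points to a failed or partial conversion.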

5. Training

The suggested commands are the same as those in MMEditing.

Change data['train_dataloader']['samples_per_gpu'] in the config file according to the number of GPUs you use (we strongly suggest copying the config file and renaming it after your GPU count), then run:

cd mmediting/

chmod +x ./tools/dist_train.sh

# suppose your config file is located at ./configs/restorers/basicvsr_plusplus/ldv_v2_4gpus.py
# and the gpu number is 4
# then you should run:
#
#conda activate powervqe && \
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
#PORT=29500 \
#./tools/dist_train.sh ./configs/restorers/basicvsr_plusplus/ldv_v2_4gpus.py 4
#
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29500 \
./tools/dist_train.sh <config-path> <gpu-number>
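
As a rough guide for setting samples_per_gpu, the sketch below keeps the total batch size fixed when the number of GPUs changes. It assumes that the effective batch size under distributed training is samples_per_gpu times the number of GPUs; the helper name and the total of 8 are placeholders, not values taken from the provided configs.

```python
def samples_per_gpu(total_batch_size, num_gpus):
    """Split a fixed total batch size evenly across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by num_gpus")
    return total_batch_size // num_gpus


# Config fragment: only the key named in the text above is shown.
# The total batch size of 8 is a placeholder for illustration.
data = dict(
    train_dataloader=dict(samples_per_gpu=samples_per_gpu(8, 4)),
)
```

With 4 GPUs this sets samples_per_gpu to 2; with 2 GPUs the same total would require 4 samples per GPU.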

5.1 Special Case of the MFQEv2 models

To train the MFQEv2 models, you should first train the non-PQF model and then the PQF model:

cd mmediting/

# non-PQF
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29500 \
./tools/dist_train.sh ./configs/restorers/mfqev2/ldv_v2_non_pqf_4gpus.py 4

# PQF
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29500 \
./tools/dist_train.sh ./configs/restorers/mfqev2/ldv_v2_pqf_4gpus.py 4

6. Test

You can download the pre-trained models at the latest Releases.

The suggested commands are the same as those in MMEditing.

Change the data['test']['lq_folder'] and data['test']['gt_folder'] in the config file, then run:

cd mmediting/

chmod +x ./tools/dist_test.sh

# suppose:
# your config file is located at:
# ./configs/restorers/basicvsr_plusplus/ldv_v2_4gpus.py
# your pre-trained model is located at:
# ./work_dirs/basicvsrpp_ldv_v2/iter_500000.pth
# you want to use 4 gpus
# you want to save images at ./data/enhanced/basicvsrpp_ldv_v2/500k/ldv_v2
# then you should run:
#
#conda activate powervqe && \
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
#PORT=29500 \
#./tools/dist_test.sh \
#./configs/restorers/basicvsr_plusplus/ldv_v2_4gpus.py \
#./work_dirs/basicvsrpp_ldv_v2/iter_500000.pth \
#4 \
#--save-path ./data/enhanced/basicvsrpp_ldv_v2/500k/ldv_v2
#
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29510 \
./tools/dist_test.sh \
<config-path> \
<model-path> \
<gpu-number> \
--save-path <img-save-path>
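
The two config keys mentioned above can be pictured as follows. This is a hypothetical excerpt, assuming the folder layout from Section 4 and paths relative to mmediting/; a real config file contains many more fields, and retarget_test_set is an illustrative helper only.

```python
import copy

# Hypothetical excerpt of a copied config file: test on the LDVv2 test set.
data = dict(
    test=dict(
        lq_folder="./data/ldv_v2/test_lq",
        gt_folder="./data/ldv_v2/test_gt",
    ),
)


def retarget_test_set(data_cfg, dataset_dir):
    """Return a copy of data_cfg whose test folders point to another dataset dir."""
    new_cfg = copy.deepcopy(data_cfg)
    new_cfg["test"]["lq_folder"] = f"{dataset_dir}/test_lq"
    new_cfg["test"]["gt_folder"] = f"{dataset_dir}/test_gt"
    return new_cfg


# Same model, tested on the MFQEv2 test set instead.
mfqe_data = retarget_test_set(data, "./data/mfqe_v2")
```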

6.1 Special Case of the MFQEv2 models

To test the MFQEv2 models, you should test the non-PQF and PQF models separately and save the enhanced frames to the same dir. Take testing over the LDVv2 dataset as an example:

# test non-PQFs over the LDVv2 dataset
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29500 \
./tools/dist_test.sh \
./configs/restorers/mfqev2/ldv_v2_non_pqf_4gpus.py \
./work_dirs/mfqev2_ldv_v2_non_pqf/iter_600000.pth \
4 \
--save-path ./data/enhanced/mfqev2_ldv_v2/600k/ldv_v2

# test PQFs over the LDVv2 dataset
conda activate powervqe && \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
PORT=29500 \
./tools/dist_test.sh \
./configs/restorers/mfqev2/ldv_v2_pqf_4gpus.py \
./work_dirs/mfqev2_ldv_v2_pqf/iter_600000.pth \
4 \
--save-path ./data/enhanced/mfqev2_ldv_v2/600k/ldv_v2

6.2 Special Case of BasicVSR++ over the MFQEv2 Dataset

To test BasicVSR++ over the MFQEv2 dataset, a GPU with 32 GB of memory is needed. In addition, we use the following script instead of the test pipeline:

cd mmediting/toolbox_test/

conda activate powervqe && \
python test.py -gpu 0 \
-inp-dir '../data/mfqe_v2/test_lq' \
-out-dir '../data/enhanced/basicvsrpp_ldv_v2/300k/mfqe_v2/' \
-config-path '../configs/restorers/basicvsr_plusplus/ldv_v2_4gpus.py' \
-model-path '../work_dirs/basicvsrpp_ldv_v2/iter_300000.pth'

6.3 Special Cases of DCAD and DnCNN

To test each video subfolder for DCAD or DnCNN, we recommend the demo pipeline over the test pipeline. Take DCAD as an example:

cd mmediting/toolbox_test/

# test over the LDVv2 dataset
conda activate powervqe && \
python test.py -gpu 0 \
-inp-dir '../data/ldv_v2/test_lq' \
-out-dir '../data/enhanced/dcad_ldv_v2/500k/ldv_v2/' \
-config-path '../configs/restorers/dcad/ldv_v2_4gpus.py' \
-model-path '../work_dirs/dcad_ldv_v2/iter_500000.pth' \
-if-img

# test over the MFQEv2 dataset
conda activate powervqe && \
python test.py -gpu 0 \
-inp-dir '../data/mfqe_v2/test_lq' \
-out-dir '../data/enhanced/dcad_ldv_v2/500k/mfqe_v2/' \
-config-path '../configs/restorers/dcad/ldv_v2_4gpus.py' \
-model-path '../work_dirs/dcad_ldv_v2/iter_500000.pth' \
-if-img

6.4 PSNR Calculation

Finally, we can get the PSNR results. Take DCAD as an example:

cd toolbox_data/

# RGB-PSNR over the LDVv2 dataset

conda activate powervqe && \
python cal_rgb_psnr.py \
-gt-dir '../mmediting/data/ldv_v2/test_gt' \
-enh-dir '../mmediting/data/enhanced/dcad_ldv_v2/500k/ldv_v2' \
-ignored-frms '{"002":[0]}' \
-save-dir './log/dcad_ldv_v2/500k/ldv_v2'

# Y-PSNR over the MFQEv2 dataset

conda activate powervqe && \
python cal_y_psnr.py \
-gt-dir '../mmediting/data/mfqe_v2/test_gt' \
-enh-dir '../mmediting/data/enhanced/dcad_ldv_v2/500k/mfqe_v2' \
-save-dir './log/dcad_ldv_v2/500k/mfqe_v2' \
-order

Note: We ignore the PSNR of the first frame of video 002 in the LDVv2 dataset since it is a black frame and the PSNR is inf.
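
The effect of -ignored-frms can be sketched as below: a single frame with infinite PSNR would otherwise make the average of the whole video infinite. The sketch assumes zero-based frame indices, as in the '{"002":[0]}' example above, and a per-video mean; average_psnr is a hypothetical helper and does not reproduce cal_rgb_psnr.py.

```python
import json
import math


def average_psnr(per_frame_psnr, ignored_frms_json="{}"):
    """Average per-frame PSNR for each video, skipping the ignored frame indices.

    per_frame_psnr maps a video name to its list of per-frame PSNR values.
    ignored_frms_json is a JSON string such as '{"002": [0]}'.
    """
    ignored = {name: set(idxs) for name, idxs in json.loads(ignored_frms_json).items()}
    result = {}
    for video, values in per_frame_psnr.items():
        skip = ignored.get(video, set())
        kept = [v for i, v in enumerate(values) if i not in skip]
        if not kept:
            raise ValueError(f"no frames left for video {video}")
        result[video] = sum(kept) / len(kept)
    return result


scores = {"001": [31.0, 33.0], "002": [math.inf, 30.0, 32.0]}
averages = average_psnr(scores, '{"002": [0]}')
```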

For the STDF-Y model:

conda activate powervqe && \
python cal_y_psnr_stdf_y.py \
-gt-dir '../mmediting/data/mfqe_v2/test_gt' \
-enh-dir '../mmediting/data/enhanced/stdf_y_ldv_v2/1m/mfqe_v2' \
-save-dir './log/stdf_y_ldv_v2/1m/mfqe_v2' \
-order

7. Q&A

7.1 Main Differences from the Original Papers

To improve the performance of DCAD,

  1. The training patch size is changed from 38 to 128.
  2. The LR is changed from 1 to 1e-4.
  3. The optimizer is changed from AdaDelta to Adam.

To improve the performance of DnCNN,

  1. The training patch size is changed from 40 to 128.
  2. The LR is changed from 0.1 to 1e-4.
  3. The optimizer is changed from SGD to Adam.
  4. Unlike in PowerQE, batch normalization is turned on, which benefits the convergence of DnCNN.

To simplify the training of EDVR,

  1. The input frames are 4x downsampled by strided convolutions. Downsampling lowers GPU memory consumption and speeds up training. It also allows an SR model to be used for quality enhancement.
  2. The scheduler is changed from the multi-step CosineRestart to a single-step CosineRestart.
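
To see what the 4x downsampling means for feature map sizes, here is a small worked calculation using the standard convolution output-size formula. It assumes two 3x3 convolutions with stride 2 and padding 1, which is one common way to obtain a 4x reduction; the actual layer settings are defined in the backbone code and may differ.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def downsampled_size(height, width, num_stride2_convs=2):
    """Spatial size after a chain of stride-2 convolutions (assumed 3x3, padding 1)."""
    for _ in range(num_stride2_convs):
        height, width = conv_out_size(height), conv_out_size(width)
    return height, width


# A 960x536 LDVv2 frame (width x height) after two stride-2 convolutions.
lr_height, lr_width = downsampled_size(536, 960)
```

Under these assumptions a 536x960 (height x width) frame is processed at 134x240, i.e., with 16x fewer spatial positions, which is where the savings in memory and training time come from.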

To simplify the training of MFQEv2,

  1. Instead of conducting PQF detection, we assume that PQFs are located at the first, fifth, ninth, ... frames, i.e., at every fourth frame starting from the first one.
  2. Instead of training a ME-MC subnet from scratch, we use a pre-trained SpyNet.
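
The fixed PQF assumption in item 1 can be sketched as follows, using zero-based frame indices so that the first, fifth and ninth frames are indices 0, 4 and 8. The fallback at the end of a sequence (reusing the previous PQF when no later PQF exists) is an assumption of this sketch, and both function names are hypothetical.

```python
def split_pqf(num_frames, period=4):
    """Split zero-based frame indices into PQFs (every period-th frame) and non-PQFs."""
    pqf = [i for i in range(num_frames) if i % period == 0]
    non_pqf = [i for i in range(num_frames) if i % period != 0]
    return pqf, non_pqf


def neighboring_pqfs(idx, num_frames, period=4):
    """Return the PQF indices before and after a non-PQF frame."""
    if idx % period == 0:
        raise ValueError("idx is itself a PQF")
    prev_pqf = idx - idx % period
    next_pqf = prev_pqf + period
    # Assumed fallback: reuse the previous PQF when the sequence ends first.
    if next_pqf >= num_frames:
        next_pqf = prev_pqf
    return prev_pqf, next_pqf
```

For a 10-frame clip, split_pqf(10) gives PQFs [0, 4, 8], and frame 5 is enhanced with the help of PQFs 4 and 8. This split is also why the non-PQF and PQF models are trained and tested separately.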

7.2 How to Use the Latest MMEditing

The following files are needed to run our code. You can simply copy them to the latest MMEditing repo.

  • mmediting/toolbox_test
  • mmediting/demo/restoration_video_demo_basicvsrpp.py
  • mmediting/configs/restorers/<your-interested-configs>.py
  • mmediting/mmedit/apis/restoration_video_inference.py
  • mmediting/mmedit/datasets/pipelines/augmentation.py
  • mmediting/mmedit/datasets/<your-interested-datasets>.py
  • mmediting/mmedit/models/backbones/sr_backbones/<your-interested-backbones>.py
  • mmediting/mmedit/models/restorers/<your-interested-restorers>.py

7.3 Support for Image Datasets

Prepare your image dataset. Take the DIV2K dataset as an example. Layout:

powervqe/
`-- mmediting/
    `-- data/
        `-- div2k/
            |-- train_hq/
            |   |-- 0001.png
            |   |-- ...
            |   `-- 0800.png
            |-- train_lq/
            |   |-- qp27
            |   |   |-- 0001.png
            |   |   |-- ...
            |   |   `-- 0800.png
            |   |-- qp32
            |   |-- qp37
            |   |-- qp42
            |   |-- qf20
            |   |   |-- 0001.jpg
            |   |   |-- ...
            |   |   `-- 0800.jpg
            |   |-- qf30
            |   |-- qf40
            |   `-- qf50
            |-- valid_hq/
            |   |-- 0801.png
            |   |-- ...
            |   `-- 0900.png
            `-- valid_lq/
                |-- qp27
                |   |-- 0801.png
                |   |-- ...
                |   `-- 0900.png
                |-- qp32
                |-- qp37
                |-- qp42
                |-- qf20
                |   |-- 0801.jpg
                |   |-- ...
                |   `-- 0900.jpg
                |-- qf30
                |-- qf40
                `-- qf50

Example config files for some approaches are provided in mmediting/configs/.

You can download the pre-trained models at the latest Releases.

Note that for simplicity, we first train the QP=37 and QF=50 models, and then fine-tune them to get other models.

7.4 Support for LMDB

We can use LMDB to accelerate the IO. Specifically, we can store training patches and (optionally) test images in LMDB files.

Pros:

  • Fast IO speed.
  • We can combine a large number of image patches into a few big LMDB files.
  • Patches are prepared for training.
    • There is no need to randomly crop the patches during the training.
    • We can decide how to crop the patches (e.g., frames, patch size, cropping stride, etc.) in advance of the training.
  • All images (PNG, JPG, etc.) can be stored as PNG.

Cons:

  • We have to prepare the LMDB files with extra time, computation and storage.
  • Once the LMDB file is generated, the training patches cannot be changed.
  • The data pipeline should be modified for LMDB IO.

Take the DIV2K dataset as an example.

cd mmediting/

# train
conda activate powervqe && \
python tools/data/super-resolution/div2k/preprocess_div2k_dataset_powervqe.py \
--n-thread 16 \
--data-root ./data/div2k --data-type train \
--if-hq \
--if-lq --lqs qp27 qp32 qp37 qp42 qf20 qf30 qf40 qf50 \
--extract-patches --crop-size 128 --step 64 \
--make-lmdb

# valid
conda activate powervqe && \
python tools/data/super-resolution/div2k/preprocess_div2k_dataset_powervqe.py \
--n-thread 16 \
--data-root ./data/div2k --data-type valid \
--if-hq \
--if-lq --lqs qp27 qp32 qp37 qp42 qf20 qf30 qf40 qf50 \
--make-lmdb

After preparing the LMDB files, you should change the data path and pipeline in your config. Please refer to mmediting/configs/restorers/cbdnet/ for examples.
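
To give a feel for how many patches --crop-size 128 --step 64 produces per image, here is a sliding-window sketch. It assumes that windows which do not fully fit inside the image are dropped; the actual preprocessing script may treat the image border differently (for example, by adding an extra window at the edge). The function names are hypothetical.

```python
def patch_origins(length, crop_size=128, step=64):
    """Start offsets of fully contained windows along one dimension."""
    if length < crop_size:
        return []
    return list(range(0, length - crop_size + 1, step))


def patch_boxes(height, width, crop_size=128, step=64):
    """(top, left, bottom, right) boxes of all patches in an image."""
    return [
        (top, left, top + crop_size, left + crop_size)
        for top in patch_origins(height, crop_size, step)
        for left in patch_origins(width, crop_size, step)
    ]
```

Since the step is half the crop size, neighboring patches overlap by 50%, so the stored patches take several times the storage of the source images. This is the storage cost listed under Cons above.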

7.5 Use Pre-commit Hook to Polish Code

  1. Install the pre-commit hook: pip install -U pre-commit
  2. Configure the pre-commit hook based on powervqe/.pre-commit-config.yaml: cd powervqe && pre-commit install
  3. Polish the code before each commit and PR: cd powervqe && pre-commit run --all-files

8. Licenses

We adopt Apache License 2.0. For other licenses, see MMEditing.

Enjoy this repo. Star it if you like it ^ ^