Model Training

February 14, 2023

1、Start Training

PaddleSeg provides a script for training models. This section uses the PP-LiteSeg model and the optic_disc dataset to demonstrate the training process. Make sure that PaddleSeg is installed and that your current working directory is the PaddleSeg root directory, then run the following script:

export CUDA_VISIBLE_DEVICES=0 # Use 1 card (GPU 0)
# On Windows, run the following command instead:
# set CUDA_VISIBLE_DEVICES=0
python tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output

Parameters

| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| iters | Number of training iterations | No | The value specified in the configuration file |
| batch_size | Batch size on a single card | No | The value specified in the configuration file |
| learning_rate | Initial learning rate | No | The value specified in the configuration file |
| config | Configuration file | Yes | - |
| save_dir | Root path for saving the model and VisualDL log files | No | output |
| num_workers | Number of processes used to read data asynchronously; when it is greater than or equal to 1, child processes are started to read data | No | 0 |
| use_vdl | Whether to enable VisualDL to record training data | No | No |
| save_interval | Number of steps between model saves | No | 1000 |
| do_eval | Whether to run evaluation each time the model is saved; when enabled, the model with the best mIoU is saved to best_model | No | No |
| log_iters | Number of steps between log prints | No | 10 |
| resume_model | Path of the checkpoint to resume training from, such as output/iter_1000 | No | None |
| keep_checkpoint_max | Maximum number of most recent checkpoints to keep | No | 5 |
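
The interaction between iters, save_interval, and keep_checkpoint_max can be sketched as below. This is an illustration rather than PaddleSeg's own code, and it rests on two assumptions: a checkpoint is written every save_interval iterations and once more at the final iteration, and only the newest keep_checkpoint_max checkpoints remain on disk. The function name checkpoint_iters and the value iters=1000 are hypothetical.

```python
def checkpoint_iters(iters, save_interval=1000, keep_checkpoint_max=5):
    """Return (saved, kept) iteration numbers under the stated assumptions."""
    # Assumption: one checkpoint every `save_interval` iterations,
    # plus one at the final iteration if it is not already covered.
    saved = list(range(save_interval, iters + 1, save_interval))
    if not saved or saved[-1] != iters:
        saved.append(iters)
    # Assumption: only the newest `keep_checkpoint_max` checkpoints are kept.
    kept = saved[-keep_checkpoint_max:]
    return saved, kept


# Hypothetical run: 1000 iterations with --save_interval 500 --save_dir output
saved, kept = checkpoint_iters(iters=1000, save_interval=500)
print([f"output/iter_{i}" for i in kept])
# ['output/iter_500', 'output/iter_1000']
```

Under these assumptions, a longer run such as iters=7000 with the default save_interval of 1000 would end with only iter_3000 through iter_7000 on disk.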

2、Multi-card Training

To train on multiple cards, set the environment variable CUDA_VISIBLE_DEVICES to the cards you want to use (if it is not set, all GPUs are used by default) and start the training script with paddle.distributed.launch. Multi-card training is not available on Windows because NCCL is not supported there:

export CUDA_VISIBLE_DEVICES=0,1,2,3 # Use 4 cards
python -m paddle.distributed.launch tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output
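
Because batch_size is the batch size on a single card, the number of samples processed per iteration grows with the number of cards. The sketch below shows that arithmetic, assuming data-parallel training in which every visible card processes its own batch. The helper names and the per-card batch size of 4 are hypothetical and not taken from the configuration file.

```python
def visible_gpu_count(env_value):
    """Count the device ids in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in env_value.split(",") if d.strip()])


def global_batch_size(per_card_batch_size, env_value):
    """Samples per iteration, assuming each visible card runs one batch."""
    return per_card_batch_size * visible_gpu_count(env_value)


# Hypothetical per-card batch size of 4 on the 4 cards selected above.
print(global_batch_size(4, "0,1,2,3"))  # 16
```

When moving from one card to several, it may be worth revisiting learning_rate, since the effective batch size changes.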

3、Resume Training

To resume an interrupted training run, pass the path of a saved checkpoint, such as output/iter_500, through --resume_model:

python tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --resume_model output/iter_500 \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output
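
Choosing the value for --resume_model by hand is error-prone when many checkpoints exist. The following sketch picks the newest one, assuming checkpoints are stored as iter_N directories under save_dir, as in the output/iter_500 example above. The function latest_checkpoint is a hypothetical helper, not part of PaddleSeg.

```python
import re
from pathlib import Path


def latest_checkpoint(save_dir):
    """Return the iter_N directory with the largest N, or None if none exist."""
    root = Path(save_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r"^iter_(\d+)$")
    candidates = []
    for entry in root.iterdir():
        match = pattern.match(entry.name)
        # Skip other directories such as best_model.
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


checkpoint = latest_checkpoint("output")
if checkpoint is not None:
    print(f"--resume_model {checkpoint}")
```

The iteration numbers are compared as integers, so iter_1000 is correctly ranked above iter_500, which a plain string sort would not guarantee.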

4、Model Finetune

To fine-tune from a pretrained model, set the model.pretrained key in the config file to the URL or file path of the pretrained model weights. Models pretrained on public datasets such as Cityscapes and ADE20K are provided; the download URLs for each model are listed in PaddleSeg/configs.

model:
  type: FCN
  backbone:
    type: HRNet_W18
    pretrained: pretrained_model/hrnet_w18_ssld 
  num_classes: 19
  pretrained: FCN_pretrained.pdparams # URL or filepath of the pretrained model weights
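
Since the pretrained value may be either a URL or a file path, a quick check before a long run can catch a mistyped path. The sketch below is one possible way to do this and is not how PaddleSeg itself resolves the value. The function describe_pretrained and the example URL are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def describe_pretrained(value):
    """Classify a `pretrained` entry as 'url', 'file', or 'missing'."""
    # Assumption: only http(s) values are treated as download URLs.
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # Anything else is treated as a local path (file or directory).
    return "file" if Path(value).exists() else "missing"


# Hypothetical values for illustration only.
for value in ("https://example.com/FCN_pretrained.pdparams",
              "FCN_pretrained.pdparams"):
    print(value, "->", describe_pretrained(value))
```

A result of 'missing' suggests that the path in the config file is wrong or that the weights have not been downloaded yet.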

5、Visualize Training Process

When use_vdl is enabled, PaddleSeg writes the data collected during training to a VisualDL log file, so you can follow the training process in real time. The recorded data includes:

  1. Loss change trend.
  2. Changes in learning rate.
  3. Training time.
  4. Data reading time.
  5. Mean IoU trend (recorded only when do_eval is enabled).
  6. Mean pixel accuracy trend (recorded only when do_eval is enabled).

Run the following command to start VisualDL and view the log:

# The following command starts a service on 127.0.0.1 that can be viewed in a web browser. Use the --host parameter to specify a different IP address.
visualdl --logdir output/

Open the URL printed by the command in a browser. The result looks as follows: