Model Training

February 14, 2023 · View on GitHub

Model Training

1、Start Training

We can train the model through the script provided by PaddleSeg. Here we use PP-LiteSeg model and optic_disc dataset to show the training process. Please make sure that you have already installed PaddleSeg, and it is located in the PaddleSeg directory. Then execute the following script:

export CUDA_VISIBLE_DEVICES=0 # Set 1 usable card
# If you are using windows, please excute following script:
# set CUDA_VISIBLE_DEVICES=0
python tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output

Parameters

Parameter	Effection	Is Required	Default
iters	Number of training iterations	No	The value specified in the configuration file.
batch_size	Batch size on a single card	No	The value specified in the configuration file.
learning_rate	Initial learning rate	No	The value specified in the configuration file.
config	Configuration files	Yes	-
save_dir	The root path for saving model and visualdl log files	No	output
num_workers	The number of processes used to read data asynchronously, when it is greater than or equal to 1, the child process is started to read dat	No	0
use_vdl	Whether to enable visualdl to record training data	No	No
save_interval	Number of steps between model saving	No	1000
do_eval	Whether to start the evaluation when saving the model, the best model will be saved to best_model according to mIoU at startup	No	No
log_iters	Interval steps for printing log	No	10
resume_model	Restore the training model path, such as: `output/iter_1000`	No	None
keep_checkpoint_max	Number of latest models saved	No	5

2、Multi-card training

If you want to use multi-card training, you need to specify the environment variable CUDA_VISIBLE_DEVICES as multi-card (if not specified, all GPUs will be used by default), and use paddle.distributed.launch to start the training script (Can not use multi-card training under Windows, because it doesn't support nccl):

export CUDA_VISIBLE_DEVICES=0,1,2,3 # Set 4 usable cards
python -m paddle.distributed.launch tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output

3、Resume Training

python tools/train.py \
       --config configs/quick_start/pp_liteseg_optic_disc_512x512_1k.yml \
       --resume_model output/iter_500 \
       --do_eval \
       --use_vdl \
       --save_interval 500 \
       --save_dir output

4、Model Finetune

If you want to finetune from a pretrained model, you can set the model.pretrained keyword in config file, whose content is the URL or filepath of the pretrained model weights. Models pretrained on public datasets like Cityscapes or ADE20K are provided, and you can find the download urls of different models in PaddleSeg/configs.

model:
  type: FCN
  backbone:
    type: HRNet_W18
    pretrained: pretrained_model/hrnet_w18_ssld 
  num_classes: 19
  pretrained: FCN_pretrained.pdparams # URL or filepath of the pretrained model weights

5、Visualize Training Process

PaddleSeg will write the data during the training process into the VisualDL file, and view the log during the training process in real time. The recorded data includes:

Loss change trend.
Changes in learning rate.
Training time.
Data reading time.
Mean IoU trend (takes effect when the do_eval switch is turned on).
Trend of mean pixel Accuracy (takes effect when the do_eval switch is turned on).

Run the following command to start VisualDL to view the log

# The following command will start a service on 127.0.0.1, which supports viewing through the front-end web page. You can specify the actual ip address through the --host parameter
visualdl --logdir output/

Enter the suggested URL in the browser, the effect is as follows: