Data Pre-Processing Document

June 12, 2024

This document shows how to use several off-the-shelf toolkits to automate data pre-processing.

Overview

We use the following toolkits to perform data pre-processing of LaCon:

| Condition | Script | Model Weights |
| --- | --- | --- |
| Canny Edge | generate-canny.py | - |
| HED Edge | bdcn-edge-detection/generate-bdcn-edge.py | data-preprocessing/bdcn.pth |
| Color Stroke | generate-stroke.py | - |
| Image Palette | generate-palette.py | - |
| Saliency Mask | u2net-saliency-detection/generate-saliency-mask.py | data-preprocessing/u2net.pth |

Before running the toolkits above, download the model weights from our Huggingface repo and place them in data-preprocessing/bdcn-edge-detection/checkpoints (BDCN) and data-preprocessing/u2net-saliency-detection/checkpoints (U2-Net).
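A quick check before running the model-based scripts can catch a misplaced checkpoint early. The sketch below assumes the downloaded files keep the names bdcn.pth and u2net.pth listed above and that both checkpoint folders live under data-preprocessing/; `missing_weights` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumed layout: folder (relative to data-preprocessing/) -> weight file name.
EXPECTED_WEIGHTS = {
    "bdcn-edge-detection/checkpoints": "bdcn.pth",
    "u2net-saliency-detection/checkpoints": "u2net.pth",
}


def missing_weights(root="data-preprocessing"):
    """Return the expected weight files that are not present under root."""
    root = Path(root)
    missing = []
    for folder, filename in EXPECTED_WEIGHTS.items():
        candidate = root / folder / filename
        if not candidate.is_file():
            missing.append(candidate)
    return missing
```

Calling `missing_weights()` from the repository root returns an empty list once both files are in place.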

Prepare Datasets

Before preparing the conditions, download the source images of each dataset. The example commands in this document assume the source images are stored in data/coco2017val/images.

Generate Canny Edge

Execute the following command line to extract Canny edge maps from images in a folder:

python data-preprocessing/generate-canny.py --indir IMAGE_PATH --outdir CANNY_PATH --threshold1 CANNY_THRESHOLD_ONE --threshold2 CANNY_THRESHOLD_TWO

The extracted Canny edge maps will be saved in CANNY_PATH. You can refer to this example command line:

python data-preprocessing/generate-canny.py --indir data/coco2017val/images --outdir data/coco2017val/canny-edges --threshold1 200 --threshold2 225
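The two thresholds are easier to choose once their roles are clear. In the Canny algorithm they drive a hysteresis step: pixels whose gradient magnitude exceeds the higher threshold are strong edges, and pixels above the lower threshold are kept only if they connect to a strong edge. The sketch below illustrates just that step in plain Python on a small grid of gradient magnitudes. It assumes --threshold1 and --threshold2 play these two roles (as in OpenCV's `cv2.Canny`), and `hysteresis` is a hypothetical name, not a function from generate-canny.py.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Mark strong pixels (> high) and weak pixels (> low) linked to a strong one."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with every strong edge pixel.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] > high:
                edges[r][c] = True
                queue.append((r, c))

    # Grow edges through 8-connected neighbours that pass the low threshold.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and magnitude[nr][nc] > low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges
```

Raising the higher threshold removes faint contours entirely, while raising the lower one shortens the contours that remain.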

Generate HED Edge

  1. Download the model weights of the BDCN edge extractor from this link and place them in data-preprocessing/bdcn-edge-detection/checkpoints.
  2. Execute the following command line to extract HED edge maps from images in a folder:

python data-preprocessing/bdcn-edge-detection/generate-bdcn-edge.py --indir IMAGE_PATH --outdir HED_EDGE_PATH

The extracted HED edge maps will be saved in HED_EDGE_PATH. You can refer to this example command line:

python data-preprocessing/bdcn-edge-detection/generate-bdcn-edge.py --indir data/coco2017val/images --outdir data/coco2017val/bdcn-edges

Generate Color Stroke

Execute the following command line to extract color strokes from images in a folder:

python data-preprocessing/generate-stroke.py --indir IMAGE_PATH --outdir OUTPUT_PATH --kmeans_center K_MEANS_CENTER_NUMBER

You can refer to this example command line:

python data-preprocessing/generate-stroke.py --indir data/coco2017val/images --outdir data/coco2017val/color-strokes --kmeans_center 16

Use --kmeans_center to control the number of colors in the generated strokes. Turn on --visualize_intermediate to save the intermediate results (i.e., source image, filtered image, and generated strokes) of color stroke generation in OUTPUT_PATH/intermediates.
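To show why the number of k-means centers bounds the number of stroke colors, here is a minimal pure-Python sketch of k-means color quantization. It is an illustration under assumptions, not the code in generate-stroke.py: the function names are hypothetical, the initialization is a simple farthest-point rule chosen for determinism, and the filtering step mentioned above is omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_quantize(pixels, k, iterations=10):
    """Replace every pixel with the mean color of its k-means cluster."""
    # Farthest-point initialisation: start from the first pixel, then
    # repeatedly add the pixel farthest from all centers chosen so far.
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(
            max(pixels, key=lambda p: min(squared_distance(p, c) for c in centers))
        )

    for _ in range(iterations):
        # Assignment step: group pixels by nearest center.
        clusters = [[] for _ in centers]
        for p in pixels:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]

    palette = [tuple(int(round(v)) for v in c) for c in centers]
    return [
        palette[min(range(k), key=lambda i: squared_distance(p, centers[i]))]
        for p in pixels
    ]
```

The output never contains more than `k` distinct colors, which is the effect --kmeans_center has on the generated strokes.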

Generate Image Palette

Execute the following command line to extract image palettes from images in a folder:

python data-preprocessing/generate-palette.py --indir IMAGE_PATH --outdir OUTPUT_PATH --size BICUBIC_SIZE --image_resolution IMAGE_RESOLUTION

You can refer to this example command line:

python data-preprocessing/generate-palette.py --indir data/coco2017val/images --outdir data/coco2017val/image-palette --size 8 --image_resolution 512

Use --size to control the image size of the Bicubic down-sampled result and --image_resolution to define the original image resolution. Turn on --visualize_intermediate to save the intermediate results (i.e., source image, Bicubic down-sampled image, and generated palette) of image palette generation in OUTPUT_PATH/intermediates.
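The effect of --size can be pictured as reducing the image to a size x size grid of colors. The sketch below uses plain block averaging as a stand-in for Bicubic interpolation, so its colors will differ from the script's output; it only illustrates how a smaller grid yields a coarser palette. `block_mean_palette` is a hypothetical name, and the input is assumed to be a square image whose side is divisible by `size`.

```python
def block_mean_palette(image, size):
    """Down-sample a square RGB image (rows of (r, g, b) tuples) to size x size.

    Block averaging is used here instead of Bicubic interpolation.
    """
    resolution = len(image)
    if resolution % size != 0:
        raise ValueError("image side must be divisible by size")
    block = resolution // size

    palette = []
    for by in range(size):
        row = []
        for bx in range(size):
            # Collect every pixel that falls inside this grid cell.
            cells = [
                image[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            ]
            # Average each channel over the cell.
            row.append(tuple(sum(ch) // len(cells) for ch in zip(*cells)))
        palette.append(row)
    return palette
```

With the example values above (--size 8 and --image_resolution 512), each grid cell would summarize a 64 x 64 pixel region.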

Generate Saliency Mask

  1. Download the model weights of U2-Net from this link and place them in data-preprocessing/u2net-saliency-detection/checkpoints.
  2. Execute the following command line to extract saliency masks from images in a folder:

python data-preprocessing/u2net-saliency-detection/generate-saliency-mask.py --indir IMAGE_PATH --outdir OUTPUT_PATH

You can refer to this example command line:

python data-preprocessing/u2net-saliency-detection/generate-saliency-mask.py --indir data/coco2017val/images --outdir data/coco2017val/saliency-masks
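When all five conditions are needed for the same dataset, the commands above can be assembled in one place. The sketch below only builds the argument lists, using the script paths, flags, and output folder names from the examples in this document; `build_commands` and its default values are illustrative assumptions, not part of the repository.

```python
import sys


def build_commands(image_dir, out_root, canny_thresholds=(200, 225),
                   kmeans_center=16, palette_size=8, image_resolution=512):
    """Return one argument list per pre-processing script."""
    prep = "data-preprocessing"
    io = ["--indir", image_dir, "--outdir"]
    low, high = canny_thresholds
    return [
        [sys.executable, f"{prep}/generate-canny.py", *io, f"{out_root}/canny-edges",
         "--threshold1", str(low), "--threshold2", str(high)],
        [sys.executable, f"{prep}/bdcn-edge-detection/generate-bdcn-edge.py",
         *io, f"{out_root}/bdcn-edges"],
        [sys.executable, f"{prep}/generate-stroke.py", *io, f"{out_root}/color-strokes",
         "--kmeans_center", str(kmeans_center)],
        [sys.executable, f"{prep}/generate-palette.py", *io, f"{out_root}/image-palette",
         "--size", str(palette_size), "--image_resolution", str(image_resolution)],
        [sys.executable, f"{prep}/u2net-saliency-detection/generate-saliency-mask.py",
         *io, f"{out_root}/saliency-masks"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, for example over `build_commands("data/coco2017val/images", "data/coco2017val")`.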