# flags.md
## fast_neural_style.lua

The script `fast_neural_style.lua` runs a trained model on new images. It has the following flags:
**Model options:**

- `-model`: Path to a `.t7` model file from `train.lua`.
- `-image_size`: Before being input to the network, images are resized so their longer side is this many pixels long. If 0 then use the original size of the image.
- `-median_filter`: If nonzero, use a median filter of this kernel size as a post-processing step. Default is 3.
**Input / Output:**

- `-input_image`: Path to a single image on which to run the model.
- `-input_dir`: Path to a directory of image files; the model will be run on all images in the directory.
- `-output_image`: When using `-input_image` to specify input, the path to which the stylized image will be written.
- `-output_dir`: When using `-input_dir` to specify input, the path to a directory where stylized images will be written. Each output image will have the same filename as its corresponding input image.
**Backend options:**

- `-gpu`: Which GPU to use for processing (zero-indexed); use -1 to process on the CPU.
- `-backend`: Which GPU backend to use; either `cuda` or `opencl`. Default is `cuda`; this is ignored in CPU mode.
- `-use_cudnn`: Whether to use cuDNN with the CUDA backend; 1 for yes and 0 for no. Ignored in CPU mode or when using the OpenCL backend. Default is 1.
- `-cudnn_benchmark`: Whether to use the cuDNN autotuner when running with cuDNN; 1 for yes or 0 for no. Default is 0. If you want to run the model on many images of the same size, then setting this to 1 may give a speed boost.
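For example, a typical invocation stylizes one image, or a whole directory, with a trained model. This is a sketch only; the model and image paths below are placeholders, not files shipped with the code:

```
# Stylize a single image on GPU 0 (hypothetical paths)
th fast_neural_style.lua \
  -model models/my_style.t7 \
  -input_image images/my_photo.jpg \
  -output_image out.png \
  -image_size 768 \
  -gpu 0

# Stylize every image in a directory on the CPU
th fast_neural_style.lua \
  -model models/my_style.t7 \
  -input_dir images/in/ \
  -output_dir images/out/ \
  -gpu -1
```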
## webcam_demo.lua

The script `webcam_demo.lua` runs models off the video stream from a webcam. It has the following flags:
**Model options:**

- `-models`: A comma-separated list of models to use.
**Webcam options:**

- `-webcam_idx`: Which webcam to use; default is 0.
- `-webcam_fps`: Frames per second to request from the webcam; default is 60.
- `-height`, `-width`: Image resolution to request from the webcam.
**Backend options:**

- `-gpu`: Which GPU to use (zero-indexed); use -1 for CPU. You will likely need a GPU to get good results.
- `-backend`: GPU backend to use, either `cuda` or `opencl`.
- `-use_cudnn`: Whether to use cuDNN when using CUDA; 1 for yes, 0 for no.
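For example, the following sketch runs two models side by side off the default webcam; the model paths and requested resolution are placeholder assumptions:

```
# Hypothetical model paths; 640x480 is an arbitrary resolution choice.
th webcam_demo.lua \
  -models models/style_a.t7,models/style_b.t7 \
  -webcam_idx 0 \
  -height 480 -width 640 \
  -gpu 0
```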
## train.lua

The script `train.lua` trains new feedforward style transfer models. It has the following flags:
**Model options:**

- `-arch`: String specifying the architecture to use. Architectures are specified as comma-separated strings. The architecture used in the paper is `c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3`. All internal convolutional layers are followed by a ReLU and either batch normalization or instance normalization. The available layer types are listed below; a worked decoding of the paper's string appears after this list.
  - `cXsY-Z`: A convolutional layer with a kernel size of `X`, a stride of `Y`, and `Z` filters.
  - `dX`: A downsampling convolutional layer with `X` filters, 3x3 kernels, and stride 2.
  - `RX`: A residual block with two convolutional layers and `X` filters per layer.
  - `uX`: An upsampling convolutional layer with `X` filters, 3x3 kernels, and stride 1/2.
- `-use_instance_norm`: 1 to use instance normalization or 0 to use batch normalization. Default is 1.
- `-padding_type`: What type of padding to use for convolutions in residual blocks. The following choices are available:
  - `zero`: Normal zero padding everywhere.
  - `none`: No padding for convolutions in residual blocks.
  - `reflect`: Spatial reflection padding for all convolutions in residual blocks.
  - `replicate`: Spatial replication padding for all convolutions in residual blocks.
  - `reflect-start` (default): Spatial reflection padding at the beginning of the model and no padding for convolutions in residual blocks.
- `-tanh_constant`: There is a tanh nonlinearity after the final convolutional layer; this puts outputs in the range [-1, 1]. Outputs are then multiplied by `-tanh_constant` so they lie in a more standard image range.
- `-preprocessing`: What type of preprocessing and deprocessing to use; either `vgg` or `resnet`. Default is `vgg`. If you want to use a ResNet as the loss network you should set this to `resnet`.
- `-resume_from_checkpoint`: Path to a `.t7` checkpoint created by `train.lua` from which to initialize the model. If you use this option then all other model architecture options will be ignored.
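To make the `-arch` grammar concrete, here is the paper's architecture string decoded term by term, attached to a sketch of a `train.lua` invocation. The dataset and style image paths are placeholders, and all other training flags are left at their defaults:

```
# The paper's architecture string, decoded term by term:
#   c9s1-32 : 9x9 conv, stride 1, 32 filters    (first layer)
#   d64     : 3x3 conv, stride 2, 64 filters    (downsample)
#   d128    : 3x3 conv, stride 2, 128 filters   (downsample)
#   R128 x5 : five residual blocks, two 128-filter conv layers each
#   u64     : 3x3 conv, stride 1/2, 64 filters  (upsample)
#   u32     : 3x3 conv, stride 1/2, 32 filters  (upsample)
#   c9s1-3  : 9x9 conv, stride 1, 3 filters     (back to RGB)
# Paths below are placeholders.
th train.lua \
  -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3 \
  -h5_file data/my_dataset.h5 \
  -style_image images/styles/my_style.jpg
```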
**Loss options:**

- `-loss_network`: Path to a `.t7` file containing a pretrained CNN to be used as a loss network. The default is VGG-16, but the code should support many models such as VGG-19 and ResNets.
- `-content_layers`: Which layers of the loss network to use for the content reconstruction loss. This will usually be a comma-separated list of integers, but for complicated loss networks like ResNets it can be a list of layer strings.
- `-content_weights`: Weights to use for each content reconstruction loss. This can either be a single number, in which case the same weight is used for all content reconstruction terms, or a comma-separated list of real values of the same length as `-content_layers`.
- `-style_image`: Path to the style image to use.
- `-style_image_size`: Before computing the style loss targets, the style image will be resized so its smaller side is this many pixels long. This can have a big effect on the types of features transferred from the style image.
- `-style_layers`: Which layers of the loss network to use for the style reconstruction loss. This is a comma-separated list of the same format as `-content_layers`.
- `-style_weights`: Weights to use for the style reconstruction terms. Either a single number, in which case the same weight is used for all style reconstruction terms, or a comma-separated list of weights of the same length as `-style_layers`.
- `-style_target_type`: What type of style targets to use; either `gram` or `mean`. Default is `gram`, in which case style targets are Gram matrices as described by Gatys et al. If this is `mean` then the spatial average will be used as a style target instead of a Gram matrix.
- `-tv_strength`: Strength for total variation regularization on the output of the transformation network. Default is `1e-6`; higher values encourage the network to produce outputs that are spatially smooth.
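Note that per-layer weight lists must have the same length as the corresponding layer lists. A minimal sketch; the layer indices, weights, and paths below are illustrative assumptions, not recommendations:

```
# Three style layers matched by three style weights; one content layer.
# All paths and numeric choices here are hypothetical.
th train.lua \
  -h5_file data/my_dataset.h5 \
  -style_image images/styles/my_style.jpg \
  -style_layers 4,9,16 \
  -style_weights 10.0,10.0,10.0 \
  -content_layers 9 \
  -content_weights 1.0
```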
**Training options:**

- `-h5_file`: HDF5 dataset created with `scripts/make_style_dataset.py`.
- `-num_iterations`: The number of gradient descent iterations to run.
- `-max_train`: The maximum number of training images to use; default is -1, which uses the entire training set from the HDF5 dataset.
- `-batch_size`: The number of content images per minibatch. Default is 4.
- `-learning_rate`: Learning rate to use for Adam. Default is `1e-3`.
- `-lr_decay_every`, `-lr_decay_factor`: Learning rate decay. After every `-lr_decay_every` iterations the learning rate is multiplied by `-lr_decay_factor`. Setting `-lr_decay_every` to -1 disables learning rate decay.
- `-weight_decay`: L2 regularization strength on the weights of the transformation network. Default is 0 (no L2 regularization).
**Checkpointing:**

- `-checkpoint_every`: Every `-checkpoint_every` iterations, check performance on the validation set and save both a `.t7` model checkpoint and a `.json` checkpoint with the loss history.
- `-checkpoint_name`: Path where checkpoints are saved. Default is `checkpoint`, meaning that every `-checkpoint_every` iterations we will write the files `checkpoint.t7` and `checkpoint.json`.
**Backend:**

- `-gpu`: Which GPU to use; default is 0. Set this to -1 to train in CPU mode.
- `-backend`: Which GPU backend to use, either `cuda` or `opencl`.
- `-use_cudnn`: Whether to use cuDNN when using CUDA; 0 for no and 1 for yes.
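Putting the training flags together, an end-to-end run might look like the sketch below. All paths are placeholders (the HDF5 dataset must first be created with `scripts/make_style_dataset.py`), and the iteration count and decay schedule are illustrative, not recommendations:

```
# Hypothetical end-to-end training run; all paths are placeholders.
th train.lua \
  -h5_file data/my_dataset.h5 \
  -style_image images/styles/my_style.jpg \
  -style_image_size 384 \
  -batch_size 4 \
  -num_iterations 40000 \
  -learning_rate 1e-3 \
  -lr_decay_every 10000 -lr_decay_factor 0.5 \
  -checkpoint_every 1000 \
  -checkpoint_name checkpoints/my_style \
  -gpu 0
```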
## slow_neural_style.lua

The script `slow_neural_style.lua` uses the optimization-based style transfer method, similar to the original neural-style. It has the following flags:
**Basic options:**

- `-content_image`: Path to the content image to use.
- `-style_image`: Path to the style image to use.
- `-image_size`: Size of the generated image; its longer side is this many pixels long.
**Output options:**

- `-output_image`: Path where the output image will be written.
- `-print_every`: Losses will be printed every `-print_every` iterations.
- `-save_every`: Images will be written every `-save_every` iterations.
**Loss options:**

All of these flags are the same as those in `train.lua`:

- `-loss_network`: Path to a `.t7` file containing a pretrained CNN to be used as a loss network. The default is VGG-16, but the code should support many models such as VGG-19 and ResNets.
- `-content_layers`: Which layers of the loss network to use for the content reconstruction loss. This will usually be a comma-separated list of integers, but for complicated loss networks like ResNets it can be a list of layer strings.
- `-content_weights`: Weights to use for each content reconstruction loss. This can either be a single number, in which case the same weight is used for all content reconstruction terms, or a comma-separated list of real values of the same length as `-content_layers`.
- `-style_image_size`: Before computing the style loss targets, the style image will be resized so its smaller side is this many pixels long. This can have a big effect on the types of features transferred from the style image.
- `-style_layers`: Which layers of the loss network to use for the style reconstruction loss. This is a comma-separated list of the same format as `-content_layers`.
- `-style_weights`: Weights to use for the style reconstruction terms. Either a single number, in which case the same weight is used for all style reconstruction terms, or a comma-separated list of weights of the same length as `-style_layers`.
- `-style_target_type`: What type of style targets to use; either `gram` or `mean`. Default is `gram`, in which case style targets are Gram matrices as described by Gatys et al. If this is `mean` then the spatial average will be used as a style target instead of a Gram matrix.
- `-tv_strength`: Strength for total variation regularization on the output image. Default is `1e-6`; higher values encourage outputs that are spatially smooth.
**Optimization options:**

- `-learning_rate`: Learning rate to use for optimization.
- `-optimizer`: Either `lbfgs`, `adam`, or any other method from torch/optim.
- `-num_iterations`: Number of iterations to run.
**Backend options:**

- `-gpu`: Which GPU to use; default is 0. Set this to -1 to run in CPU mode.
- `-backend`: Which GPU backend to use, either `cuda` or `opencl`.
- `-use_cudnn`: Whether to use cuDNN when using CUDA; 0 for no and 1 for yes.
**Other options:**

- `-preprocessing`: What type of preprocessing and deprocessing to use; either `vgg` or `resnet`. Default is `vgg`. If you want to use a ResNet as the loss network you should set this to `resnet`.
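For example, a single optimization-based run might look like the following sketch; the image paths and iteration count are placeholder assumptions:

```
# Hypothetical optimization-based run; paths are placeholders.
th slow_neural_style.lua \
  -content_image images/my_photo.jpg \
  -style_image images/styles/my_style.jpg \
  -image_size 512 \
  -optimizer lbfgs \
  -num_iterations 500 \
  -save_every 100 \
  -output_image out.png \
  -gpu 0
```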