FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control

March 25, 2024

Listen to the samples on Soundcloud.

Paper: https://openreview.net/forum?id=NyR8OZFHw6i

Colab Demo: https://colab.research.google.com/drive/1UAKFkbPQTfkYMq1GxXfGZOJXOXU_svo6


Getting started

Prerequisites:

  • Python 3.9
  • Conda

Setup

  1. Clone this repository to your disk
  2. Install required packages (see requirements.txt). With venv:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Preparing the Data

To train models and to generate new samples, we use the Lakh MIDI dataset (although any collection of MIDI files can be used).

  1. Download (size: 1.6GB) and extract the archive file:
wget http://hog.ee.columbia.edu/craffel/lmd/lmd_full.tar.gz
tar -xzf lmd_full.tar.gz
  2. You may wish to remove the archive file now: rm lmd_full.tar.gz

Download Pre-Trained Models

If you don't wish to train your own models, you can download our pre-trained models.

  1. Download (size: 2.3GB) and extract the archive file:
wget -O checkpoints.zip https://polybox.ethz.ch/index.php/s/a0HUHzKuPPefWkW/download
unzip checkpoints.zip
  2. You may wish to remove the archive file now: rm checkpoints.zip

Training

Training arguments such as the model type, batch size, and model parameters are passed to the training script via environment variables.

Available model types are:

  • vq-vae: VQ-VAE model used for the learned description
  • figaro: FIGARO with both the expert and learned description
  • figaro-expert: FIGARO with only the expert description
  • figaro-learned: FIGARO with only the learned description
  • figaro-no-inst: FIGARO (expert) without instruments
  • figaro-no-chord: FIGARO (expert) without chords
  • figaro-no-meta: FIGARO (expert) without style (meta) information
  • baseline: Unconditional decoder-only baseline following Huang et al. (2018)

An example invocation of the training script is given by the following command:

MODEL=figaro-expert python src/train.py

For models using the learned description (figaro and figaro-learned), a pre-trained VQ-VAE checkpoint needs to be provided as well:

MODEL=figaro VAE_CHECKPOINT=./checkpoints/vq-vae.ckpt python src/train.py

Generation

To generate samples, make sure you have a trained checkpoint prepared (either download one or train it yourself). The generation script also requires the dataset to be prepared as described in Preparing the Data, because it extracts descriptions from the dataset and generates new samples based on them.

An example invocation of the generation script is given by the following command:

python src/generate.py --model figaro-expert --checkpoint ./checkpoints/figaro-expert.ckpt

For models using the learned description (figaro and figaro-learned), a pre-trained VQ-VAE checkpoint needs to be provided as well:

python src/generate.py --model figaro --checkpoint ./checkpoints/figaro.ckpt --vae_checkpoint ./checkpoints/vq-vae.ckpt
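The rule above (only figaro and figaro-learned need a VQ-VAE checkpoint) can be checked before launching a long run. The following is a minimal sketch, not part of the FIGARO code base: the helper names `requires_vae_checkpoint` and `build_generate_command` are hypothetical, and only the model names and command line flags are taken from this README.

```python
# Hypothetical helpers for illustration; not part of the FIGARO repository.
# Model names and flags are copied from this README.
MODELS_WITH_LEARNED_DESCRIPTION = {"figaro", "figaro-learned"}


def requires_vae_checkpoint(model):
    """Return True if the model type uses the learned description."""
    return model in MODELS_WITH_LEARNED_DESCRIPTION


def build_generate_command(model, checkpoint, vae_checkpoint=None):
    """Assemble the argument list for src/generate.py.

    Fails early if a model that uses the learned description is
    requested without a VQ-VAE checkpoint.
    """
    if requires_vae_checkpoint(model) and vae_checkpoint is None:
        raise ValueError(f"Model '{model}' needs a VQ-VAE checkpoint")
    cmd = ["python", "src/generate.py", "--model", model, "--checkpoint", checkpoint]
    if vae_checkpoint is not None:
        cmd += ["--vae_checkpoint", vae_checkpoint]
    return cmd
```

The returned list can be passed to `subprocess.run` from the repository root.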

Evaluation

We provide the evaluation scripts used to calculate the description metrics on a set of generated samples. Refer to the previous section for how to generate samples yourself.

Example usage:

python src/evaluate.py --samples_dir ./samples/figaro-expert

It has been pointed out that the order of the dataset files (from which the splits are calculated) is non-deterministic and depends on the OS. To address this and to ensure reproducibility, I have added the exact files used for training, validation, and testing to the respective files in the splits folder.
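To illustrate why file order matters, here is a minimal sketch of two common ways to make a split independent of the OS: sorting the file list, and assigning each file to a split from a hash of its name. This is not how the repository computes its splits (it ships explicit file lists in the splits folder). The function names and the 10% validation and test shares are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path


def list_midi_files(root):
    """Return relative paths of *.mid files under `root` in sorted order.

    Directory listing order depends on the OS and file system, so the
    result is sorted explicitly to make it reproducible.
    """
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.mid"))


def assign_split(name, valid_pct=10, test_pct=10):
    """Map a file name to 'train', 'valid' or 'test'.

    The assignment depends only on a hash of the name, not on the
    position of the file in a listing. The percentages are illustrative.
    """
    bucket = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + valid_pct:
        return "valid"
    return "train"
```

With either approach, two machines that hold the same files end up with the same split, regardless of the order in which the file system returns them.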

Parameters

The following environment variables are available for controlling hyperparameters beyond their default value.

Training (train.py)

Model

Variable | Description | Default value
MODEL | Model architecture to be trained |
D_MODEL | Hidden size of the model | 512
CONTEXT_SIZE | Number of tokens in the context to be passed to the auto-encoder | 256
D_LATENT | [VQ-VAE] Dimensionality of the latent space | 1024
N_CODES | [VQ-VAE] Codebook size | 2048
N_GROUPS | [VQ-VAE] Number of groups to split the latent vector into before discretization | 16

Optimization

Variable | Description | Default value
EPOCHS | Max. number of training epochs | 16
MAX_TRAINING_STEPS | Max. number of training iterations | 100,000
BATCH_SIZE | Number of samples in each batch | 128
TARGET_BATCH_SIZE | Number of samples in each backward step, gradients will be accumulated over TARGET_BATCH_SIZE//BATCH_SIZE batches | 256
WARMUP_STEPS | Number of learning rate warmup steps | 4000
LEARNING_RATE | Initial learning rate, will be decayed after constant warmup of WARMUP_STEPS steps | 1e-4
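The relationship between BATCH_SIZE and TARGET_BATCH_SIZE is plain integer division. The sketch below works through the arithmetic with the default values; the function names are hypothetical and the training script may implement accumulation differently.

```python
def accumulation_steps(batch_size=128, target_batch_size=256):
    """Number of batches whose gradients are accumulated per update.

    Mirrors the expression TARGET_BATCH_SIZE // BATCH_SIZE from the table.
    """
    if batch_size <= 0 or target_batch_size < batch_size:
        raise ValueError("expected 0 < batch_size <= target_batch_size")
    return target_batch_size // batch_size


def effective_batch_size(batch_size=128, target_batch_size=256):
    """Samples that actually contribute to one update.

    Equals target_batch_size only when it is a multiple of batch_size,
    because integer division rounds down.
    """
    return accumulation_steps(batch_size, target_batch_size) * batch_size


print(accumulation_steps())            # 256 // 128 = 2
print(effective_batch_size(128, 300))  # 2 * 128 = 256, not 300
```

Choosing TARGET_BATCH_SIZE as a multiple of BATCH_SIZE avoids the rounding shown in the second call.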

Others

Variable | Description | Default value
CHECKPOINT | Path to checkpoint from which to resume training |
VAE_CHECKPOINT | Path to VQ-VAE checkpoint to be used for the learned description |
ROOT_DIR | The folder containing MIDI files to train on | ./lmd_full
OUTPUT_DIR | Folder for saving checkpoints | ./results
LOGGING_DIR | Folder for saving logs | ./logs
N_WORKERS | Number of workers to be used for the dataloader | available CPUs
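As an illustration of configuration through environment variables, the sketch below reads a subset of the variables listed above and falls back to the documented defaults. It is an assumption about how such parsing could look, not a copy of train.py; the function name `read_training_config` and the dictionary keys are hypothetical.

```python
import os


def read_training_config(env=None):
    """Read training settings from environment variables.

    Variable names and default values follow the tables in this README.
    Passing a plain dict as `env` makes the function easy to test.
    """
    env = os.environ if env is None else env
    return {
        "model": env.get("MODEL"),  # no documented default
        "d_model": int(env.get("D_MODEL", 512)),
        "context_size": int(env.get("CONTEXT_SIZE", 256)),
        "epochs": int(env.get("EPOCHS", 16)),
        "max_training_steps": int(env.get("MAX_TRAINING_STEPS", 100000)),
        "batch_size": int(env.get("BATCH_SIZE", 128)),
        "target_batch_size": int(env.get("TARGET_BATCH_SIZE", 256)),
        "warmup_steps": int(env.get("WARMUP_STEPS", 4000)),
        "learning_rate": float(env.get("LEARNING_RATE", 1e-4)),
        "checkpoint": env.get("CHECKPOINT"),
        "vae_checkpoint": env.get("VAE_CHECKPOINT"),
        "root_dir": env.get("ROOT_DIR", "./lmd_full"),
        "output_dir": env.get("OUTPUT_DIR", "./results"),
        "logging_dir": env.get("LOGGING_DIR", "./logs"),
        # "available CPUs" in the table; fall back to 1 if unknown
        "n_workers": int(env.get("N_WORKERS", os.cpu_count() or 1)),
    }
```

Environment variables are always strings, which is why numeric values are converted explicitly with `int` and `float`.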

Generation (generate.py)

The generation script uses command line arguments instead of environment variables.

Argument | Description | Default value
--model | Specify which model will be loaded |
--checkpoint | Path to the checkpoint for the specified model |
--vae_checkpoint | Path to the VQ-VAE checkpoint to be used for the learned description (if applicable) |
--lmd_dir | Folder containing MIDI files to extract descriptions from | ./lmd_full
--output_dir | Folder to save generated MIDI samples to | ./samples
--max_iter | Max. number of tokens that should be generated | 16,000
--max_bars | Max. number of bars that should be generated | 32
--make_medleys | Set to True if descriptions should be combined into medleys. | False
--n_medley_pieces | Number of pieces to be combined into one | 2
--n_medley_bars | Number of bars to take from each piece | 16
--verbose | Logging level, set to 0 for silent execution | 2
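The medley options can be read as "take the first --n_medley_bars bars from each of --n_medley_pieces pieces and join them". The sketch below shows that reading on plain Python lists. It is an assumption about the semantics, not the code in generate.py, and it represents each piece as a hypothetical list of per-bar descriptions.

```python
def make_medley(pieces, n_medley_pieces=2, n_medley_bars=16):
    """Combine the leading bars of several pieces into one description.

    `pieces` is a list of pieces, each given as a list of per-bar
    descriptions (any Python objects). Only the first `n_medley_pieces`
    pieces and the first `n_medley_bars` bars of each are used.
    """
    medley = []
    for piece in pieces[:n_medley_pieces]:
        medley.extend(piece[:n_medley_bars])
    return medley


# Two toy pieces with 20 bars each, labeled by piece and bar index.
piece_a = [f"a{i}" for i in range(20)]
piece_b = [f"b{i}" for i in range(20)]
medley = make_medley([piece_a, piece_b])
print(len(medley))  # 2 pieces * 16 bars = 32
```

With the default values, the medley length of 2 * 16 = 32 bars equals the default of --max_bars.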

Evaluation (evaluate.py)

The evaluation script uses command line arguments instead of environment variables.

Argument | Description | Default value
--samples_dir | Folder containing generated samples which should be evaluated | ./samples
--output_file | CSV file to which a detailed log of all metrics will be saved to | ./metrics.csv
--max_samples | Limit the number of samples to be used for computing evaluation metrics | 1024
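Once the evaluation script has written its CSV log, a per-metric average gives a quick overview. The sketch below assumes only that the file has a header row; the column names are not documented here, so the function averages every column whose values parse as numbers. The function name `summarize_metrics` is hypothetical.

```python
import csv


def summarize_metrics(csv_file, max_rows=None):
    """Return the mean of every numeric column in a CSV file.

    `csv_file` is an open text file (or any iterable of lines) with a
    header row. Non-numeric cells, such as file names, are skipped.
    """
    sums, counts = {}, {}
    for i, row in enumerate(csv.DictReader(csv_file)):
        if max_rows is not None and i >= max_rows:
            break
        for key, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # not a number, e.g. a path or an empty cell
            sums[key] = sums.get(key, 0.0) + number
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```

Usage, assuming the default output path from the table: `with open("./metrics.csv", newline="") as f: print(summarize_metrics(f))`.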