Continual Learning Module

October 30, 2025

This module provides implementations of popular continual learning baselines built on top of the Soft Actor-Critic (SAC) algorithm. The implementation is based on TensorFlow.

Installation

To install the continual learning module, run the following command:

$ pip install COOM[cl]

Running Experiments

You can run single-task or continual learning experiments with the CL/run_single.py and CL/run_cl.py scripts, respectively. To list the available script arguments, run a script with the --help option, e.g.:

python CL/run_single.py --help

Single task

python CL/run_single.py --scenario pitfall

Continual learning

python CL/run_cl.py --sequence CO4 --cl_method packnet

Reproducing Experimental Results

All commands for running the experiments in our paper are listed in cl.sh and single.sh. We used seeds [0, 1, ..., 9] for all experiments in the paper.
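Each run in the paper repeats one command over seeds 0 to 9. The following Python sketch is a hypothetical alternative to the shell scripts. It expands the [SEQUENCE] and [SEED] placeholders used in this README into concrete command lines. build_commands is an illustrative name and is not part of COOM; cl.sh remains the authoritative list.

```python
import shlex

# Seeds 0, 1, ..., 9, as used for all experiments in the paper.
SEEDS = range(10)


def build_commands(sequence, method_args, seeds=SEEDS):
    """Hypothetical helper: return one `python CL/run_cl.py ...` string per seed."""
    commands = []
    for seed in seeds:
        argv = ["python", "CL/run_cl.py", "--sequence", sequence, "--seed", str(seed)]
        argv += method_args  # method-specific flags, copied from the commands in this README
        commands.append(shlex.join(argv))  # shell-safe quoting (Python 3.8+)
    return commands


packnet = ["--cl_method", "packnet", "--packnet_retrain_steps", "10000", "--clipnorm", "2e-05"]
for cmd in build_commands("CO8", packnet):
    print(cmd)
```

To launch a run instead of printing it, each string could be passed to subprocess.run(shlex.split(cmd), check=True).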

Average Performance, Forgetting, Forward Transfer and Action Distributions

We evaluate the continual learning methods on the COOM benchmark in terms of Average Performance, Forgetting, and Forward Transfer. We ran each CL method with the following command:

python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method packnet --packnet_retrain_steps 10000 --clipnorm 2e-05
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method mas --cl_reg_coef=10000
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method agem --regularize_critic --episodic_mem_per_task 10000 --episodic_batch_size 128
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method l2 --cl_reg_coef=100000
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method ewc --cl_reg_coef=250
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method vcl --cl_reg_coef=1 --vcl_first_task_kl False
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --cl_method clonex --exploration_kind 'best_return' --cl_reg_coef=100 --episodic_mem_per_task 10000 --episodic_batch_size 128
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED] --batch_size 512 --buffer_type reservoir --reset_buffer_on_task_change False --replay_size 8e5  # Perfect Memory
python CL/run_cl.py --sequence [SEQUENCE] --seed [SEED]  # Fine-tuning
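The Perfect Memory baseline uses --buffer_type reservoir together with --reset_buffer_on_task_change False. As a result, the buffer keeps samples from all past tasks. The class below is a hypothetical, minimal sketch of reservoir sampling (Vitter's Algorithm R). It is not COOM's buffer class, and it stores arbitrary Python objects rather than TensorFlow tensors.

```python
import random


class ReservoirBuffer:
    """Minimal reservoir-sampling replay buffer (illustrative sketch only).

    Once the buffer is full, every transition seen so far has the same
    probability capacity / n_seen of being stored, so earlier tasks stay
    represented even though the buffer is never reset between tasks.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.n_seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # Keep the new transition with probability capacity / n_seen,
            # overwriting a uniformly chosen stored one.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.storage[j] = transition

    def sample(self, batch_size):
        # Uniform minibatch without replacement from the stored transitions.
        return self.rng.sample(self.storage, batch_size)
```

By contrast, a FIFO buffer of the same size would hold only the most recent transitions. After a long enough task, it would contain nothing from earlier tasks.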

We ran the COC sequence with sparse rewards and only with PackNet:

python CL/run_cl.py --sequence COC --seed [SEED] --sparse_rewards --cl_method packnet --packnet_retrain_steps 10000 --clipnorm 2e-05

Measuring Forward Transfer also requires running SAC on each task in isolation:

python CL/run_single.py --scenario [SCENARIO] --envs [ENVS] --seed [SEED] --no_test
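The three metrics can be sketched as follows. This is a hypothetical Python sketch. It assumes Continual World-style definitions and per-task performance normalized to [0, 1]. The paper's exact formulas and normalization may differ, and the function names are illustrative, not part of the COOM package. The single-task SAC runs above provide the reference curves for Forward Transfer.

```python
from statistics import mean


def average_performance(final_perf):
    """Mean performance over all tasks after training on the whole sequence."""
    return mean(final_perf)


def forgetting(perf_after_own_task, final_perf):
    """Mean drop from the end of each task's own training to the end of the sequence.

    The last task is excluded because it cannot have been forgotten yet,
    so at least two tasks are required.
    """
    drops = [a - b for a, b in zip(perf_after_own_task[:-1], final_perf[:-1])]
    return mean(drops)


def auc(curve):
    """Trapezoidal area under an evaluation curve sampled at equal intervals,
    divided by the curve length so the result stays in [0, 1]."""
    if len(curve) < 2:
        return float(curve[0])
    area = sum((x + y) / 2 for x, y in zip(curve, curve[1:]))
    return area / (len(curve) - 1)


def forward_transfer(cl_curve, single_task_curve):
    """Normalized AUC gain of the CL agent over single-task SAC on one task.

    Both curves are evaluated at the same steps of that task's training
    interval. The value is undefined when the reference AUC equals 1.
    """
    auc_cl, auc_ref = auc(cl_curve), auc(single_task_curve)
    return (auc_cl - auc_ref) / (1 - auc_ref)


print(average_performance([0.5, 0.75, 1.0]))                  # 0.75
print(forgetting([1.0, 1.0, 1.0], [0.5, 0.75, 1.0]))          # 0.375
print(forward_transfer([0.0, 0.5, 1.0], [0.0, 0.0, 0.5]))     # about 0.4286 (= 3/7)
```

A positive Forward Transfer means the CL agent learned the task faster than SAC trained from scratch. A negative value means that earlier tasks slowed it down.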

Network plasticity

To reproduce our network plasticity experiments from the paper, run the following command:

python CL/run_cl.py --sequence CO8 --seed [SEED] --repeat_sequence 10 --no_test --steps_per_env 100000
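As a rough check of the compute budget, the following hypothetical arithmetic estimates the training length of the command above. It assumes that CO8 contains 8 tasks, as its name suggests, and that --repeat_sequence 10 trains on the full sequence 10 times.

```python
# Hypothetical back-of-the-envelope estimate; assumes CO8 has 8 tasks and
# that --repeat_sequence 10 replays the whole sequence 10 times.
tasks_in_sequence = 8
repeats = 10
steps_per_env = 100_000  # --steps_per_env 100000

total_env_steps = tasks_in_sequence * repeats * steps_per_env
single_pass_default = tasks_in_sequence * 200_000  # one pass with the default --steps_per_env 2e5

print(f"{total_env_steps:,} environment steps per seed")  # 8,000,000 environment steps per seed
print(total_env_steps / single_pass_default)               # 5.0
```

Under these assumptions, one plasticity run costs about five times as many environment steps as a single default pass over CO8.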

Method Variations

To reproduce the method variation experiments from the paper, run the following commands:

Image Augmentations

  1. Random Convolution
  2. Random Shift
  3. Random Noise
python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --augment --augmentation conv
python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --augment --augmentation shift
python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --augment --augmentation noise
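The three augmentation types can be illustrated with generic NumPy versions. This sketch is hypothetical and is not COOM's implementation. Several details are assumptions:

- The observation is an (H, W, C) float array in [0, 1], with stacked frames along the channel axis.
- The pad of 4 pixels, the noise std of 0.05, and the 3x3 kernel size are illustrative values.

random_shift follows the DrQ-style pad-and-crop. random_conv applies a freshly sampled channel-mixing convolution, in the spirit of network randomization.

```python
import numpy as np


def random_shift(obs, pad=4, rng=None):
    """Pad with edge pixels, then crop back to the original size at a random offset."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = obs.shape[:2]
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top, left = rng.integers(0, 2 * pad + 1, size=2)
    return padded[top:top + h, left:left + w]


def random_noise(obs, std=0.05, rng=None):
    """Add Gaussian pixel noise and clip back to [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    return np.clip(obs + rng.normal(0.0, std, size=obs.shape), 0.0, 1.0)


def random_conv(obs, kernel_size=3, rng=None):
    """Convolve with a random kernel that mixes all channels (same output size).

    A new kernel is drawn on every call; weights use LeCun-style scaling.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = obs.shape
    p = kernel_size // 2
    std = np.sqrt(1.0 / (kernel_size * kernel_size * c))
    weights = rng.normal(0.0, std, size=(kernel_size, kernel_size, c, c))
    padded = np.pad(obs, ((p, p), (p, p), (0, 0)), mode="reflect")
    out = np.zeros((h, w, c))
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            # (h, w, c) @ (c, c): accumulate the contribution of one kernel tap
            out += padded[dy:dy + h, dx:dx + w] @ weights[dy, dx]
    return out


rng = np.random.default_rng(0)
obs = rng.random((84, 84, 3))  # one 84x84 RGB frame
augmented = [f(obs, rng=rng) for f in (random_conv, random_shift, random_noise)]
```

All three keep the observation shape, so the CNN encoder can consume augmented frames unchanged.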

Prioritized Experience Replay (PER)

python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --buffer_type prioritized
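The --buffer_type prioritized option enables prioritized experience replay. The class below is a hypothetical, minimal sketch of proportional PER (Schaul et al., 2016). It samples in O(n) instead of using a sum-tree. The class name and the values alpha=0.6 and beta=0.4 are assumptions, not COOM's settings. This alpha is PER's priority exponent and is unrelated to SAC's entropy coefficient --alpha.

```python
import random


class ProportionalPER:
    """Proportional prioritized replay with FIFO overwriting (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.storage, self.priorities = [], []
        self.pos = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current max priority so they are replayed soon.
        max_p = max(self.priorities, default=1.0)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(max_p)
        else:
            self.storage[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = self.rng.choices(range(len(self.storage)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum (a common simplification).
        n = len(self.storage)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        return idxs, [self.storage[i] for i in idxs], [w / max_w for w in weights]

    def update_priorities(self, idxs, td_errors):
        # Priority is the absolute TD error of the latest update.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Transitions with large TD errors are replayed more often. The importance-sampling weights scale down their loss contribution to compensate for the biased sampling.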

LSTM

python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --use_lstm

Critic Regularization

python CL/run_cl.py --sequence CO8 --cl_method [METHOD] --seed [SEED] --regularize_critic
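The --regularize_critic flag extends regularization from the actor alone to both the actor and the critic. For the quadratic-penalty methods (l2, ewc, mas), the idea can be pictured with the hypothetical sketch below. It uses flattened parameter lists, and the function names are illustrative. Other methods act on the critic differently; AGEM, for example, projects gradients instead of adding a penalty.

```python
def quadratic_penalty(params, anchor, coef, importance=None):
    """coef * sum_k w_k * (theta_k - anchor_k) ** 2.

    anchor holds the parameters saved at the end of the previous task.
    importance=None uses w_k = 1 (plain L2); EWC or MAS would instead pass
    per-parameter importance estimates.
    """
    if importance is None:
        importance = [1.0] * len(params)
    return coef * sum(w * (p - a) ** 2 for p, a, w in zip(params, anchor, importance))


def cl_penalties(actor, actor_anchor, critic, critic_anchor, cl_reg_coef, regularize_critic=False):
    """Return (actor_penalty, critic_penalty), to be added to the actor and critic losses."""
    actor_penalty = quadratic_penalty(actor, actor_anchor, cl_reg_coef)
    critic_penalty = (
        quadratic_penalty(critic, critic_anchor, cl_reg_coef) if regularize_critic else 0.0
    )
    return actor_penalty, critic_penalty


print(cl_penalties([1.0, 2.0], [0.0, 0.0], [3.0], [1.0], 0.5))                          # (2.5, 0.0)
print(cl_penalties([1.0, 2.0], [0.0, 0.0], [3.0], [1.0], 0.5, regularize_critic=True))  # (2.5, 2.0)
```

Without the flag, the critic is free to drift toward the current task's value estimates.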

Command Line Arguments

The table below lists the available command line arguments:

| Category | Argument | Default | Description |
| --- | --- | --- | --- |
| Core | --scenarios | None | Scenarios to run. Choices: health_gathering, run_and_gun, dodge_projectiles, chainsaw, raise_the_roof, floor_is_lava, hide_and_seek, arms_dealer, pitfall |
| | --envs | ['default'] | Names of the environments in the scenario(s) to run |
| | --test_envs | [] | Names of the environments to periodically evaluate the agent on |
| | --no_test | False | If True, no test environments will be used |
| | --seed | 0 | Seed for randomness |
| | --gpu | None | Which GPU to use |
| | --sparse_rewards | False | Whether to use the sparse reward setting |
| Continual Learning | --sequence | None | Name of the continual learning sequence. Choices: CD4, CD8, CD16, CO4, CO8, CO16, COC, MIXED |
| | --cl_method | None | Continual learning method. Choices: clonex, owl, l2, ewc, mas, vcl, packnet, agem |
| | --start_from | 0 | Which task to start or continue the training from |
| | --num_repeats | 1 | How many times to repeat the sequence |
| | --random_order | False | Whether to randomize the order of the tasks |
| DOOM | --render | False | Whether to render the environment |
| | --render_sleep | 0.0 | Sleep time between frames when rendering |
| | --variable_queue_length | 5 | Number of game variables to remember |
| | --frame_skip | 4 | Number of frames to skip |
| | --resolution | None | Screen resolution of the game. Choices: 800X600, 640X480, 320X240, 160X120 |
| Save/Load | --save_freq_epochs | 25 | Save the model parameters every n epochs |
| | --model_path | None | Path to load the model from |
| Recording | --record | False | Whether to record gameplay videos |
| | --record_every | 100 | Record a gameplay video every n episodes |
| | --video_folder | 'videos' | Path to save the gameplay videos |
| Logging | --with_wandb | False | Whether to enable Weights & Biases logging |
| | --logger_output | ["tsv", "tensorboard"] | Types of loggers to use. Choices: neptune, tensorboard, tsv |
| | --group_id | "default_group" | Group ID for grouping logs from different experiments into a common directory |
| | --log_every | 1000 | Number of steps between subsequent evaluations and logging |
| Model | --use_lstm | False | Whether to use an LSTM after the CNN encoder head |
| | --hidden_sizes | [256, 256] | List of hidden layer sizes for the MLP models |
| | --activation | "lrelu" | Activation function for the models |
| | --use_layer_norm | True | Whether to use layer normalization |
| | --multihead_archs | True | Whether to use a multi-head architecture |
| | --hide_task_id | False | Whether to hide the task ID from the model at test time |
| Learning Rate | --lr | 1e-3 | Learning rate for the optimizer |
| | --lr_decay | 'linear' | Method for decaying the learning rate over time. Choices: None, 'linear', 'exponential' |
| | --lr_decay_rate | 0.1 | Rate at which to decay the learning rate |
| | --lr_decay_steps | 1e5 | Number of steps over which to decay the learning rate |
| Replay Buffer | --replay_size | 5e4 | Size of the replay buffer |
| | --buffer_type | "fifo" | Strategy for inserting examples into the buffer. Choices: fifo and the other values of the BufferType enum (e.g., reservoir, prioritized) |
| | --episodic_memory_from_buffer | True | [Description] |
| Training | --steps_per_env | 2e5 | Number of steps the algorithm will run per environment |
| | --update_after | 5000 | Number of environment interactions to collect before starting gradient updates |
| | --update_every | 500 | Number of environment interactions between consecutive updates |
| | --n_updates | 50 | Number of consecutive policy gradient descent updates to perform |
| | --batch_size | 128 | Minibatch size for the optimization |
| | --gamma | 0.99 | Discount factor |
| | --alpha | "auto" | Entropy regularization coefficient |
| | --target_output_std | 0.089 | Target standard deviation of the action distribution for dynamic alpha tuning |
| | --regularize_critic | False | Whether to regularize both the actor and the critic, or only the actor |
| | --clipnorm | None | Value for gradient norm clipping |
| Testing | --test | True | Whether to test the model |
| | --test_only | False | Whether to only test the model |
| | --test_episodes | 3 | Number of episodes to test the model on |
| Exploration | --start_steps | 10000 | Number of steps for uniform-random action selection |
| | --agent_policy_exploration | False | Whether to use uniform exploration only in the first task |
| | --exploration_kind | None | Kind of exploration to use at the beginning of a new task |
| Task Change | --reset_buffer_on_task_change | True | Whether to reset the replay buffer on task change |
| | --reset_optimizer_on_task_change | True | Whether to reset the optimizer on task change |
| | --reset_critic_on_task_change | False | Whether to reset the critic on task change |
| CL Method Specific | --packnet_retrain_steps | 0 | Number of retraining steps after network pruning per task |
| | --cl_reg_coef | 0.0 | Regularization strength for certain CL methods |
| | --vcl_first_task_kl | False | Whether to use KL regularization for the first task in VCL |
| | --episodic_mem_per_task | 0 | Number of examples to keep in memory per task for AGEM |
| | --episodic_batch_size | 0 | Minibatch size for the additional loss computation in AGEM |
| Observation | --frame_stack | 4 | Number of frames to stack |
| | --frame_height | 84 | Height of the frame |
| | --frame_width | 84 | Width of the frame |
| | --augment | False | Whether to use image augmentation |
| | --augmentation | None | Type of image augmentation. Choices: 'conv', 'shift', 'noise' |
| Reward | --reward_frame_survived | 0.01 | Reward for surviving a frame |
| | --reward_switch_pressed | 15.0 | Reward for pressing a switch |
| | --reward_kill_dtc | 1.0 | Reward for eliminating an enemy |
| | --reward_kill_rag | 5.0 | Reward for eliminating an enemy |
| | --reward_kill_chain | 5.0 | Reward for eliminating an enemy |
| | --reward_health_hg | 15.0 | Reward for picking up a health kit |
| | --reward_health_has | 5.0 | Reward for picking up a health kit |
| | --reward_weapon_ad | 15.0 | Reward for picking up a weapon |
| | --reward_delivery | 30.0 | Reward for delivering an item |
| | --reward_platform_reached | 1.0 | Reward for reaching a platform |
| | --reward_on_platform | 0.1 | Reward for staying on a platform |
| | --reward_scaler_pitfall | 0.1 | Reward scaler for traversal |
| | --reward_scaler_traversal | 1e-3 | Reward scaler for traversal |
| Penalty | --penalty_passivity | -0.1 | Penalty for not moving |
| | --penalty_death | -1.0 | Negative reward for dying |
| | --penalty_projectile | -0.01 | Negative reward for being hit by a projectile |
| | --penalty_health_hg | -0.01 | Negative reward for losing health |
| | --penalty_health_dtc | -1.0 | Negative reward for losing health |
| | --penalty_health_has | -5.0 | Negative reward for losing health |
| | --penalty_lava | -0.1 | Penalty for stepping on lava |
| | --penalty_ammo_used | -0.1 | Negative reward for using ammo |
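The learning-rate arguments --lr, --lr_decay, --lr_decay_rate and --lr_decay_steps work together. The function below is a hypothetical sketch of one plausible reading of how they interact:

- 'linear' interpolates from lr to lr * lr_decay_rate over lr_decay_steps and then holds that value.
- 'exponential' follows lr * lr_decay_rate ** (step / lr_decay_steps).

These formulas match TensorFlow's PolynomialDecay (power=1) and ExponentialDecay schedules. Whether COOM uses exactly these formulas, and whether the step counter runs per task or globally, are assumptions.

```python
def learning_rate(step, lr=1e-3, lr_decay="linear", lr_decay_rate=0.1, lr_decay_steps=1e5):
    """Hypothetical schedule; defaults mirror the table above."""
    if lr_decay is None:
        return lr  # constant learning rate
    if lr_decay == "linear":
        # Fraction of the decay window completed, clipped at 1 so the rate then holds.
        frac = min(step, lr_decay_steps) / lr_decay_steps
        return lr + frac * (lr * lr_decay_rate - lr)
    if lr_decay == "exponential":
        # Multiply by lr_decay_rate once per lr_decay_steps, smoothly in between.
        return lr * lr_decay_rate ** (step / lr_decay_steps)
    raise ValueError(f"unknown lr_decay: {lr_decay!r}")


for step in (0, 50_000, 100_000, 200_000):
    print(step, learning_rate(step), learning_rate(step, lr_decay="exponential"))
```

With the defaults, the linear schedule reaches 1e-4 after 1e5 steps and stays there. The exponential schedule keeps shrinking, reaching about 1e-5 after 2e5 steps.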