README.md
October 21, 2025
Result Module
This module provides scripts for downloading and plotting the results of the experiments in our paper. The results are stored in Weights & Biases and can be downloaded using the scripts in the download folder. The plotting scripts are located in the plotting folder. The core results in the paper can be calculated with the tables/cl_metrics.py script.
Installation
To install the results module, run the following command:
$ pip install COOM[results]
Running experiments
To run the experiments in our paper, follow the instructions in the continual learning (CL) module README.md.
Downloading results
We recommend using Weights & Biases to log your experiments. Having done so, you can use the following scripts to download the results:
- Continual Learning Data - cl_data.py
  python cl_data.py --project <WANDB_PROJECT> --sequence <SEQUENCE>
- Single Run Data - single_data.py
  python single_data.py --project <WANDB_PROJECT> --sequence <SEQUENCE>
  - Evaluation data on given tasks
    python single_data.py --project <WANDB_PROJECT> --sequence <SEQUENCE> --test_envs <TEST_ENV_1> <TEST_ENV_2> ...
- Action Distribution Data - action_data.py
- Runtime Data - runtime_data.py
  - For memory usage data run
    python runtime_data.py --project <WANDB_PROJECT> --sequence <SEQUENCE> --metric system.proc.memory.rssMB
  - For walltime data run
    python runtime_data.py --project <WANDB_PROJECT> --sequence <SEQUENCE> --metric walltime
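For orientation, downloading logged data from Weights & Biases generally goes through the public `wandb.Api` interface. The sketch below is not the code in cl_data.py or the other download scripts. It is a minimal illustration that assumes each run stores its sequence name under a config key called `sequence` (a hypothetical name) and that the metric name passed in matches what the runs logged. The helper names `build_filters` and `download_metric` are also made up for this example.

```python
from typing import Dict, List, Sequence


def build_filters(sequence: str, tags: Sequence[str] = ()) -> Dict:
    """Build a MongoDB-style run filter for wandb.Api().runs().

    Assumption: runs store the sequence name under the config key 'sequence'.
    """
    filters: Dict = {"config.sequence": sequence}
    if tags:
        # Keep only runs that carry at least one of the given tags.
        filters["tags"] = {"$in": list(tags)}
    return filters


def download_metric(
    project: str, sequence: str, metric: str, tags: Sequence[str] = ()
) -> Dict[str, List[float]]:
    """Return {run name: logged values of `metric`} for all matching runs."""
    import wandb  # imported lazily so build_filters is usable without wandb

    api = wandb.Api()
    data: Dict[str, List[float]] = {}
    for run in api.runs(project, filters=build_filters(sequence, tags)):
        # scan_history iterates over every logged row rather than a sample.
        rows = run.scan_history(keys=[metric])
        data[run.name] = [row[metric] for row in rows]
    return data
```

A call such as `download_metric("<ENTITY>/<WANDB_PROJECT>", "CO8", "walltime")` would then return one list of values per run. The project path usually needs the `entity/project` form, and an API key must be configured, for example via `wandb login`.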
Plotting figures
Figures from the paper can be drawn using the plotting scripts.
- Ablation study bar plots
  Compare ablation results to the default setting - ablations.py
  python plotting/ablations.py --sequence CO8 --tags default noise conv shift reg_critic single_head no_task_id --methods packnet l2 mas ewc clonex agem
- Stackplots of action distributions
  - Single method during evaluation of all environments of a sequence - actions_all_envs.py
    python plotting/actions_all_envs.py --method packnet --sequence CO8 --episode_length 1000 --n_actions 12
  - Selected methods during training on a given sequence - actions_by_method.py
    python plotting/actions_by_method.py --methods packnet vcl l2 agem --sequence CO8 --episode_length 1000 --n_actions 12
  - Single method during training on the given sequences - actions_by_sequence.py
    python plotting/actions_by_sequence.py --method packnet --sequences CO8 COC --episode_length 1000 --n_actions 12
  - Includes a histogram of training actions - actions_histogram.py
    python plotting/actions_histogram.py --method packnet --sequences CO8 COC --episode_length 1000 --n_actions 12
- Line plots of cumulative success during evaluation
  - Compare ablations on a given sequence and methods - avg_success_ablations.py
    python plotting/avg_success_methods.py --sequence CO8 --methods packnet ewc mas
  - Compare method performance across sequences - avg_success_sequence.py
    python plotting/avg_success_sequence.py --method packnet --sequences CD8 CO8
- Resource consumption histograms
  - Memory consumption during training - consumption.py
    python plotting/consumption.py --metric memory --sequence CO4 --methods clonex ewc vcl agem
  - Walltime per training step - consumption.py
    python plotting/consumption.py --metric walltime --sequence CO8 --methods packnet mas l2 perfect_memory
- Line plots of average success during evaluation on individual envs (useful for visualizing forgetting)
  - Compare methods - perf_per_env.py
    python plotting/perf_per_env.py --sequence CO8 --methods fine_tuning mas clonex packnet
  - Compare methods across multiple sequences - perf_per_env_n_seq.py
    python plotting/perf_per_env_n_seq.py --sequences CO8 COC --methods packnet l2
  - Compare methods on envs - perf_per_method.py
    python plotting/perf_per_method.py --sequence CO8 --methods mas vcl packnet
- Plasticity line plots
  Visualize loss of plasticity across training repetitions of a sequence - plasticity.py
  python plotting/plasticity.py --sequence CO8 --method fine_tuning --n_repeats 10
- Training performance line plots
  - Compare methods across sequences on individual envs - train_comparison_per_env.py
    python plotting/train_comparison_per_env.py --sequences CO8 COC --methods packnet mas l2
  - Compare methods across sequences - train_comparison_per_method.py
    python plotting/train_comparison_per_method.py --sequences CO8 COC --methods packnet mas l2
- Visualize forward transfer with shaded areas between the training curves of the RL baseline and the CL method
  - Per environment - transfer_per_env.py
    python plotting/transfer_per_env.py --sequence CO8 --method packnet
  - Compare methods on the full sequence - transfer_per_method.py
    python plotting/transfer_per_method.py --sequence CO8 --methods packnet clonex mas fine_tuning vcl l2 ewc agem
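The shaded area between the baseline and CL training curves can be summarized as a single number. The sketch below assumes success rates in [0, 1] sampled at evenly spaced training steps and uses the normalized definition (AUC_cl - AUC_base) / (1 - AUC_base), which is common in continual RL benchmarks. Whether the transfer plotting scripts compute exactly this quantity is an assumption, and the function names are illustrative only.

```python
from typing import Sequence


def auc(curve: Sequence[float]) -> float:
    """Trapezoidal area under a curve sampled at evenly spaced points.

    The x-axis is rescaled to [0, 1], so a constant curve of 1.0 has area 1.0.
    """
    if len(curve) < 2:
        raise ValueError("need at least two points to integrate")
    total = sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))
    return total / (len(curve) - 1)


def forward_transfer(cl_curve: Sequence[float], baseline_curve: Sequence[float]) -> float:
    """Normalized area between the CL curve and the baseline curve.

    Positive values mean the CL method learned the task faster than the
    baseline trained from scratch; negative values mean it learned slower.
    """
    auc_cl = auc(cl_curve)
    auc_base = auc(baseline_curve)
    if auc_base >= 1.0:
        raise ValueError("baseline already at maximum; transfer is undefined")
    return (auc_cl - auc_base) / (1.0 - auc_base)


if __name__ == "__main__":
    baseline = [0.0, 0.5, 1.0]  # RL baseline trained from scratch
    cl_method = [0.5, 1.0, 1.0]  # CL method starting from earlier tasks
    print(forward_transfer(cl_method, baseline))  # 0.75
```

Normalizing by `1 - AUC_base` keeps the score comparable across environments where the baseline itself learns at different speeds.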
Calculating metrics
The result tables in our paper can be generated with the scripts in the tables folder.
- Ablation study results - ablations.py
  python tables/ablations.py --sequence CO8 --tags default noise conv shift reg_critic single_head no_task_id --methods packnet l2 mas ewc clonex agem
- Continual learning metrics across sequences and methods - cl_metrics.py
  python tables/cl_metrics.py --sequences CD4 CO4 CD8 CO8 COC --methods packnet mas agem l2 ewc vcl fine_tuning clonex perfect_memory
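To make the continual learning metrics more concrete, here is a small sketch of two commonly reported quantities, average performance and forgetting. It assumes a square matrix `success[i][j]` holding the success rate on task `j` evaluated right after training on task `i` has finished. This layout and the function names are assumptions for illustration, and the definitions used by cl_metrics.py may differ in detail.

```python
from typing import List, Sequence


def _check_square(success: Sequence[Sequence[float]]) -> int:
    """Validate that the matrix is non-empty and square; return its size."""
    n = len(success)
    if n == 0 or any(len(row) != n for row in success):
        raise ValueError("expected a non-empty square matrix")
    return n


def average_performance(success: Sequence[Sequence[float]]) -> float:
    """Mean success over all tasks at the end of the full sequence."""
    n = _check_square(success)
    return sum(success[n - 1]) / n


def forgetting(success: Sequence[Sequence[float]]) -> float:
    """Mean drop in success between finishing a task and the end of training."""
    n = _check_square(success)
    drops: List[float] = [success[j][j] - success[n - 1][j] for j in range(n)]
    return sum(drops) / n


if __name__ == "__main__":
    # Rows: after training on task i. Columns: evaluated task j.
    success = [
        [0.8, 0.0, 0.0],
        [0.6, 0.9, 0.0],
        [0.5, 0.7, 1.0],
    ]
    print(round(average_performance(success), 3))  # 0.733
    print(round(forgetting(success), 3))  # 0.167
```

In this toy matrix, the first two tasks lose 0.3 and 0.2 success by the end of the sequence, while the last task cannot be forgotten yet, giving a mean forgetting of about 0.167.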