June 13, 2025

EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks

Athinoulla Konstantinou¹,²,* Georgios Leontidis¹,² Mamatha Thota³ Aiden Durrant¹
¹School of Natural and Computing Sciences, University of Aberdeen, UK
²Interdisciplinary Institute, University of Aberdeen, UK
³School of Computer Science, University of Lincoln, UK
*Corresponding author. Email: a.konstantinou.24@abdn.ac.uk

Paper Summary: We propose EquiCaps (Equivariant Capsule Network), a capsule-based self-supervised method that jointly learns invariant and equivariant representations. By leveraging capsules’ innate pose-awareness, EquiCaps shapes the latent space predictably, removing the need for a dedicated predictor. To enable richer benchmarking, we extend 3DIEBench to 3DIEBench-T, incorporating object translations alongside rotations. Empirically, EquiCaps achieves state-of-the-art rotation prediction on 3DIEBench, matching the performance of the fully supervised baseline. Extensive tests on 3DIEBench-T show that EquiCaps attains the highest rotation and object frame translation prediction among equivariant methods. Additionally, it is the only equivariant method that consistently achieves high equivariant performance on both rotation and translation, demonstrating its generalisation ability under multi-geometric equivariance learning, whereas the remaining non-capsule-based equivariant methods achieve rotation prediction only comparable to that of purely invariant methods. We hope our dataset enables the evaluation of invariant and equivariant representations under more challenging and realistic settings, and that our proposed method and results spark the exploration of alternative architectures.

Framework

Pretrained Models

You can download the complete pretrained model weights, including the projector (32 capsules).

| Backbone | Dataset | Optimised for equivariance in | Download | Top-1 (%) | Rotation (R²) | Translation (R²) |
|---|---|---|---|---|---|---|
| ResNet-18 | 3DIEBench | rotation | ckpt | 83.24 | 0.78 | - |
| ResNet-18 | 3DIEBench-T | rotation | ckpt | 76.91 | 0.73 | 0.60 (object frame), 0.88 (base frame) |
| ResNet-18 | 3DIEBench-T | rotation and object frame translation | ckpt | 78.25 | 0.71 | 0.62 (object frame) |
| ResNet-18 | 3DIEBench-T | rotation and base frame translation | ckpt | 77.88 | 0.71 | 0.91 (base frame) |
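The rotation and translation columns above report R² (the coefficient of determination). As a reminder of what that number means, here is a minimal sketch of the standard definition, 1 - SS_res / SS_tot, for scalar targets. The function name `r2_score` and the toy values are illustrative assumptions; how the evaluation scripts in this repository aggregate R² across multi-dimensional rotation or translation targets is not stated here and may differ.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the targets are constant")
    return 1.0 - ss_res / ss_tot


# Toy targets, made up for illustration only (not from the benchmark).
targets = [0.0, 1.0, 2.0, 3.0]
perfect = r2_score(targets, targets)        # exact predictions
mean_only = r2_score(targets, [1.5] * 4)    # always predicting the mean
```

Under this definition a value of 1 corresponds to exact predictions and a value of 0 to a predictor that always outputs the mean target, which is a useful frame for reading the table.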

Pretraining

The pretraining code is set up for distributed training, but it can be easily adjusted to run on a single GPU. To pretrain EquiCaps on 3DIEBench-T, run the following command:

python main.py --experience EquiCaps_3x3 \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 2000 \
    --arch resnet18 \
    --batch-size 256 \
    --base-lr 0.001 \
    --dataset-root 'path/to/3DIEBench-T' \
    --images-file ./data/train_images.npy \
    --labels-file ./data/train_labels.npy \
    --num-workers 32 \
    --sim-coeff 0.1 \
    --equi-factor 5 \
    --std-coeff 10 \
    --cov-coeff 1 \
    --mlp 1111-16-32 \
    --resolution 256

Downstream Evaluation of Representations

The evaluation scripts are configured to run on a single GPU.

Classification

Run the following command:

python eval_classification.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Rotation Prediction

Run the following command:

python eval_angle_prediction.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --experience EquiCaps_3x3 \
    --num-workers 8 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Translation Prediction - Object Frame

Run the following command:

python eval_translation_prediction_object_frame.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Translation Prediction - Base (Final) Frame

Run the following command:

python eval_translation_prediction_final_frame.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Colour Prediction

Run the following command:

python eval_color_prediction.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 50 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'
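The five evaluation commands above differ only in the script name and the number of epochs. If you want to run them in sequence, a small driver along the following lines may help. This is a sketch, not part of the repository: `COMMON_FLAGS`, `EVAL_SCRIPTS`, and `build_eval_command` are hypothetical names, while the script names and flag values are copied from the commands in this README. The paths are the same placeholders used above and must be replaced.

```python
import shlex
from typing import Dict, List

# Flags shared by every evaluation command in this README.
COMMON_FLAGS: Dict[str, str] = {
    "--root-log-dir": "./logs/",
    "--arch": "resnet18",
    "--batch-size": "256",
    "--lr": "0.001",
    "--wd": "0.00000",
    "--num-workers": "8",
    "--experience": "EquiCaps_3x3",
    "--mlp": "1111-16-32",
    "--resolution": "256",
}

# Script name -> number of epochs, as listed in the sections above.
EVAL_SCRIPTS: Dict[str, int] = {
    "eval_classification.py": 300,
    "eval_angle_prediction.py": 300,
    "eval_translation_prediction_object_frame.py": 300,
    "eval_translation_prediction_final_frame.py": 300,
    "eval_color_prediction.py": 50,
}


def build_eval_command(script: str, dataset_root: str,
                       exp_dir: str, weights_file: str) -> List[str]:
    """Return the argv list for one evaluation script (hypothetical helper)."""
    argv = [
        "python", script,
        "--dataset-root", dataset_root,
        "--exp-dir", exp_dir,
        "--epochs", str(EVAL_SCRIPTS[script]),
        "--weights-file", weights_file,
    ]
    for flag, value in COMMON_FLAGS.items():
        argv += [flag, value]
    return argv


# Placeholder paths, exactly as in the README; replace before running.
commands = [
    build_eval_command(script, "path/to/3DIEBench-T",
                       "./exps/<run_name>/", "./path/to/ckpt")
    for script in EVAL_SCRIPTS
]
for cmd in commands:
    # Print a shell-quoted command line; to launch it instead, pass `cmd`
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building each command as an argv list rather than a single string avoids shell-quoting problems with paths that contain spaces or characters such as `<` and `>`.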

3DIEBench-T Dataset

The dataset introduced in this work is publicly available on Hugging Face.

The dataset is reproducible from scratch; details are given in the dataset_generation directory.

Samples 3DIEBench-T

Citation

@article{konstantinou2025equicaps,
  title={EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks},
  author={Konstantinou, Athinoulla and Leontidis, Georgios and Thota, Mamatha and Durrant, Aiden},
  journal={arXiv preprint arXiv:2506.09895},
  year={2025}
}
