
June 13, 2025

EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks

arXiv:2506.09895 Dataset: 3DIEBench-T

Athinoulla Konstantinou¹,²*, Georgios Leontidis¹,², Mamatha Thota³, Aiden Durrant¹
¹School of Natural and Computing Sciences, University of Aberdeen, UK  
²Interdisciplinary Institute, University of Aberdeen, UK
³School of Computer Science, University of Lincoln, UK
*Corresponding author. Email: a.konstantinou.24@abdn.ac.uk

Paper Summary: We propose EquiCaps (Equivariant Capsule Network), a capsule-based self-supervised method that jointly learns invariant and equivariant representations. By leveraging capsules’ innate pose-awareness, EquiCaps shapes the latent space predictably, removing the need for a dedicated predictor. To enable richer benchmarking, we extend 3DIEBench to 3DIEBench-T, incorporating object translations alongside rotations. Empirically, EquiCaps achieves state-of-the-art rotation prediction on 3DIEBench, matching the performance of the fully supervised baseline. Extensive tests on 3DIEBench-T show that EquiCaps attains the highest rotation and object-frame translation prediction among equivariant methods. It is also the only equivariant method that consistently achieves high equivariant performance on both rotation and translation, demonstrating its ability to generalise under multi-geometric equivariance learning, whereas the remaining, non-capsule-based equivariant methods achieve rotation prediction only comparable to that of purely invariant methods. We hope our dataset enables the evaluation of invariant and equivariant representations under more challenging and realistic settings, and that our proposed method and results spark the exploration of alternative architectures.
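To make the distinction between invariant and equivariant representations concrete, here is a minimal toy sketch. It is not the EquiCaps architecture: the two "encoders" and the 2D rotation are hypothetical stand-ins chosen only to illustrate the two properties.

```python
import math

def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def invariant_feature(point):
    """Toy invariant encoder: distance to the origin ignores rotation."""
    return math.hypot(*point)

def equivariant_feature(point):
    """Toy equivariant encoder: uniform scaling commutes with rotation."""
    return (2.0 * point[0], 2.0 * point[1])

point, angle = (1.0, 2.0), math.pi / 3
moved = rotate(point, angle)

# Invariance: transforming the input leaves the feature unchanged.
assert math.isclose(invariant_feature(moved), invariant_feature(point))

# Equivariance: transforming the input transforms the feature the same way.
lhs = equivariant_feature(moved)
rhs = rotate(equivariant_feature(point), angle)
assert all(math.isclose(a, b) for a, b in zip(lhs, rhs))
```

An invariant feature is useful for recognising what the object is, while an equivariant feature retains how it has been transformed, which is what rotation and translation prediction rely on.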

Framework

Pretrained Models

You can download the complete pretrained model weights, including the projector (32 capsules).

| Backbone | Dataset | Optimised for equivariance in | Download | Top-1 (%) | Rotation (R²) | Translation (R²) |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | 3DIEBench | rotation | ckpt | 83.24 | 0.78 | - |
| ResNet-18 | 3DIEBench-T | rotation | ckpt | 76.91 | 0.73 | 0.60 (object frame), 0.88 (base frame) |
| ResNet-18 | 3DIEBench-T | rotation and object frame translation | ckpt | 78.25 | 0.71 | 0.62 (object frame) |
| ResNet-18 | 3DIEBench-T | rotation and base frame translation | ckpt | 77.88 | 0.71 | 0.91 (base frame) |
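
The rotation and translation scores above are reported as R², the coefficient of determination. As a reading aid, the sketch below shows the standard definition on a handful of made-up numbers. The function name and the sample values are illustrative assumptions, and the repository's evaluation scripts may aggregate the score differently (for example, across output dimensions).

```python
def r2_score(targets, predictions):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot

# Made-up regression targets and predictions, for illustration only.
targets = [0.0, 1.0, 2.0, 3.0]
predictions = [0.1, 0.9, 2.2, 2.8]

print(round(r2_score(targets, predictions), 2))  # 0.98
print(r2_score(targets, targets))                # 1.0 for a perfect predictor
print(r2_score(targets, [1.5] * 4))              # 0.0 when always predicting the mean
```

A score of 1 means the predictions match the targets exactly, while 0 means they are no better than always predicting the mean of the targets.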

Pretraining

The pretraining code is set up for distributed training, but it can be easily adapted to run on a single GPU. To pretrain EquiCaps on 3DIEBench-T, run the following command:

python main.py --experience EquiCaps_3x3 \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 2000 \
    --arch resnet18 \
    --batch-size 256 \
    --base-lr 0.001 \
    --dataset-root 'path/to/3DIEBench-T' \
    --images-file ./data/train_images.npy \
    --labels-file ./data/train_labels.npy \
    --num-workers 32 \
    --sim-coeff 0.1 \
    --equi-factor 5 \
    --std-coeff 10 \
    --cov-coeff 1 \
    --mlp 1111-16-32 \
    --resolution 256

Downstream Evaluation of Representations

The evaluation scripts are configured to run on a single GPU.

Classification

Run the following command:

python eval_classification.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Rotation Prediction

Run the following command:

python eval_angle_prediction.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --experience EquiCaps_3x3 \
    --num-workers 8 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Translation Prediction - Object Frame

Run the following command:

python eval_translation_prediction_object_frame.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Translation Prediction - Base (Final) Frame

Run the following command:

python eval_translation_prediction_final_frame.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 300 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'

Colour Prediction

Run the following command:

python eval_color_prediction.py \
    --dataset-root 'path/to/3DIEBench-T' \
    --exp-dir './exps/<run_name>/' \
    --root-log-dir './logs/' \
    --epochs 50 \
    --arch resnet18 \
    --batch-size 256 \
    --lr 0.001 \
    --wd 0.00000 \
    --num-workers 8 \
    --experience EquiCaps_3x3 \
    --mlp 1111-16-32 \
    --resolution 256 \
    --weights-file './path/to/ckpt'
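
The five evaluation commands above differ only in the script name and the number of epochs. The sketch below collects them in one place so they can be printed or launched from a single script. The helper `build_eval_command` and the task keys are hypothetical and not part of the repository; the script names and flag values are copied from the commands above.

```python
import shlex

# Script name and epoch count per task, as listed in this README.
# The dictionary keys are illustrative labels, not repository identifiers.
EVAL_SCRIPTS = {
    "classification": ("eval_classification.py", 300),
    "rotation": ("eval_angle_prediction.py", 300),
    "translation_object_frame": ("eval_translation_prediction_object_frame.py", 300),
    "translation_base_frame": ("eval_translation_prediction_final_frame.py", 300),
    "colour": ("eval_color_prediction.py", 50),
}

def build_eval_command(task, dataset_root, exp_dir, weights_file):
    """Return the argument list for one evaluation run (hypothetical helper)."""
    script, epochs = EVAL_SCRIPTS[task]
    return [
        "python", script,
        "--dataset-root", dataset_root,
        "--exp-dir", exp_dir,
        "--root-log-dir", "./logs/",
        "--epochs", str(epochs),
        "--arch", "resnet18",
        "--batch-size", "256",
        "--lr", "0.001",
        "--wd", "0.00000",
        "--num-workers", "8",
        "--experience", "EquiCaps_3x3",
        "--mlp", "1111-16-32",
        "--resolution", "256",
        "--weights-file", weights_file,
    ]

# Print each command with shell quoting; placeholders are kept as in the README.
for task in EVAL_SCRIPTS:
    cmd = build_eval_command(
        task, "path/to/3DIEBench-T", "./exps/<run_name>/", "./path/to/ckpt"
    )
    print(shlex.join(cmd))
```

To launch a run instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)` after replacing the placeholder paths with real ones.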

3DIEBench-T Dataset

The dataset introduced in this work is publicly available on Hugging Face.

The dataset is reproducible from scratch; details are given in the dataset_generation directory.

Samples from 3DIEBench-T

Requirements

PyTorch, NumPy, PIL (Pillow), TensorBoard and SciPy.

License

This code is released under the GPL v3.0 License.

The 3DIEBench-T dataset is released under the CC-BY-NC 4.0 License.

Contribution and Contact

We do not anticipate pull requests for this repository. If you encounter any issues reproducing our experiments, please feel free to open an issue or contact the corresponding author of the paper directly.

Acknowledgement

This code is mostly built on the SIE and SR-CapsNet repositories.

Citation

@article{konstantinou2025equicaps,
  title={EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks},
  author={Konstantinou, Athinoulla and Leontidis, Georgios and Thota, Mamatha and Durrant, Aiden},
  journal={arXiv preprint arXiv:2506.09895},
  year={2025}
}