# Awesome Self-Supervised Learning [Awesome](https://github.com/sindresorhus/awesome) [Discord](https://discord.gg/xvNJW94)

*Last updated: April 16, 2025*

Want to leverage state-of-the-art self-supervised learning and distillation to pretrain your models? Check out the following tools from the team at Lightly AI:

- ⚡️ **LightlyTrain**: A framework to pretrain your computer vision backbones in 3 lines of code (see the sketch after this list).
- 💡 **LightlySSL**: A research-focused collection of state-of-the-art self-supervised training methods.
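
As a taste of the "3 lines of code" claim, here is a minimal pretraining sketch. It assumes the `lightly_train` package and its `train()` entry point as shown in the LightlyTrain quick-start docs; the output directory, dataset path, and model identifier below are placeholders.

```python
import lightly_train

# Pretrain a torchvision backbone on a folder of unlabeled images.
# Paths and the model string are placeholders; check the LightlyTrain
# documentation for the currently supported options.
lightly_train.train(
    out="out/my_experiment",        # where checkpoints and logs are written
    data="my_data_dir",             # directory of unlabeled images
    model="torchvision/resnet50",   # backbone to pretrain
)
```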

## 2024

| Title | Relevant Links |
| --- | --- |
| Scalable Pre-training of Large Autoregressive Image Models | arXiv, Open In Colab |
| SAM 2: Segment Anything in Images and Videos | arXiv, Google Drive |
| Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | arXiv |
| GLID: Pre-training a Generalist Encoder-Decoder Vision Model | arXiv, Google Drive |
| Rethinking Patch Dependence for Masked Autoencoders | arXiv, Google Drive, GitHub |
| You Don't Need Data-Augmentation in Self-Supervised Learning | arXiv |
| Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? | arXiv |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | CVPR, GitHub |
| Revisiting Feature Prediction for Learning Visual Representations from Video | arXiv, GitHub |
| ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning | arXiv |
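
Several 2024 entries, notably "Revisiting Feature Prediction for Learning Visual Representations from Video" (V-JEPA), pretrain by regressing the latent features of masked patches rather than reconstructing pixels. Below is a minimal PyTorch sketch of that idea; the `student`, `teacher`, and `predictor` modules are hypothetical stand-ins, and zeroing masked patches is a simplification of dropping their tokens entirely.

```python
import torch
import torch.nn.functional as F

def masked_feature_prediction_loss(student, teacher, predictor, patches, mask):
    """patches: (B, N, D) patch embeddings; mask: (B, N) bool, True = masked."""
    # The target encoder sees the full input and provides regression targets;
    # in V-JEPA it is an EMA copy of the student and receives no gradients.
    with torch.no_grad():
        targets = teacher(patches)                        # (B, N, D)
    # The student only sees the visible context (masked patches zeroed here
    # for brevity; the paper drops their tokens instead).
    context = student(patches * (~mask).unsqueeze(-1).float())
    preds = predictor(context)                            # (B, N, D)
    # Regress teacher features at masked locations only (L1, as in V-JEPA).
    return F.l1_loss(preds[mask], targets[mask])
```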

## 2023

| Title | Relevant Links |
| --- | --- |
| A Cookbook of Self-Supervised Learning | arXiv |
| Masked Autoencoders Enable Efficient Knowledge Distillers | arXiv, Google Drive |
| Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective | CVPR, Google Drive |
| CycleCL: Self-supervised Learning for Periodic Videos | arXiv, Google Drive |
| Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data | arXiv, Google Drive |
| Reverse Engineering Self-Supervised Learning | arXiv, Google Drive |
| Improved baselines for vision-language pre-training | arXiv, Google Drive, GitHub |
| DINOv2: Learning Robust Visual Features without Supervision | arXiv, Google Drive |
| Segment Anything | arXiv, Google Drive |
| Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | arXiv, Google Drive |
| Self-supervised Object-Centric Learning for Videos | NeurIPS |
| Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | NeurIPS |
| An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization | NeurIPS |
| The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning | arXiv, GitHub |
| Fast Segment Anything | arXiv, GitHub |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | arXiv, GitHub |
| What Do Self-Supervised Vision Transformers Learn? | arXiv, GitHub |
| Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need | arXiv |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | arXiv, GitHub |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | arXiv, GitHub |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | CVPR |
| MGMAE: Motion Guided Masking for Video Masked Autoencoding | CVPR, GitHub |
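
The DINOv2 checkpoints listed above work well as frozen feature extractors. A small sketch via `torch.hub`, using the entry point published in the facebookresearch/dinov2 repository (downloading the weights requires network access):

```python
import torch

# Load a pretrained DINOv2 ViT-S/14 backbone from torch.hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Input sides should be multiples of the patch size (14); 224 = 16 * 14.
images = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(images)   # (1, 384) CLS embedding for ViT-S/14
print(features.shape)
```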

## 2022

| Title | Relevant Links |
| --- | --- |
| Masked Siamese Networks for Label-Efficient Learning | arXiv, Google Drive, Open In Colab |
| The Hidden Uniform Cluster Prior in Self-Supervised Learning | arXiv, Open In Colab |
| Unsupervised Visual Representation Learning by Synchronous Momentum Grouping | arXiv, Open In Colab |
| TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning | arXiv, Open In Colab |
| VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning | arXiv, Open In Colab |
| VICRegL: Self-Supervised Learning of Local Visual Features | arXiv, Open In Colab |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | arXiv, Google Drive |
| Improving Visual Representation Learning through Perceptual Understanding | arXiv, Google Drive |
| RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank | arXiv, Google Drive |
| A Closer Look at Self-Supervised Lightweight Vision Transformers | arXiv, GitHub |
| Beyond neural scaling laws: beating power law scaling via data pruning | arXiv, GitHub |
| A simple, efficient and scalable contrastive masked autoencoder for learning visual representations | arXiv |
| Masked Autoencoders are Robust Data Augmentors | arXiv |
| Is Self-Supervised Learning More Robust Than Supervised Learning? | arXiv |
| Can CNNs Be More Robust Than Transformers? | arXiv, GitHub |
| Patch-level Representation Learning for Self-supervised Vision Transformers | arXiv, GitHub |
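
The VICReg objective listed above combines three terms: an invariance (MSE) term between two views, a variance hinge that keeps every embedding dimension's standard deviation above 1, and a covariance penalty that decorrelates dimensions. A self-contained PyTorch sketch with the paper's default weights (25/25/1); variable names are ours:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    # Invariance: embeddings of the two views should match.
    sim = F.mse_loss(z1, z2)

    # Variance: hinge keeps each dimension's std above 1, preventing collapse.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # Covariance: push off-diagonal covariance terms toward zero.
    n, d = z1.shape
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    cov1 = (z1c.T @ z1c) / (n - 1)
    cov2 = (z2c.T @ z2c) / (n - 1)
    off = lambda m: m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()
    cov = off(cov1).pow(2).sum() / d + off(cov2).pow(2).sum() / d

    return sim_w * sim + var_w * var + cov_w * cov
```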

## 2021

| Title | Relevant Links |
| --- | --- |
| Barlow Twins: Self-Supervised Learning via Redundancy Reduction | arXiv, Open In Colab |
| Decoupled Contrastive Learning | arXiv, Open In Colab |
| Dense Contrastive Learning for Self-Supervised Visual Pre-Training | arXiv, Open In Colab |
| Emerging Properties in Self-Supervised Vision Transformers | arXiv, Open In Colab |
| Masked Autoencoders Are Scalable Vision Learners | arXiv, Open In Colab |
| With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| SimMIM: A Simple Framework for Masked Image Modeling | arXiv, Open In Colab |
| Exploring Simple Siamese Representation Learning | arXiv, Open In Colab |
| When Does Contrastive Visual Representation Learning Work? | arXiv |
| Efficient Visual Pretraining with Contrastive Detection | arXiv |
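
Barlow Twins, the first 2021 entry, reduces redundancy by driving the cross-correlation matrix between two views' batch-normalized embeddings toward the identity. A minimal PyTorch sketch; `lam=5e-3` is the paper's default trade-off:

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)

    n, d = z1.shape
    c = (z1.T @ z2) / n                                 # (D, D) cross-correlation

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()      # diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # rest -> 0
    return on_diag + lam * off_diag
```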

## 2020

| Title | Relevant Links |
| --- | --- |
| Bootstrap your own latent: A new approach to self-supervised Learning | arXiv, Open In Colab |
| A Simple Framework for Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | arXiv, Open In Colab |
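
SimCLR ("A Simple Framework for Contrastive Learning of Visual Representations") trains with the NT-Xent loss: for each embedding, the other view of the same image is the positive, and every other embedding in the batch is a negative. A compact PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.T / temperature                          # cosine similarities
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # Positive for row i is the other view of the same image: i +/- B.
    targets = torch.cat([torch.arange(n // 2, n), torch.arange(0, n // 2)])
    return F.cross_entropy(sim, targets)
```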

## 2019

| Title | Relevant Links |
| --- | --- |
| Momentum Contrast for Unsupervised Visual Representation Learning | arXiv, Open In Colab |
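
MoCo's key trick is that the key encoder is not trained by backpropagation; it is updated as an exponential moving average of the query encoder (momentum `m=0.999` in the paper). A sketch of that update:

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    # key_encoder <- m * key_encoder + (1 - m) * query_encoder
    for q_param, k_param in zip(query_encoder.parameters(),
                                key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)
```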

## 2018

| Title | Relevant Links |
| --- | --- |
| Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination | arXiv |

## 2016

| Title | Relevant Links |
| --- | --- |
| Context Encoders: Feature Learning by Inpainting | arXiv |