# Awesome Self-Supervised Learning [Awesome](https://github.com/sindresorhus/awesome) [Discord](https://discord.gg/xvNJW94)

*Last updated: April 16, 2025*

Want to leverage state-of-the-art self-supervised learning and distillation to pretrain your models? Check out the following tools from the team at Lightly AI:

- ⚡️ **LightlyTrain**: A framework to pretrain your computer vision backbones in 3 lines of code (see the sketch after this list).
- 💡 **LightlySSL**: A research-focused collection of state-of-the-art self-supervised training methods.
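
As a taste of the "3 lines of code" claim, here is a minimal pretraining sketch. It assumes the `lightly_train` package and its `train()` entry point as shown in the LightlyTrain quick-start docs; the output directory, dataset path, and model identifier below are placeholders.

```python
import lightly_train

# Pretrain a torchvision backbone on a folder of unlabeled images.
# Paths and the model string are placeholders; check the LightlyTrain
# documentation for the currently supported options.
lightly_train.train(
    out="out/my_experiment",        # where checkpoints and logs are written
    data="my_data_dir",             # directory of unlabeled images
    model="torchvision/resnet50",   # backbone to pretrain
)
```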

## 2024

| Title | Relevant Links |
| --- | --- |
| Scalable Pre-training of Large Autoregressive Image Models | arXiv, Open In Colab |
| SAM 2: Segment Anything in Images and Videos | arXiv, Google Drive |
| Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | arXiv |
| GLID: Pre-training a Generalist Encoder-Decoder Vision Model | arXiv, Google Drive |
| Rethinking Patch Dependence for Masked Autoencoders | arXiv, Google Drive, GitHub |
| You Don't Need Data-Augmentation in Self-Supervised Learning | arXiv |
| Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? | arXiv |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | CVPR, GitHub |
| Revisiting Feature Prediction for Learning Visual Representations from Video | arXiv, GitHub |
| ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning | arXiv |
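
Several 2024 entries, notably "Revisiting Feature Prediction for Learning Visual Representations from Video" (V-JEPA), pretrain by regressing the latent features of masked patches rather than reconstructing pixels. Below is a minimal PyTorch sketch of that idea; the `student`, `teacher`, and `predictor` modules are hypothetical stand-ins, and zeroing masked patches is a simplification of dropping their tokens entirely.

```python
import torch
import torch.nn.functional as F

def masked_feature_prediction_loss(student, teacher, predictor, patches, mask):
    """patches: (B, N, D) patch embeddings; mask: (B, N) bool, True = masked."""
    # The target encoder sees the full input and provides regression targets;
    # in V-JEPA it is an EMA copy of the student and receives no gradients.
    with torch.no_grad():
        targets = teacher(patches)                        # (B, N, D)
    # The student only sees the visible context (masked patches zeroed here
    # for brevity; the paper drops their tokens instead).
    context = student(patches * (~mask).unsqueeze(-1).float())
    preds = predictor(context)                            # (B, N, D)
    # Regress teacher features at masked locations only (L1, as in V-JEPA).
    return F.l1_loss(preds[mask], targets[mask])
```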

## 2023

| Title | Relevant Links |
| --- | --- |
| A Cookbook of Self-Supervised Learning | arXiv |
| Masked Autoencoders Enable Efficient Knowledge Distillers | arXiv, Google Drive |
| Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective | CVPR, Google Drive |
| CycleCL: Self-supervised Learning for Periodic Videos | arXiv, Google Drive |
| Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data | arXiv, Google Drive |
| Reverse Engineering Self-Supervised Learning | arXiv, Google Drive |
| Improved baselines for vision-language pre-training | arXiv, Google Drive, GitHub |
| DINOv2: Learning Robust Visual Features without Supervision | arXiv, Google Drive |
| Segment Anything | arXiv, Google Drive |
| Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | arXiv, Google Drive |
| Self-supervised Object-Centric Learning for Videos | NeurIPS |
| Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | NeurIPS |
| An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization | NeurIPS |
| The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning | arXiv, GitHub |
| Fast Segment Anything | arXiv, GitHub |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | arXiv, GitHub |
| What Do Self-Supervised Vision Transformers Learn? | arXiv, GitHub |
| Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need | arXiv |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | arXiv, GitHub |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | arXiv, GitHub |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | CVPR |
| MGMAE: Motion Guided Masking for Video Masked Autoencoding | CVPR, GitHub |
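
The DINOv2 checkpoints listed above work well as frozen feature extractors. A small sketch via `torch.hub`, using the entry point published in the facebookresearch/dinov2 repository (downloading the weights requires network access):

```python
import torch

# Load a pretrained DINOv2 ViT-S/14 backbone from torch.hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Input sides should be multiples of the patch size (14); 224 = 16 * 14.
images = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(images)   # (1, 384) CLS embedding for ViT-S/14
print(features.shape)
```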

## 2022

| Title | Relevant Links |
| --- | --- |
| Masked Siamese Networks for Label-Efficient Learning | arXiv, Google Drive, Open In Colab |
| The Hidden Uniform Cluster Prior in Self-Supervised Learning | arXiv, Open In Colab |
| Unsupervised Visual Representation Learning by Synchronous Momentum Grouping | arXiv, Open In Colab |
| TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning | arXiv, Open In Colab |
| VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning | arXiv, Open In Colab |
| VICRegL: Self-Supervised Learning of Local Visual Features | arXiv, Open In Colab |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | arXiv, Google Drive |
| Improving Visual Representation Learning through Perceptual Understanding | arXiv, Google Drive |
| RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank | arXiv, Google Drive |
| A Closer Look at Self-Supervised Lightweight Vision Transformers | arXiv, GitHub |
| Beyond neural scaling laws: beating power law scaling via data pruning | arXiv, GitHub |
| A simple, efficient and scalable contrastive masked autoencoder for learning visual representations | arXiv |
| Masked Autoencoders are Robust Data Augmentors | arXiv |
| Is Self-Supervised Learning More Robust Than Supervised Learning? | arXiv |
| Can CNNs Be More Robust Than Transformers? | arXiv, GitHub |
| Patch-level Representation Learning for Self-supervised Vision Transformers | arXiv, GitHub |
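
The VICReg objective listed above combines three terms: an invariance (MSE) term between two views, a variance hinge that keeps every embedding dimension's standard deviation above 1, and a covariance penalty that decorrelates dimensions. A self-contained PyTorch sketch with the paper's default weights (25/25/1); variable names are ours:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    # Invariance: embeddings of the two views should match.
    sim = F.mse_loss(z1, z2)

    # Variance: hinge keeps each dimension's std above 1, preventing collapse.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # Covariance: push off-diagonal covariance terms toward zero.
    n, d = z1.shape
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    cov1 = (z1c.T @ z1c) / (n - 1)
    cov2 = (z2c.T @ z2c) / (n - 1)
    off = lambda m: m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()
    cov = off(cov1).pow(2).sum() / d + off(cov2).pow(2).sum() / d

    return sim_w * sim + var_w * var + cov_w * cov
```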

## 2021

| Title | Relevant Links |
| --- | --- |
| Barlow Twins: Self-Supervised Learning via Redundancy Reduction | arXiv, Open In Colab |
| Decoupled Contrastive Learning | arXiv, Open In Colab |
| Dense Contrastive Learning for Self-Supervised Visual Pre-Training | arXiv, Open In Colab |
| Emerging Properties in Self-Supervised Vision Transformers | arXiv, Open In Colab |
| Masked Autoencoders Are Scalable Vision Learners | arXiv, Open In Colab |
| With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| SimMIM: A Simple Framework for Masked Image Modeling | arXiv, Open In Colab |
| Exploring Simple Siamese Representation Learning | arXiv, Open In Colab |
| When Does Contrastive Visual Representation Learning Work? | arXiv |
| Efficient Visual Pretraining with Contrastive Detection | arXiv |
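
Barlow Twins, the first 2021 entry, reduces redundancy by driving the cross-correlation matrix between two views' batch-normalized embeddings toward the identity. A minimal PyTorch sketch; `lam=5e-3` is the paper's default trade-off:

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)

    n, d = z1.shape
    c = (z1.T @ z2) / n                                 # (D, D) cross-correlation

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()      # diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # rest -> 0
    return on_diag + lam * off_diag
```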

## 2020

| Title | Relevant Links |
| --- | --- |
| Bootstrap your own latent: A new approach to self-supervised Learning | arXiv, Open In Colab |
| A Simple Framework for Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | arXiv, Open In Colab |
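
SimCLR ("A Simple Framework for Contrastive Learning of Visual Representations") trains with the NT-Xent loss: for each embedding, the other view of the same image is the positive, and every other embedding in the batch is a negative. A compact PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.T / temperature                          # cosine similarities
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # Positive for row i is the other view of the same image: i +/- B.
    targets = torch.cat([torch.arange(n // 2, n), torch.arange(0, n // 2)])
    return F.cross_entropy(sim, targets)
```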

## 2019

| Title | Relevant Links |
| --- | --- |
| Momentum Contrast for Unsupervised Visual Representation Learning | arXiv, Open In Colab |
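
MoCo's key trick is that the key encoder is not trained by backpropagation; it is updated as an exponential moving average of the query encoder (momentum `m=0.999` in the paper). A sketch of that update:

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    # key_encoder <- m * key_encoder + (1 - m) * query_encoder
    for q_param, k_param in zip(query_encoder.parameters(),
                                key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)
```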

## 2018

| Title | Relevant Links |
| --- | --- |
| Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination | arXiv |

## 2016

| Title | Relevant Links |
| --- | --- |
| Context Encoders: Feature Learning by Inpainting | arXiv |