Computer Vision (CV)

September 14, 2022

Awesome-Self-Supervised-Papers

A collection of papers on self-supervised learning and representation learning.

Last Update : 2021. 09. 26.

  • Updated papers that handle self-supervised learning with distillation (SEED, CompRess, DisCo, DoGo, SimDis ...)
  • Added a dense prediction paper (SoCo)

Contributions and comments are welcome.

Computer Vision (CV)

Pretraining / Feature / Representation

Contrastive Learning

Conference / Journal | Paper | ImageNet Acc (Top 1)
CVPR 2006 | Dimensionality Reduction by Learning an Invariant Mapping | -
arXiv:1807.03748 | Representation Learning with Contrastive Predictive Coding (CPC) | -
arXiv:1911.05722 | Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) | 60.6%
arXiv:1905.09272 | Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2) | 63.8%
arXiv:1906.05849 | Contrastive Multiview Coding (CMC) | 66.2%
arXiv:2002.05709 | A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) | 69.3%
arXiv:2003.12338 | Improved Baselines with Momentum Contrastive Learning (MoCo v2) | 71.1%
arXiv:2003.05438 | Rethinking Image Mixture for Unsupervised Visual Representation Learning | 65.9%
arXiv:2004.05554 | Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations | -
arXiv:2006.10029 | Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2) | 77.5% (10% label)
arXiv:2006.07733 | Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning | 74.3%
arXiv:2006.09882 | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV) | 75.3%
arXiv:2008.05659 | What Should Not Be Contrastive in Contrastive Learning | 80.2% (ImageNet-100)
arXiv:2007.00224 | Debiased Contrastive Learning | 74.6% (ImageNet-100)
arXiv:2009.00104 | A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | -
ICLR 2021 under review | SELF-SUPERVISED REPRESENTATION LEARNING VIA ADAPTIVE HARD-POSITIVE MINING | 72.3% (ResNet-50(4x): 77.3%)
IEEE Access | Contrastive Representation Learning: A Framework and Review | review paper
arXiv:2010.01929 | EQCO: EQUIVALENT RULES FOR SELF-SUPERVISED CONTRASTIVE LEARNING | 68.5% (Proposed) / 66.6% (SimCLR) / 200 epochs
arXiv:2010.01028 | Hard Negative Mixing for Contrastive Learning | 68.0% / 200 epochs
arXiv:2011.10566 | Exploring Simple Siamese Representation Learning (SimSiam) | 68.1% / 100 epochs / 256 batch
arXiv:2010.06682 | Are all negatives created equal in contrastive instance discrimination? | -
arXiv:2101.05224 | Big Self-Supervised Models Advance Medical Image Classification | AUC: 0.7729 (SimCLR / ImageNet --> CheXpert / ResNet-152(2x))
arXiv:2012.08850 | Contrastive Learning Inverts the Data Generating Process | Theoretical foundation of contrastive learning
arXiv:2103.01988 | Self-supervised Pretraining of Visual Features in the Wild | (finetune) 83.8% (693M parameters), 84.2% (1.3B parameters)
arXiv:2103.03230 | Barlow Twins: Self-Supervised Learning via Redundancy Reduction | 73.2%
arXiv:2104.02057 | An Empirical Study of Training Self-Supervised Vision Transformers | 81.0%

Dense Contrastive Learning

Conference / Journal | Paper | AP(bbox) @COCO | AP(mask) @COCO
NeurIPS 2020 | Unsupervised Learning of Dense Visual Representations | 39.2 | 35.6
arXiv:2011.09157 | Dense Contrastive Learning for Self-Supervised Visual Pre-Training | 40.3 | 36.4
arXiv:2011.10043 | Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning | 41.4 | 37.4
arXiv:2102.08318 | Instance Localization for Self-supervised Detection Pretraining | 42.0 | 37.6
arXiv:2103.06122 | Spatially Consistent Representation Learning | 41.3 | 37.7
arXiv:2103.10957 | Efficient Visual Pretraining with Contrastive Detection | 42.7 (DetCon_B) | 38.2 (DetCon_B)
arXiv:2106.02637 | Aligning Pretraining for Detection via Object-Level Contrastive Learning | 43.2 | 38.4

Image Transformation

Conference / Journal | Paper | ImageNet Acc (Top 1)
ECCV 2016 | Colorful image colorization (Colorization) | 39.6%
ECCV 2016 | Unsupervised learning of visual representations by solving jigsaw puzzles | 45.7%
CVPR 2018 | Unsupervised Feature Learning via Non-Parametric Instance Discrimination (NPID, NPID++) | NPID: 54.0%, NPID++: 59.0%
CVPR 2018 | Boosting Self-Supervised Learning via Knowledge Transfer (Jigsaw++) | -
CVPR 2020 | Self-Supervised Learning of Pretext-Invariant Representations (PIRL) | 63.6%
CVPR 2020 | Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics | -
arXiv:2003.04298 | Multi-modal Self-Supervision from Generalized Data Transformations | -

Self-supervised learning with Knowledge Distillation

Conference / Journal | Paper | Method
NeurIPS 2020 | CompRess: Self-Supervised Learning by Compressing Representations | Similarity Distribution + Memory bank
ICLR 2021 | SEED: SELF-SUPERVISED DISTILLATION FOR VISUAL REPRESENTATION | Similarity Distribution + Memory bank
arXiv:2104.09124 | DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning | Contrastive Learning w/ Teacher Model
arXiv:2104.09866 | Distill on the Go: Online knowledge distillation in self-supervised learning | Contrastive Learning w/ Teacher Model
arXiv:2104.14294 | Emerging Properties in Self-Supervised Vision Transformers | Self Distillation w/ Teacher Model
ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | Self Distillation w/ Teacher Model + Masked Image Modeling
arXiv:2106.11304 | Simple Distillation Baselines for Improving Small Self-supervised Models | Contrastive Learning w/ Teacher Model + Multi-view loss
arXiv:2107.01691 | Bag of Instances Aggregation Boosts Self-supervised Learning | Bag aggregation

Others (in Pretraining / Feature / Representation)

Conference / Journal | Paper | Method
ICLR 2018 | Unsupervised Representation Learning by Predicting Image Rotations | Surrogate classes, pre-training
ICML 2018 | Mutual Information Neural Estimation | Mutual Information
NeurIPS 2019 | Wasserstein Dependency Measure for Representation Learning | Mutual Information
ICLR 2019 | Learning Deep Representations by Mutual Information Estimation and Maximization | Mutual Information
arXiv:1903.12355 | Local Aggregation for Unsupervised Learning of Visual Embeddings | Local Aggregation
arXiv:1906.00910 | Learning Representations by Maximizing Mutual Information Across Views | Mutual Information
arXiv:1907.02544 | Large Scale Adversarial Representation Learning (BigBiGAN) | Adversarial Training
ICLR 2020 | On Mutual Information Maximization for Representation Learning | Mutual Information
CVPR 2020 | How Useful is Self-Supervised Pretraining for Visual Tasks? | -
CVPR 2020 | Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning | Adversarial Training
ICLR 2020 | Self-Labeling via Simultaneous Clustering and Representation Learning | Information
arXiv:1912.11370 | Big Transfer (BiT): General Visual Representation Learning | pre-training
arXiv:2009.07724 | Evaluating Self-Supervised Pretraining Without Using Labels | pre-training
arXiv:2010.00578 | UNDERSTANDING SELF-SUPERVISED LEARNING WITH DUAL DEEP NETWORKS | Dual Deep Network
ICLR 2021 under review | REPRESENTATION LEARNING VIA INVARIANT CAUSAL MECHANISMS | Causal mechanism
arXiv:2006.06882 | Rethinking Pre-training and Self-training | Rethinking
arXiv:2102.12903 | Self-Tuning for Data-Efficient Deep Learning | Data-efficient deep learning
arXiv:2102.10106 | Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction | Find similar samples
ECCV 2020 | Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification | Feature embedding & refining
arXiv:2102.11150 | Improving Unsupervised Image Clustering With Robust Learning | Pseudo-label, clustering
CVPR 2021 | How Well Do Self-Supervised Models Transfer? | Benchmarking

Identification / Verification / Classification / Recognition

Conference / Journal | Paper | Datasets | Performance
CVPR 2020 | Real-world Person Re-Identification via Degradation Invariance Learning | MLR-CUHK03 | Acc: 85.7 (R@1)
CVPR 2020 | Spatially Attentive Output Layer for Image Classification | ImageNet | Acc: 81.01 (Top-1)
CVPR 2020 | Look-into-Object: Self-supervised Structure Modeling for Object Recognition | ImageNet | Top-1 err: 22.87

Segmentation / Depth Estimation

Conference / Journal | Paper | Datasets | Performance
CVPR 2020 | Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation | VOC 2012 | mIoU: 64.5
CVPR 2020 | Towards Better Generalization: Joint Depth-Pose Learning without PoseNet | KITTI 2015 | F1: 18.05%
IROS 2020 | Monocular Depth Estimation with Self-supervised Instance Adaptation | KITTI 2015 | Abs Rel: 0.074
CVPR 2020 | Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera | - | -
CVPR 2020 | Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision | GTA5 -> Cityscapes | mIoU: 46.3
CVPR 2020 | D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry | - | -
CVPR 2020 | Self-Supervised Human Depth Estimation from Monocular Videos | - | -
arXiv:2009.07714 | Calibrating Self-supervised Monocular Depth Estimation | KITTI | Abs Rel: 0.113

Detection / Localization

Conference / Journal | Paper | Datasets | Performance
CVPR 2020 | Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection | VOC 2012 | AP(50): 67.0

Generation

Conference / Journal | Paper | Task
CVPR 2020 | StyleRig: Rigging StyleGAN for 3D Control over Portrait Images | Portrait Images
ICLR 2020 | From Inference to Generation: End-to-End Fully Self-Supervised Generation of Human Face from Speech | Generate human face from speech
ACMMM 2020 | Neutral Face Game Character Auto-Creation via PokerFace-GAN | -
ICLR 2021 under review | Self-Supervised Variational Auto-Encoders | FID: 34.71 (CIFAR-10)

Video

Conference / Journal | Paper | Task | Performance | Datasets
TPAMI | A Review on Deep Learning Techniques for Video Prediction | Video prediction review | - | -
CVPR 2020 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Scene Understanding | Sq Rel: 0.748 | KITTI 2015
CVPR 2020 | Self-Supervised Learning of Video-Induced Visual Invariances | Representation Learning | - | -
ECCV 2020 | Video Representation Learning by Recognizing Temporal Transformations | Representation Learning | 26.1% (Video Retrieval Top-1) | UCF101
arXiv:2008.02531 | Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Representation Learning | 42.4% (Video Retrieval Top-1) | UCF101
NeurIPS 2020 | Space-Time Correspondence as a Contrastive Random Walk | Contrastive Learning | 64.8 (Region Similarity) | DAVIS 2017

Others

Conference / Journal | Paper | Task | Performance
CVPR 2020 | Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching | Optical Flow | F1: 7.63% (KITTI 2012)
CVPR 2020 | Self-Supervised Viewpoint Learning From Image Collections | Viewpoint learning | MAE: 4.0 (BIWI)
CVPR 2020 | Self-Supervised Scene De-occlusion | Remove occlusion | mAP: 29.3% (KINS)
CVPR 2020 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Scene Understanding | -
CVPR 2020 | Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation | Optical Flow | F1: 11.79% (KITTI 2015)
CVPR 2020 | D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features | 3D Local Features | -
CVPR 2020 | SpeedNet: Learning the Speediness in Videos | Predict the "speediness" | -
CVPR 2020 | Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation | Action Segmentation | F1@10: 83.0 (GTEA)
CVPR 2020 | MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation | Robotic Navigation | -
arXiv:2003.06734 | Active Perception and Representation for Robotic Manipulation | Robot manipulation | -
arXiv:2005.01655 | Words aren’t enough, their order matters: On the Robustness of Grounding Visual Referring Expressions | Visual Referring Expressions | -
arXiv:2004.11362 | Supervised Contrastive Learning | Supervised Contrastive Learning | ImageNet Acc: 80.8 (Top-1)
arXiv:2007.14449 | Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation | Domain Adaptation | GTA5 to Cityscapes: 47.5 (mIoU)
arXiv:2007.12360 | On the Effectiveness of Image Rotation for Open Set Domain Adaptation | Domain Adaptation | -
arXiv:2003.12283 | LIMP: Learning Latent Shape Representations with Metric Preservation Priors | Generative models | -
arXiv:2004.04312 | Learning to Scale Multilingual Representations for Vision-Language Tasks | Vision-Language | MSCOCO: 81.5
arXiv:2003.08934 | NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | View Synthesis | -
arXiv:2001.01536 | Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification | Knowledge Distillation, Long-tail classification | -
arXiv:2006.07114 | Knowledge Distillation Meets Self-Supervision | Knowledge Distillation | Res50 --> MobileNetv2 Acc: 72.57 (Top-1)
AAAI 2020 | Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation | Game Character Auto-Creation | -
arXiv:2009.07719 | Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization | Similarity Activation Map | -
arXiv:2008.10312 | Self-Supervised Learning for Large-Scale Unsupervised Image Clustering | Image Clustering | ImageNet Acc: 38.60 (cluster assignment)
ICLR 2021 under review | SSD: A UNIFIED FRAMEWORK FOR SELF-SUPERVISED OUTLIER DETECTION | Outlier Detection | CIFAR10/CIFAR100: 94.1% (in/out)

Natural Language Processing (NLP)

Conference / Journal | Paper | Datasets | Performance
arXiv:2004.03808 | Improving BERT with Self-Supervised Attention | GLUE | Avg: 79.3 (BERT-SSA-H)
arXiv:2004.07159 | PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation | MARCO | 0.498 (Rouge-L)
ACL 2020 | TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition | - | -
arXiv:1909.11942 | ALBERT: A Lite BERT For Self-Supervised Learning of Language Representations | GLUE | Avg: 89.4
AAAI 2020 | Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models | - | -
ACL 2020 | Contrastive Self-Supervised Learning for Commonsense Reasoning | PDP-60 | 90.0%

Speech

Conference / Journal | Paper | Datasets | Performance
arXiv:1910.05453v3 | VQ-WAV2VEC: SELF-SUPERVISED LEARNING OF DISCRETE SPEECH REPRESENTATIONS | nov92 | WER: 2.34
arXiv:1911.03912v2 | EFFECTIVENESS OF SELF-SUPERVISED PRE-TRAINING FOR SPEECH RECOGNITION | Librispeech | WER: 4.0
ICASSP 2020 | Generative Pre-Training for Speech with Autoregressive Predictive Coding | - | -
Interspeech 2020 | Jointly Fine-Tuning “BERT-like” Self Supervised Models to Improve Multimodal Speech Emotion Recognition | IEMOCAP | Emotion Acc: 75.458%

Graph

Conference / Journal | Paper | Datasets | Performance
arXiv:2009.05923 | Contrastive Self-supervised Learning for Graph Classification | PROTEINS | A3-specific: 85.80
arXiv:2102.13085 | Towards Robust Graph Contrastive Learning | Cora, Citeseer, Pubmed | Acc: 82.4 (Cora, GCA-DE)

Reinforcement Learning

Conference / Journal | Paper | Performance
arXiv:2009.05923 | CONTRASTIVE BEHAVIORAL SIMILARITY EMBEDDINGS FOR GENERALIZATION IN REINFORCEMENT LEARNING | BiC-catch: 821±17 (Random Initialization / DrQ+PSEs)