CVPR-2022-Papers
July 28, 2022

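Official website: https://cvpr2022.thecvf.com/
Conference dates: June 19-24, 2022
❣❣❣ The CVPR 2022 accepted papers have been announced: 2,067 in total! All papers are now available, so stay tuned!
❣❣❣ To download all papers as a single bundle, reply "paper" to the 我爱计算机视觉 (I Love Computer Vision) WeChat official account.
Survey papers from past years, by category: ↘️CV-Surveys (under construction)
2022 papers by category:
↘️CVPR-2022-Papers ↘️WACV-2022-Papers
2021 papers by category:
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
2020 papers by category:
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers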
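Contents
- Clustering
- Scene Flow
- Graph Recognition
- Motion Blur
- Eyeglass and Shadow Removal in Portraits
- Lip Reading
- Analog Clock Reading
- Fingerprint Recognition
- Sketch-Based Image Manipulation
- Sketch Recognition
- Debiasing
- Line Segment Classification
- Interactive object understanding
- Digital Humans
- Reinforcement Learning
- Visual Relationship Detection
- Crack Recognition
- Eye-Based Authentication
- Audio-Visual Event Localization
- Unbiased Learning
- Object Proposal Generation
- Lipreading
- Correspondence Learning
- Visual Localization
- Visual Recognition
- Long-term action quality assessment
- Motion Recognition
- CNN
- Volume Rendering
- virtual correspondences
- Infrared Measurement
- 4D Scene Capture
- Deformable Head Avatars
- Activity Forecasting
- Mirror Detection
- Two-Hand Reconstruction
- Image Vectorization
- Action Learning
- BNN
- Place Recognition
- Object Recognition
- Edge Detection
- Defect Detection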
Open-Set Recognition
Active Learning
- Active Learning for Open-Set Annotation
- Active Learning by Feature Mixing
- Towards Robust and Reproducible Active Learning Using Neural Networks
:star:code
Backdoor Attacks
- DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints
- Better Trigger Inversion Optimization in Backdoor Scanning
- Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
:open_mouth:oral:star:code
Multi-view Clustering
- Highly-efficient Incomplete Large-scale Multi-view Clustering with Consensus Bipartite Graph
:star:code
- Multi-Level Feature Learning for Contrastive Multi-View Clustering
:star:code
- Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase
- MPC: Multi-View Probabilistic Clustering
Machine Translation
Object Counting
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting
:star:code
:newspaper:explainer
- Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting
:star:code
Computer-Aided Design (CAD)
- Neural Face Identification in a 2D Wireframe Projection of a Manifold Object
:star:code
- JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints
:star:code
- ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
:star:code
- CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings
:star:code
- GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings
Transfer Learning
Graph Matching
- Graph-Context Attention Networks for Size-Varied Deep Graph Matching
:star:code
- Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond
:star:code
Noise Modeling
60. Visual Emotion Analysis
59. Animation
- APES: Articulated Part Extraction From Sprite Sheets
:house:project
- BANMo: Building Animatable 3D Neural Models From Many Casual Videos
:open_mouth:oral:house:project
- Neural Head Avatars From Monocular RGB Videos
:star:code:house:project
- FLAG: Flow-Based 3D Avatar Generation From Sparse Observations
:house:project
- Image animation
- Human animation
- 3D character animation
- 3D dance generation
- Still image to animation
- 3D human avatars
58. Neural Rendering
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera
- IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images
:open_mouth:oral:house:project
- SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
:star:code
- Modeling Indirect Illumination for Inverse Rendering
:star:code:house:project
- GenDR: A Generalized Differentiable Renderer
:star:code
- CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
:star:code:house:project
- NeRF-Editing: Geometry Editing of Neural Radiance Fields
- AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields
:house:project
- Neural Rays for Occlusion-Aware Image-Based Rendering
:star:code:house:project
- EfficientNeRF: Efficient Neural Radiance Fields
:star:code
- CoNeRF: Controllable Neural Radiance Fields
:star:code:house:project
- Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
:house:project
- Hallucinated Neural Radiance Fields in the Wild
:star:code:house:project
- HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
:open_mouth:oral:star:code:house:project:tv:video
- Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
- Deblur-NeRF: Neural Radiance Fields From Blurry Images
:star:code:house:project
- NeRFReN: Neural Radiance Fields With Reflections
:house:project
- Depth-Supervised NeRF: Fewer Views and Faster Training for Free
:star:code:house:project
- Dense Depth Priors for Neural Radiance Fields From Sparse Input Views
:star:code:house:project:tv:video
- Light Field Neural Rendering
:star:code:house:project
- InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering
:star:code:house:project
- BokehMe: When Neural Rendering Meets Classical Rendering
:open_mouth:oral:star:code
- Plenoxels: Radiance Fields Without Neural Networks
:star:code:house:project
- HDR-NeRF: High Dynamic Range Neural Radiance Fields
- Urban Radiance Fields
:house:project
- Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations
:star:code
- Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time
:star:code:house:project
- Point-NeRF: Point-Based Neural Radiance Fields
- HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs
:house:project
- Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation
57. Gaze Estimation
- GazeOnce: Real-Time Multi-Person Gaze Estimation
- Contrastive Regression for Domain Adaptation on Gaze Estimation
- Generalizing Gaze Estimation With Rotation Consistency
- GaTector: A Unified Framework for Gaze Object Prediction
- Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination
:house:project
56. Sound
- Finding Fallen Objects via Asynchronous Audio-Visual Integration
:house:project
- Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
- MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
:star:code:house:project
- Visual Acoustic Matching
:open_mouth:oral:house:project
- Sound source localization
- Audio pairing
- Voice cloning
- Audio-visual speech enhancement
- Text-to-speech
- Speech-to-face image generation
- Speech separation
- Speech-driven gesture generation
- Speaker localization
55. Novel View Synthesis
- NPBG++: Accelerating Neural Point-Based Graphics
:house:project
- Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
:house:project
- AutoRF: Learning 3D Object Radiance Fields from Single View Observations
:house:project
- NeurMiPs: Neural Mixture of Planar Experts for View Synthesis
:star:code:house:project:tv:video:newspaper:explainer
- GeoNeRF: Generalizing NeRF with Geometry Priors
:star:code:house:project:tv:video
- FWD: Real-Time Novel View Synthesis With Forward Warping and Depth
:star:code
- Block-NeRF: Scalable Large Scene Neural View Synthesis
- Boosting View Synthesis With Residual Transfer
:star:code:house:project
- NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
:open_mouth:oral:star:code:house:project:tv:video
- View linking
54. Dataset
- ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
:star:code:house:project:newspaper:quick read
- Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
:star:code:house:project
- 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
:sunflower:dataset
- Hephaestus: A large scale multitask dataset towards InSAR understanding
- SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis
:sunflower:dataset
- AKB-48: A Real-World Articulated Object Knowledge Base
:star:code
:newspaper:quick read
- Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives
- ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
:star:code:house:project
- ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection
- MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
:sunflower:dataset
- DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation
:sunflower:dataset
- DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image
:sunflower:dataset
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
:sunflower:dataset
- Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions
:sunflower:dataset
- Open Challenges in Deep Stereo: The Booster Dataset
:sunflower:dataset
- RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
:house:project
- Satellite datasets
- Animal behavior understanding datasets
- Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
:open_mouth:oral:house:project:sunflower:dataset
- Dataset (forest monitoring)
- 3D object understanding
- Dataset (AutoMine)
- AutoMine: An Unmanned Mine Dataset
:sunflower:dataset
- Dataset (facial expression recognition)
- Dataset (gesture recognition)
- Dataset (grain recognition)
- Dataset (spatio-temporal action, social group, and activity detection)
53. Sign Language Translation
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
- MLSLT: Towards Multilingual Sign Language Translation
:house:project
- Sign language recognition
52. Human Motion Forecasting
- Motron: Multimodal Probabilistic Human Motion Forecasting
:star:code
- Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction
:star:code
- Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction
- MotionAug: Augmentation With Physical Correction for Human Motion Prediction
:star:code
- Future Transformer for Long-term Action Anticipation
:star:code:house:project
- Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction
:star:code
- Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation
- BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement
:star:code
- Multi-Person Extreme Motion Prediction
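51. Optics, Geometry, and Light Field Imaging
- Compressive Single-Photon 3D Cameras
- Fisher Information Guidance for Learned Time-of-Flight Imaging
- Light Field
- Depth reconstruction
- Shutter correction
- Thermal infrared imaging
- Camera pose estimation
- Camera relocalization
- Imaging
- Optics
- Quantization-aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging
:star:code
- Dual-Shutter Optical Vibration Sensing
- Camera pose
- Camera imaging
- Camera localization
- Aperture imaging
- Hyperspectral imaging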
50. Anomaly Detection
- Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection
:star:code
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
:star:code
- Anomaly Detection via Reverse Distillation From One-Class Embedding
- Towards Total Recall in Industrial Anomaly Detection
:star:code
- Outlier detection
49. Image Geo-localization
- TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
:star:code
- Visual geo-localization
- Rethinking Visual Geo-localization for Large-Scale Applications
:star:code
- Deep Visual Geo-localization Benchmark
:open_mouth:oral:house:project
- Trajectory reconstruction
48. Visual Grounding
- Multi-View Transformer for 3D Visual Grounding
:star:code
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
:star:code
Visual grounding: localizing a target object from a natural-language description (a very interesting line of research).
- Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
:star:code
- Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
:star:code
- Multi-Modal Dynamic Graph Transformer for Visual Grounding
:star:code
47. Few/Zero-Shot Learning/Domain Generalization/Adaptation
- Few-shot learning
- Ranking Distance Calibration for Cross-Domain Few-Shot Learning
- Few-shot Learning with Noisy Labels
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference
:house:project:tv:video
- Few-shot Backdoor Defense Using Shapley Estimation
:newspaper:explainer
- Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning
:star:code
- EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning
:star:code
- Semi-Supervised Few-Shot Learning via Multi-Factor Clustering
:star:code
- Cross-Domain Few-Shot Learning With Task-Specific Adapters
:star:code
- Zero-shot learning
- MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
:star:code:newspaper:quick read
- Unseen Classes at a Later Time? No Problem
:star:code
- En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning
:newspaper:explainer
- Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis
- Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
:star:code
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning
:star:code
:newspaper:explainer
- Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
- Distinguishing Unseen From Seen for Generalized Zero-Shot Learning
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
:star:code
:newspaper:Zero-shot learning with far less manual annotation: MPI and BUPT propose class semantic embeddings rich in visual information
- Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
:star:code
- Domain generalization
- Compound Domain Generalization via Meta-Knowledge Encoding
- Causality Inspired Representation Learning for Domain Generalization
:star:code
- Towards Unsupervised Domain Generalization
:newspaper:CVPR 2022 | Tsinghua University proposes unsupervised domain generalization (UDG)
The main target of this work is domain generalization (DG). It is the first work to extend DG to unsupervised learning, and it proposes a new research area: unsupervised domain generalization (UDG).
- Towards Principled Disentanglement for Domain Generalization
:open_mouth:oral:star:code
- Meta Convolutional Neural Networks for Single Domain Generalization
- PCL: Proxy-Based Contrastive Learning for Domain Generalization
- Localized Adversarial Domain Generalization
- Unsupervised Domain Generalization by Learning a Bridge Across Domains
- Style Neophile: Constantly Seeking Novel Styles for Domain Generalization
- BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features
- Failure Modes of Domain Generalization Algorithms
- Geometric and Textural Augmentation for Domain Gap Reduction
:star:code
- Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective
:star:code
- Out-of-domain generalization
- Domain adaptation
- Continual Test-Time Domain Adaptation
:star:code
- Safe Self-Refinement for Transformer-based Domain Adaptation
:star:code:newspaper:explainer
- Source-Free Domain Adaptation via Distribution Estimation
:newspaper:explainer
- Learning Distinctive Margin toward Active Domain Adaptation
:star:code
:newspaper:explainer
- DINE: Domain Adaptation from Single and Multiple Black-box Predictors
:star:code
- Exploring Domain-Invariant Parameters for Source Free Domain Adaptation
- Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal
:star:code
- No-Reference Point Cloud Quality Assessment via Domain Adaptation
:star:code
- Slimmable Domain Adaptation
:star:code
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
- Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation
- Unsupervised domain adaptation
- Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation
:star:code
- Category Contrast for Unsupervised Domain Adaptation in Visual Tasks
- The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
:star:code
- Spectral Unsupervised Domain Adaptation for Visual Recognition
46. Scene Graph Generation
- PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation
- Fine-Grained Predicates Learning for Scene Graph Generation
:star:code
- HL-Net: Heterophily Learning Network for Scene Graph Generation
:star:code
:newspaper:explainer
- RU-Net: Regularized Unrolling Network for Scene Graph Generation
:star:code
:newspaper:explainer
- The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
:star:code
- Dynamic Scene Graph Generation via Anticipatory Pre-Training
- Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
:star:code
- Structured Sparse R-CNN for Direct Scene Graph Generation
:star:code
- Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation
- SGTR: End-to-end Scene Graph Generation with Transformer
:star:code
- Video scene graph generation
45. Dense Prediction
- Does Robustness on ImageNet Transfer to Downstream Tasks?
- MPViT: Multi-Path Vision Transformer for Dense Prediction
:star:code
- Learning Multiple Dense Prediction Tasks From Partially Annotated Data
:star:code
44. Federated Learning
- CD2-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning
- Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
:star:code
- FedCorr: Multi-Stage Federated Learning for Label Noise Correction
:star:code
- Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
- Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
- Layer-Wised Model Aggregation for Personalized Federated Learning
- Federated Learning With Position-Aware Neurons
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
:star:code
- FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction
:star:code
- Learn From Others and Be Yourself in Heterogeneous Federated Learning
:star:code
- FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
- Robust Federated Learning With Noisy and Heterogeneous Clients
:star:code
- ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning
:star:code
43. Multi-Task Learning
- Controllable Dynamic Multi-Task Architectures
:house:project
- Task Adaptive Parameter Sharing for Multi-Task Learning
- Raw High-Definition Radar for Multi-Task Learning
:star:code
42. Metric Learning
- Self-Taught Metric Learning without Labels
:star:code:house:project
- Enhancing Adversarial Robustness for Deep Metric Learning
- Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning
:star:code
- Non-Isotropy Regularization for Proxy-Based Deep Metric Learning
:star:code
- Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
:star:code
- Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images
:star:code
- Integrating Language Guidance Into Vision-Based Deep Metric Learning
:star:code
41. Incremental Learning
- Incremental learning
- Energy-based Latent Aligner for Incremental Learning
:star:code
- General Incremental Learning with Domain-aware Categorical Representations
- Forward Compatible Few-Shot Class-Incremental Learning
:star:code
- Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning
:star:code
- Few-Shot Incremental Learning for Label-to-Image Translation
- Class-incremental learning
- Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches
- Constrained Few-shot Class-incremental Learning
:star:code
- Class-Incremental Learning with Strong Pre-trained Models
- Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation
:star:code
- Bring Evanescent Representations to Life in Lifelong Class Incremental Learning
- Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
- MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning
- Federated Class-Incremental Learning
:star:code
- vCLIMB: A Novel Video Class Incremental Learning Benchmark
:open_mouth:oral:star:code:house:project
40. Adversarial Learning
- Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
- Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
- Towards Practical Certifiable Patch Defense with Vision Transformer
:newspaper:explainer
- Enhancing Adversarial Training with Second-Order Statistics of Weights
:star:code
- Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
:star:code
- Improving Adversarial Transferability via Neuron Attribution-Based Attacks
:star:code
- Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
:star:code
- Bounded Adversarial Attack on Deep Content Features
- Subspace Adversarial Training
:star:code
- Cross-Modal Transferable Adversarial Attacks From Images to Videos
:star:code
- Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
:star:code
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
:star:code
- Robust Combination of Distributed Gradients Under Adversarial Perturbations
- Adversarial Texture for Fooling Person Detectors in the Physical World
- DTA: Physical Camouflage Attacks Using Differentiable Transformation Network
:house:project
- BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning
:star:code
- Pyramid Adversarial Training Improves ViT Performance
:house:project
- NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning
- Adversarial examples
- Label-Only Model Inversion Attacks via Boundary Repulsion
:star:code
- Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
:star:code
- Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input
:star:code
- Leveraging Adversarial Examples To Quantify Membership Information Leakage
:star:code
- Adversarial attacks
- Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
:star:code
- Transferable Sparse Adversarial Attack
:star:code
- Towards Efficient Data Free Black-Box Adversarial Attack
- Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity
:star:code
- Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability
:star:code
- Black-box attacks
- Investigating Top-k White-Box and Transferable Black-box Attack
:star:code
- DST: Dynamic Substitute Training for Data-free Black-box Attack
:house:project
- Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees
:open_mouth:oral:star:code
- Adversarial Eigen Attack on Black-Box Models
- Exploring Effective Data for Surrogate Training Towards Black-Box Attack
:star:code
- Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution
:star:code
- Adversarial training
- LAS-AT: Adversarial Training with Learnable Attack Strategy
:open_mouth:oral:star:code
:newspaper:CVPR 2022 | CAS and Tencent propose LAS-AT, using a "learnable attack strategy" for "adversarial training"
39. Continual Learning
- On Generalizing Beyond Domains in Cross-Domain Continual Learning
- Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
:star:code
- Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries
:star:code
- Learning To Prompt for Continual Learning
:star:code
- Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning
- Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency
:star:code
- Continual Learning for Visual Search With Backward Consistent Feature Embedding
:star:code
- Meta-Attention for ViT-Backed Continual Learning
:star:code
- Continual Learning with Lifelong Vision Transformer
:newspaper:explainer
- DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion
:star:code
- GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning
:house:project
38. Meta-Learning
- What Matters For Meta-Learning Vision Regression Tasks?
:star:code
- Multidimensional Belief Quantification for Label-Efficient Meta-Learning
- Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning
:star:code
- Learning to Learn and Remember Super Long Multi-Domain Task Sequence
:open_mouth:oral:star:code
:newspaper:explainer
37. Contrastive Learning
- Selective-Supervised Contrastive Learning with Noisy Labels
:star:code:newspaper:quick read
- Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
:star:code
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning
:star:code
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework
:star:code
- Crafting Better Contrastive Views for Siamese Representation Learning
:open_mouth:oral:star:code
- Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo
:star:code
- Estimating Fine-Grained Noise Model via Contrastive Learning
- Contextual Outpainting With Object-Level Contrastive Learning
:house:project
- Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views
- Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning
- Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning
- On Learning Contrastive Representations for Learning With Noisy Labels
- Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity
- Robust Contrastive Learning Against Noisy Views
:star:code
- Unified Contrastive Learning in Image-Text-Label Space
:star:code
- Consistent Explanations by Contrastive Learning
:star:code
- Rethinking Minimal Sufficient Representation in Contrastive Learning
:star:code
- Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency
- M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
:sunflower:dataset
- Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization
:star:code
- Unpaired Deep Image Deraining Using Dual Contrastive Learning
:star:code:house:project
36. Optical Flow
- CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
:star:code
- DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow
:star:code
- Imposing Consistency for Optical Flow Estimation
- Deep Equilibrium Optical Flow Estimation
:star:code:newspaper:explainer
- GMFlow: Learning Optical Flow via Global Matching
:open_mouth:oral:star:code:newspaper:explainer
- Optical Flow Estimation for Spiking Camera
:star:code
- Learning Optical Flow with Kernel Patch Attention
:star:code:newspaper:explainer
- CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
:star:code
- Global Matching With Overlapping Attention for Optical Flow Estimation
:star:code
- Towards Understanding Adversarial Robustness of Optical Flow Networks
:star:code
35. OCR
- XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding
- SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
:star:code
- Scene text detection
- Towards End-to-End Unified Scene Text Detection and Layout Analysis
:star:code
- Pushing the Performance Limit of Scene Text Recognizer without Human Annotation
- Vision-Language Pre-Training for Boosting Scene Text Detectors
:star:code
Vision-language pre-training for scene text detection. The code will be open-sourced, but the repository address has not been announced yet.
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
- Scene text recognition
- Text Spotting
- Logo design
- Font generation
- Text recognition
- Table structure recognition
- Text aesthetics prediction and assessment
- Table structure understanding
- Text segmentation
- Table detection
- Text inpainting
- Handwritten mathematical expression recognition
34. Model Compression/Knowledge Distillation/Pruning
- Knowledge distillation
- Knowledge Distillation with the Reused Teacher Classifier
- DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
:newspaper:explainer
- Decoupled Knowledge Distillation
:star:code
:newspaper:Decoupled knowledge distillation brings the method Hinton proposed 7 years ago back to SOTA
- Knowledge Distillation via the Target-aware Transformer
:open_mouth:oral:star:code
:newspaper:RMIT, Alibaba, UTS and Sun Yat-sen University propose Target-aware Transformer for one-to-all knowledge distillation, with SOTA performance
- Evaluation-oriented Knowledge Distillation for Deep Face Recognition
:open_mouth:oral:star:code
:newspaper:explainer 1
:newspaper:explainer 2
- Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation
:star:code
- Self-Distillation From the Last Mini-Batch for Consistency Regularization
:star:code
- Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability
:star:code
- Knowledge Distillation: A Good Teacher Is Patient and Consistent
- PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models
:star:code
- Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
- Model compression
- Pruning
- Revisiting Random Channel Pruning for Neural Network Compression
:star:code
:newspaper:explainer
- Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction
- When To Prune? A Policy Towards Early Structural Pruning
- Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs
- Quantization
- A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
:star:code:house:project
- Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
:star:code
- AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation
:star:code
- Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization
- Mutual Quantization for Cross-Modal Search With Noisy Labels
- Instance-Aware Dynamic Neural Network Quantization
:star:code
- IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization
:star:code
- Learnable Lookup Table for Neural Network Quantization
- Channel Balancing for Accurate Quantization of Winograd Convolutions
- Hyperparameter optimization
33.Human-Object Interaction(人物交互)
- HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
:star:code - MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
- GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
:star:code - Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection
:star:code - OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
:star:code
:newspaper:brief explainer - D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
:house:code - Learning Transferable Human-Object Interaction Detector With Natural Language Supervision
:star:code - What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions
:open_mouth:oral - Human-Object Interaction Detection via Disentangled Transformer
- Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
:star:code:newspaper:explainer - Interactiveness Field in Human-Object Interactions
:star:code - Stability-driven Contact Reconstruction From Monocular Color Images
:star:code
Hand-object interaction reconstruction from monocular color images (human-object interaction) - Interactiveness Field of Human-Object Interactions
:star:code
:newspaper:brief explainer - Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection
:star:code
:newspaper:explainer 1
:newspaper:explainer 2 - Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
:open_mouth:oral:star:code - Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer
:house:project - NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
- Category-Aware Transformer Network for Better Human-Object Interaction Detection
- HOI tracking
32.Data Augmentation
- 🐦️AlignMix: Improving representation by interpolating aligned features
- 3D Common Corruptions and Data Augmentation
:star:code:house:project:tv:video:newspaper:brief explainer - Kubric: A scalable dataset generator
:star:code - Robust Optimization As Data Augmentation for Large-Scale Graphs
:star:code - AIM: an Auto-Augmenter for Images and Meshes
:star:code - Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation
:house:project - TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
:open_mouth:oral:star:code
31.Vision-Language
- Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
:star:code - VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
:star:code - Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
:sunflower:dataset - Robust Cross-Modal Representation Learning with Progressive Self-Distillation
- Prompt Distribution Learning
On downstream recognition tasks, the proposed method shows consistent performance gains across 12 datasets. - Vision-Language Pre-Training with Triple Contrastive Learning
:star:code - Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
:star:code
:newspaper:UCAS & CUHK propose a Visual Grounding framework with visual-linguistic verification and iterative reasoning; SOTA performance, code open-sourced! - Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
:star:code:house:project - VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
:star:code - Lite-MDETR: A Lightweight Multi-Modal Detector
- Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
:star:code - Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
- RegionCLIP: Region-based Language-Image Pretraining (https://github.com/microsoft/RegionCLIP)
- Grounded Language-Image Pre-Training
:star:code - Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions
:star:code - Conditional Prompt Learning for Vision-Language Models
:star:code - Multi-Modal Alignment Using Representation Codebook
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
:open_mouth:oral:star:code - An Empirical Study of Training End-to-End Vision-and-Language Transformers
:star:code - DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting
:star:code - FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback
:star:code:house:project - CLIP-Event: Connecting Text and Images With Event Structures
:star:code - Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
:star:code - VLN
- EnvEdit: Environment Editing for Vision-and-Language Navigation
:star:code - Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
:star:code - Reinforced Structured State-Evolution for Vision-Language Navigation
:star:code:newspaper:explainer - Cross-modal Map Learning for Vision and Language Navigation
:star:code:house:project - One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones
- What do navigation agents learn about their environment?
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
:star:code - ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts
- HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation
:star:code
- Video-text representation learning
- Visual representation learning
- Visual navigation
- Visual captioning
30.Visual Question Answering
- VQA
- SimVQA: Exploring Simulated Environments for Visual Question Answering
:house:project - SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
:star:code:newspaper:brief explainer - V-Doc: Visual Questions Answers With Documents
:star:code - Grounding Answers for Visual Questions Asked by Visually Impaired People
:house:project - Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
:star:code - MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
:star:code - Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering
- LaTr: Layout-Aware Transformer for Scene-Text VQA
- WebQA: Multihop and Multimodal QA
- AVQA
- Learning to Answer Questions in Dynamic Audio-Visual Scenarios
:open_mouth:oral:star:code
:newspaper:CVPR 2022 Oral | Renmin University's Gaoling School of AI proposes a question-answering task for dynamic audio-visual scenes - Dual-Key Multimodal Backdoors for Visual Question Answering
:star:code - Maintaining Reasoning Consistency in Compositional Visual Question Answering
:star:code - From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering
:star:code
- Video-QA
- Measuring Compositional Consistency for Video Question Answering
- Invariant Grounding for Video Question Answering
:open_mouth:oral:star:code:newspaper:explainer
29.SLAM/Augmented Reality/Virtual Reality/Robotics
- SLAM
- Object-goal navigation
- try-on
- Dressing in the Wild by Watching Dance Videos
:house:project - Style-Based Global Appearance Flow for Virtual Try-On
:star:code - ClothFormer: Taming Video Virtual Try-on in All Module
:open_mouth:oral:star:code:house:project:newspaper:explainer - Weakly Supervised High-Fidelity Clothing Model Generation
- Full-Range Virtual Try-On With Recurrent Tri-Level Transform
:house:project
- AR
- Episodic Memory Question Answering
:open_mouth:oral:star:code
AI assistant: episodic memory question answering (a new augmented-reality task; data and code will be open-sourced)
- Robotics
- Robot navigation
28.Style Transfer
- Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
:star:code - Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation
:star:code - Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
:open_mouth:oral:star:code - HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
:star:code - StyTr2: Image Style Transfer With Transformers
:star:code - CLIPstyler: Image Style Transfer With a Single Text Condition
:star:code - Motion style transfer
- Motion transfer
- Structure-Aware Motion Transfer with Deformable Anchor Model
:star:code:newspaper:explainer
- Scene stylization
- Appearance transfer
- Splicing ViT Features for Semantic Appearance Transfer
:open_mouth:oral:star:code:house:project
- Stylization
27.Pose Estimation
- OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
- OnePose: One-Shot Object Pose Estimation without CAD Models
:star:code:house:project:newspaper:explainer - ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
- On the Instability of Relative Pose Estimation and RANSAC's Role
- SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings
:star:code:house:project - ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
:star:code:house:project:tv:video - GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting
- UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation
- 4D
- Revealing Occlusions with 4D Neural Fields
:open_mouth:oral:star:code:house:project - Ego4D: Around the World in 3,000 Hours of Egocentric Video
:star:code
- 9D
- Monocular object pose estimation
- 6D
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization
:star:code - FS6D: Few-Shot 6D Pose Estimation of Novel Objects
:star:code:house:project:newspaper:explainer - Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation
- ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework
:star:code - Focal Length and Object Pose Estimation via Render and Compare
:star:code:house:project:newspaper:explainer - DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation
:star:code:house:project:newspaper:explainer - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation
:star:code:newspaper:explainer - ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation
:star:code - Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation
:star:code - OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation
:star:code - SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation
:star:code:house:project
- 3D Object Articulation
- 3Dope
26.GCN/GNN
- GNN
- 🐦️Lifelong Graph Learning
:star:code - AEGNN: Asynchronous Event-based Graph Neural Networks
:star:code:house:project - "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping
- OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
:open_mouth:oral:star:code - ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching
25.Fine-Grained/Image Classification
- Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification
- A Voxel Graph CNN for Object Classification with Event Cameras
- Multi-Modal Extreme Classification
:star:code - Fine-grained classification
- Image classification
- Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
:star:code - DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
:star:code - Contrastive Test-Time Adaptation
:house:project - A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes
- VisCUIT: Visual Auditor for Bias in CNN Image Classifier
:tv:video - Multi-Label Iterated Learning for Image Classification With Label Ambiguity
:star:code - Efficient Classification of Very Large Images With Tiny Objects
- Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification
:star:code - Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
:star:code
- Few-shot classification
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification
- Matching Feature Sets for Few-Shot Image Classification
:star:code:house:project:tv:video - Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
:open_mouth:oral:star:code:house:project:newspaper:explainer - Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
:newspaper:explainer - Generating Representative Samples for Few-Shot Classification
:star:code
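:newspaper:brief explainer
For few-shot classification, generating more representative samples and removing non-representative ones improves classification, achieving SOTA results. - Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations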
- Task Discrepancy Maximization for Fine-Grained Few-Shot Classification
- Few-shot classification and segmentation (FS-CS)
- Long-tailed recognition
- Nested Collaborative Learning for Long-Tailed Visual Recognition
:star:code - Long-Tailed Recognition via Weight Balancing
:star:code - Targeted Supervised Contrastive Learning for Long-Tailed Recognition
- Long-Tail Recognition via Compositional Knowledge Transfer
- RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
:star:code - Trustworthy Long-Tailed Classification
:star:code - Balanced Contrastive Learning for Long-Tailed Visual Recognition
- The Majority Can Help the Minority: Context-Rich Minority Oversampling for Long-Tailed Classification
:star:code - Retrieval Augmented Classification for Long-Tail Visual Recognition
- Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment
:star:code
- Fine-grained recognition
- Knowledge Mining with Scene Text for Fine-Grained Recognition
:star:code:newspaper:explainer
- Multi-label classification
- Class-imbalanced classification
- Image-text multimodal classification
24.Super-Resolution
- Learning Graph Regularisation for Guided Super-Resolution
:star:code - Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites
:star:code:house:project:newspaper:explainer - Deep Constrained Least Squares for Blind Image Super-Resolution
:star:code:newspaper:explainer - Discrete Cosine Transform Network for Guided Depth Map Super-Resolution
:open_mouth:oral:star:code - Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
:star:code - LAR-SR: A Local Autoregressive Model for Image Super-Resolution
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
:star:code - Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel
:star:code - Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
:star:code - SphereSR: 360° Image Super-Resolution With Arbitrary Projection via Continuous Spherical Image Representation
- Reflash Dropout in Image Super-Resolution
- GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
:star:code - Learning the Degradation Distribution for Blind Image Super-Resolution
:star:code - Texture-Based Error Analysis for Image Super-Resolution
- A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution
:star:code - Task Decoupled Framework for Reference-Based Super-Resolution
- MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution
:star:code - VSR
- Stable Long-Term Recurrent Video Super-Resolution
- Reference-based Video Super-Resolution Using Multi-Camera Video Triplets
:star:code - Learning Trajectory-Aware Transformer for Video Super-Resolution
:open_mouth:oral:star:code - Investigating Tradeoffs in Real-World Video Super-Resolution
:star:code:newspaper:explainer - BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
:star:code:house:project:tv:video
🏆Winner of the NTIRE 2021 Video Restoration and Enhancement Challenge - Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
:newspaper:ETDM: video super-resolution based on explicit temporal difference modeling - Memory-Augmented Non-Local Attention for Video Super-Resolution
:star:code:newspaper:explainer - Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning
:star:code
:newspaper:explainer - RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution
:star:code
23.Image Retrieval
- Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
:star:code - Correlation Verification for Image Retrieval
:open_mouth:oral:star:code - Sketch3T: Test-Time Training for Zero-Shot SBIR
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image
:star:code - Forward Compatible Training for Large-Scale Embedding Retrieval Systems
:star:code - Contextual Similarity Distillation for Asymmetric Image Retrieval
- Object-Aware Video-Language Pre-Training for Retrieval
:star:code - Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features
- Video retrieval
- Text-video retrieval
- Cross-modal retrieval
- ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
- Cross Modal Retrieval With Querybank Normalisation
:star:code:house:project - EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
- Sign language video retrieval
22.Image Synthesis/Generation
- Interactive Image Synthesis with Panoptic Layout Generation
:star:code - Autoregressive Image Generation using Residual Quantization
:star:code:newspaper:brief explainer - GIRAFFE HD: A High-Resolution 3D-aware Generative Model
- Arbitrary-Scale Image Synthesis
:star:code:newspaper:brief explainer - Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
:star:code:newspaper:explainer - Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
:star:code - Unpaired Cartoon Image Synthesis via Gated Cycle Mapping
- 3D Scene Painting via Semantic Image Synthesis
- 3D-Aware Image Synthesis via Learning Structural and Textural Representations
:star:code:house:project:tv:video - High-Resolution Image Synthesis With Latent Diffusion Models
:star:code - Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis
:star:code - DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis
:star:code - Cluster-Guided Image Synthesis With Unconditional Models
- Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
:open_mouth:oral:star:code - Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis
:star:code - Modulated Contrast for Versatile Image Synthesis
:star:code - Text-guided image manipulation
- Pose-guided image synthesis
- Text-to-image synthesis
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
- Text-to-Image Synthesis based on Object-Guided Joint-Decoding Transformer
:newspaper:explainer - LAFITE: Towards Language-Free Training for Text-to-Image Generation
:star:code - DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
:open_mouth:oral:star:code - Text to Image Generation With Semantic-Spatial Aware GAN
:star:code - Vector Quantized Diffusion Model for Text-to-Image Synthesis
:star:code
- Image translation
- Image generation
- Marginal Contrastive Correspondence for Guided Image Generation
:open_mouth:oral - OSSGAN: Open-Set Semi-Supervised Image Generation
:star:code - A Closer Look at Few-shot Image Generation
- Modeling Image Composition for Complex Scene Generation
:star:code
:newspaper:explainer - Local Attention Pyramid for Scene Image Generation
- GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation
:house:project - MaskGIT: Masked Generative Image Transformer
- Attribute Group Editing for Reliable Few-Shot Image Generation
:star:code - Learning to Memorize Feature Hallucination for One-Shot Image Generation
:newspaper:explainer - StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
:star:code - Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation
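- Image-to-text
- Text-to-shape generation
- Image-to-video generation
- Text-based object generation
- Person image generation
- Image-text matching
- Bidirectional image-text generation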
21.UAV/Remote Sensing/Satellite Imagery
- CVNet: Contour Vibration Network for Building Extraction
:star:code - CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
:house:project - Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
:star:code - Remote sensing image fusion
- Aerial image segmentation
- Aerial image detection
- Satellite imagery
20.Autonomous Driving
- Autonomous driving
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
:star:code - Exploiting Temporal Relations on Radar Perception for Autonomous Driving
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles
:star:code:house:project:newspaper:解读 - Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior
:house:project - Learning From All Vehicles
:star:code - Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving
- Unifying Panoptic Segmentation for Autonomous Driving
- Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving
:star:code - On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles
:star:code
- Lane detection
- Rethinking Efficient Lane Detection via Curve Modeling
:star:code:newspaper:brief explainer
:notebook: - Towards Driving-Oriented Metric for Lane Detection Models
- A Keypoint-based Global Association Network for Lane Detection
:star:code:newspaper:explainer - Monocular 3D lane detection
- ONCE-3DLanes: Building Monocular 3D Lane Detection
:star:code
Lane detection technology evolves further
- Lane description
- Relighting of autonomous driving scenes
- Pedestrian trajectory prediction
- Trajectory prediction
- MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction
- Remember Intentions: Retrospective-Memory-Based Trajectory Prediction
:star:code - LTP: Lane-Based Trajectory Prediction for Autonomous Driving
- Vehicle trajectory prediction works, but not everywhere
:house:project - End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps
:star:code - Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
- Adaptive Trajectory Prediction via Transferable GNN
- M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
- GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning
:star:code - Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
:star:code - ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning
:star:code
- Vehicle detection
19.Neural Architecture Search
- 🐦️ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
:star:code - Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
:star:code:newspaper:explainer - GPUNet: Searching the Deployable Convolution Neural Networks for GPUs
Neural architecture search for lightweight networks deployable on GPUs (faster than Google's EfficientNet-X series and Meta's FBNetV3, and even more accurate; the authors are from NVIDIA) - Distribution Consistent Neural Architecture Search
- Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
- BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
- GreedyNASv2: Greedier Search With a Greedy Path Filter
- Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning
:star:code - Neural Architecture Search with Representation Mutual Information
:star:code - Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?
:star:code - b-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
:star:code - Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search
:star:code
18.Person Re-Identification
- Group re-identification
- ReID
- Part-based Pseudo Label Refinement for Unsupervised Person Re-identification
:star:code - Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
:star:code - FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification
- Large-Scale Pre-training for Person Re-identification with Noisy Labels
:star:code - Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification
:star:code - Implicit Sample Extension for Unsupervised Person Re-Identification
:star:code:newspaper:explainer - Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
:star:code - NFormer: Robust Person Re-identification with Neighbor Transformer
:star:code:newspaper:explainer - Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
- Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification
- Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification
:star:code - Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation
:house:project - Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification
- Augmented Geometric Distillation for Data-Free Incremental Person ReID
:star:code - Salient-to-Broad Transition for Video Person Re-Identification
:star:code - Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification
:star:code - Meta Distribution Alignment for Generalizable Person Re-Identification
:star:code - AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification
- Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification
- Id-Free Person Similarity Learning
- Cloth-changing person re-identification
- Occluded person re-identification
- Crowd counting
- Pedestrian detection
- Gait recognition
- Person Search
17.Medical Imaging
- Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
:open_mouth:oral - BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
:star:code - DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
- DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
:star:code:newspaper:explainer - Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning
:star:code:house:project - What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
- ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
- Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements
:star:code - ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics
:star:code - 3D bioprinting
- Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
- SR (MRI)
- Medical image segmentation
- CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
:star:code - C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
:star:code - HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet
- Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation
:star:code - Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
:star:code
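- Medical image registration
- Medical image analysis
- Automatic report generation
- Medical image classification
- CT synthesis
- Medical image landmark detection
- MRI
- Histopathology
- Teeth
- 3D medical analysis
- 3D tooth instance segmentation
- Malaria detection
16.Semi-/Self-Supervised Learning
- Self-supervised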
- A study on the distribution of social biases in self-supervised learning visual models
:star:code - Learning Where to Learn in Cross-View Self-Supervised Learning
:star:code - Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy
:star:code - DATA: Domain-Aware and Task-Aware Self-Supervised Learning
:star:code - Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision
:star:code - Self-Supervised Spatial Reasoning on Multi-View Line Drawings
:star:code:house:project - Self-Supervised Models Are Continual Learners
:star:code - Learning Pixel Trajectories With Multiscale Contrastive Random Walks
:star:code:house:project - Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
:star:code - Backdoor Attacks on Self-Supervised Learning
:star:code - Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors
:house:project - Masked Feature Prediction for Self-Supervised Visual Pre-Training
:star:code - Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
:star:code - Patch-Level Representation Learning for Self-Supervised Vision Transformers
- A Simple Data Mixing Prior for Improving Self-Supervised Learning
:star:code - Sound and Visual Representation Learning With Multiple Pretraining Tasks
- Align Representations With Base: A New Approach to Self-Supervised Learning
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training
- Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework
:star:code - SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos
- Exploring Set Similarity for Dense Self-Supervised Representation Learning
- Unsupervised
- RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
- RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
:star:code - Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations
- Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning
:star:code - PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors
:star:code - Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
- Unsupervised Learning of Debiased Representations With Pseudo-Attributes
:star:code
- Semi-supervised
- Class-Aware Contrastive Semi-Supervised Learning
:star:code
:newspaper:explainer - RSCFed: Random Sampling Consensus Federated Semi-supervised Learning
:star:code - FisherMatch: Semi-Supervised Rotation Regression via Entropy-based Filtering
:open_mouth:oral:house:project - Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels
- SimMatch: Semi-Supervised Learning With Similarity Matching
:star:code - CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
:star:code - DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning
:star:code:house:project - Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos
:star:code - Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning
- Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data
- DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning
- Weakly supervised
- P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
:star:code
Weakly supervised probabilistic procedure planning using instructional videos - Revisiting Weakly Supervised Pre-Training of Visual Perception Models
:star:code - Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis
:star:code - Decoupling Makes Weakly Supervised Local Feature Better
:star:code
15.Transformer
- Vision Transformer With Deformable Attention
:star:code - Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
:star:code - HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
- Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
:star:code - BoxeR: Box-Attention for 2D and 3D Transformers
:star:code - Video Swin Transformer
:star:code - APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers
- Fast Point Transformer
:star:code - ChiTransformer: Towards Reliable Stereo from Cues
- Beyond Fixation: Dynamic Window Visual Transformer
:star:code - Training-free Transformer Architecture Search
:newspaper:Write-up - Automated Progressive Learning for Efficient Training of Vision Transformers
:star:code - Collaborative Transformers for Grounded Situation Recognition
:star:code - TubeDETR: Spatio-Temporal Video Grounding with Transformers
:open_mouth:oral:star:code:house:project - Deformable Video Transformer
- MixFormer: Mixing Features across Windows and Dimensions
:open_mouth:oral:star:code:newspaper:Brief write-up - Are Multimodal Transformers Robust to Missing Modality?
- MiniViT: Compressing Vision Transformers with Weight Multiplexing
- Multimodal Token Fusion for Vision Transformers
:star:code - Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
:open_mouth:oral:star:code:newspaper:Write-up - UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
A unified Transformer architecture using contrastive learning for visual dialog - Patch Slimming for Efficient Vision Transformers
:newspaper:Write-up - Swin Transformer V2: Scaling Up Capacity and Resolution
:star:code
:newspaper:Records smashed! Swin Transformer v2.0 is here, with 3 billion parameters! - SimMIM: A Simple Framework for Masked Image Modeling
:star:code - NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
:star:code
:newspaper:Write-up - Mobile-Former: Bridging MobileNet and Transformer
:star:code - MulT: An End-to-End Multitask Learning Transformer
:star:code:house:project - Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
:open_mouth:oral:star:code:newspaper:Write-up - CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance
- MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
:star:code - IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
:star:code - Reversible Vision Transformers
:star:code - MetaFormer Is Actually What You Need for Vision
:open_mouth:oral:star:code - GradViT: Gradient Inversion of Vision Transformers
:house:project - CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows
:star:code - MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
:star:code
:newspaper:Meta & Berkeley propose a general multiscale vision Transformer based on pooling self-attention, reaching 88.8% ImageNet classification accuracy! Open-sourced - A-ViT: Adaptive Tokens for Efficient Vision Transformer
:open_mouth:oral:house:project
:newspaper:Unimportant tokens can stop computing early! NVIDIA proposes A-ViT, an efficient vision Transformer with adaptive tokens that improves model throughput! - Certified Patch Robustness via Smoothed Vision Transformers
:star:code - The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
:star:code - Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training
:star:code - Object-Region Video Transformers
:star:code:house:project - Shunted Self-Attention via Multi-Scale Token Aggregation
:open_mouth:oral:star:code - Towards Robust Vision Transformer
:star:code - Fine-tuning Image Transformers using Learnable Memory
- Lite Vision Transformer With Enhanced Self-Attention
:star:code - Self-Supervised Video Transformer
:star:code - TransMix: Attend To Mix for Vision Transformers
:star:code - CMT: Convolutional Neural Networks Meet Vision Transformers
:star:code - Shape completion
- ShapeFormer: Transformer-based Shape Completion via Sparse Representation
:star:code:house:project
14.Video
- Improving Video Model Transfer With Dynamic Representation Learning
- Action segmentation
- Action understanding
- Video Copy Detection
- Video synthesis
- Video anomaly detection
- Generative Cooperative Learning for Unsupervised Video Anomaly Detection
- Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
- Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement
- UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
:star:code
- Video surveillance
- Video moment retrieval and video highlight detection
- Video moment retrieval
- Video prediction
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction
- Continual Predictive Learning from Videos
:open_mouth:oral:star:code - SimVP: Simpler yet Better Video Prediction
:star:code:newspaper:Write-up - Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
:star:code:house:project
- Video individual counting
- Video interpolation
- Many-to-many Splatting for Efficient Video Frame Interpolation
:star:code - TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
:house:project - Long-term Video Frame Interpolation via Feature Propagation
- Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
:house:project
- Visual correspondence (video)
- Video recognition
- BEVT: BERT Pretraining of Video Transformers
:star:code
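:newspaper:A new paradigm for self-supervised pre-training of video Transformers: Fudan and Microsoft Cloud AI achieve a new SOTA in video recognition - MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
:star:code:newspaper:Write-up - MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
:newspaper:Saving the model's memory! Meta & UC Berkeley propose MeMViT, which models temporal spans 30x longer than existing models with only 4.5% more compute - Multiview Transformers for Video Recognition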
:star:code - Group Contextualization for Video Recognition
:star:code - AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
:star:code
- Video classification
- Video prediction
- Video segmentation
- Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
:star:code - VSS
- Scene Consistency Representation Learning for Video Scene Segmentation
:star:code
:newspaper:Write-up 1
:newspaper:Write-up 2
- VOS
- Recurrent Dynamic Embedding for Video Object Segmentation
:star:code - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
:star:code:house:project
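:newspaper:Beihang University, the Institute of Information Engineering (CAS) and Meituan propose LBDT, which uses language-bridged spatial-temporal interaction for accurate referring video object segmentation, with SOTA performance! Code open-sourced (CVPR 2022) - Accelerating Video Object Segmentation With Compressed Video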
:star:code
:newspaper:CoVOS: no decoding needed! Accelerating semi-supervised VOS using the motion vectors and residuals of the compressed video bitstream (CVPR 2022) - End-to-End Referring Video Object Segmentation With Multimodal Transformers
:star:code - HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images
:star:code - SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization
- Language As Queries for Referring Video Object Segmentation
:star:code - Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
:star:code - YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
:star:code:house:project - Per-Clip Video Object Segmentation
- Video instance segmentation (VIS)
- Efficient Video Instance Segmentation via Tracklet Query and Proposal
:house:project:tv:video:newspaper:Brief write-up - Temporally Efficient Vision Transformer for Video Instance Segmentation
:open_mouth:oral:star:code:newspaper:Write-up - VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
:star:code - Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
- Video semantic segmentation
- Video panoptic segmentation
- Video image processing
- Video restoration
- Video inpainting
- Towards An End-to-End Framework for Flow-Guided Video Inpainting
:star:code - The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting
:star:code - Revisiting Temporal Alignment for Video Restoration
:star:code - DLFormer: Discrete Latent Transformer for Video Inpainting
:star:code - Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
:star:code
- Video demoiréing
- Video deblurring
- Video denoising
- Film restoration
- Video representation learning
- TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
:open_mouth:oral:star:code:newspaper:Write-up - Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging
:star:code - Motion-Adjustable Neural Implicit Video Representation
- Self-supervised video representation learning
- Hierarchical Self-supervised Representation Learning for Movie Understanding
:star:code:house:project - Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
:star:code:house:project - Cross-Architecture Self-supervised Video Representation Learning
:star:code
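:newspaper:Write-up
:newspaper:Can features from different network architectures be used for contrastive learning? Ant Group, Meituan, Nanjing University and Alibaba propose CACL, a cross-architecture self-supervised video representation learning method, with SOTA performance!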
- Video contrastive learning
- TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
- Video decomposition
- Deformable Sprites for Unsupervised Video Decomposition
:open_mouth:oral:house:project
- Video shadow detection
- Video frame interpolation
- IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
:star:code
:newspaper:Write-up - Video Frame Interpolation with Transformer
:star:code
:newspaper:Write-up - Video Frame Interpolation Transformer
:star:code - Optimizing Video Prediction via Video Frame Interpolation
- ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
:star:code
- Video understanding
- Revisiting the "Video" in Video-Language Understanding
:open_mouth:oral:star:code - Long-Short Temporal Contrastive Learning of Video Transformers
- Generic event boundary detection (video understanding)
- Video captioning
- End-to-End Generative Pretraining for Multimodal Video Captioning
:newspaper:Google's multimodal pre-training framework achieves SOTA on video captioning, action classification and question answering - Hierarchical Modular Network for Video Captioning
:star:code - SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
:star:code - EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
:star:code
- Video reconstruction
- Video similarity assessment
- Video summarization
- Video coding
- Video modeling
- Stand-Alone Inter-Frame Attention in Video Models
:star:code
:newspaper:Write-up
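- Video paragraph grounding
- Sentence grounding
- Sequence verification
- Video editing
- Video visual relation detection
- Video action reasoning
- Video reconstruction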
13.GAN
- 🐦️HyperInverter: Improving StyleGAN Inversion via Hypernetwork
:house:project - InsetGAN for Full-Body Image Generation
:house:project
:newspaper:1024x1024 resolution with stunning results! InsetGAN: full-body image generation - Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data
:star:code - Deep Image-based Illumination Harmonization
- GAN-Supervised Dense Visual Alignment
:open_mouth:oral:star:code:house:project:tv:video
:newspaper:CVPR 2022 Oral: GAN-supervised dense visual alignment, code open-sourced - Styleformer: Transformer Based Generative Adversarial Networks With Style Vector
:star:code
:newspaper:Write-up - HairMapper: Removing Hair from Portraits Using GANs
:star:code - Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps
:open_mouth:oral:house:project - Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models
- On Aliased Resizing and Surprising Subtleties in GAN Evaluation
:star:code:house:project - Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation
:star:code - Efficient Geometry-Aware 3D Generative Adversarial Networks
:star:code:house:project - DO-GAN: A Double Oracle Framework for Generative Adversarial Networks
- GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
:star:code - CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs
:star:code:house:project:tv:video - HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing
:star:code:house:project - Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
- Improving GAN Equilibrium by Raising Spatial Awareness
:star:code:house:project - SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis
- Pix2NeRF: Unsupervised Conditional p-GAN for Single Image to Neural Radiance Fields Translation
:star:code - Think Twice Before Detecting GAN-Generated Fake Images From Their Spectral Domain Imprints
- Ensembling Off-the-Shelf Models for GAN Training
:open_mouth:oral:star:code:house:project - Style Transformer for Image Inversion and Editing
:star:code - BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations
:star:code:house:project - High-Fidelity GAN Inversion for Image Attribute Editing
:star:code:house:project - Manifold Learning Benefits GANs
:star:code - BodyGAN: General-Purpose Controllable Neural Human Body Generation
- Feature Statistics Mixing Regularization for Generative Adversarial Networks
:star:code - StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2
:star:code:house:project - SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
:star:code:house:project - LARGE: Latent-Based Regression Through GAN Semantics
:star:code - Image manipulation detection
- Hair editing
12.Image-to-Image Translation
- Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
- Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
:star:code - InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
- Unsupervised Image-to-Image Translation with Generative Prior
:star:code:house:project:tv:video - Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint
:star:code:newspaper:Write-up - Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
- QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation
:star:code - Self-Supervised Dense Consistency Regularization for Image-to-Image Translation
11.Face
- Synthetic Generation of Face Videos With Plethysmograph Physiology
:house:project - Protecting Celebrities with Identity Consistency Transformer
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
:star:code - How Much Does Input Data Type Impact Final Face Model Accuracy?
- HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
- Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
:star:code:house:project - Estimating Structural Disparities for Face Models
- General Facial Representation Learning in a Visual-Linguistic Manner
:open_mouth:oral:star:code - Deepfake
- Voice-Face Homogeneity Tells Deepfake
:star:code:newspaper:Brief write-up
- Makeup transfer
- Face recognition
- Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
- Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
:star:code - AdaFace: Quality Adaptive Margin for Face Recognition
:open_mouth:oral:star:code - Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
:star:code - Learning To Learn Across Diverse Data Biases in Deep Face Recognition
- Simulated Adversarial Testing of Face Recognition Models
- Privacy-Preserving Online AutoML for Domain-Specific Face Detection
- An Efficient Training Approach for Very Large Scale Face Recognition
:star:code
- Facial expression recognition
- Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin
:star:code - Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions
- Face2Exp: Combating Data Biases for Facial Expression Recognition
:star:code - Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in "In-the-Wild" Videos
:open_mouth:oral:star:code:house:project
- 3D portraits
- 3D faces
- Liveness detection
- Fake face detection
- Exploring Frequency Adversarial Attacks for Face Forgery Detection
:newspaper:Brief write-up - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
- End-to-End Reconstruction-Classification Learning for Face Forgery Detection
:newspaper:Write-up - Learning Second Order Local Anomaly for General Face Forgery Detection
- Protecting Celebrities From DeepFake With Identity Consistency Transformer
:star:code
- Face swapping
- Face attribute classification
- Face Relighting
- Face editing
- Face hallucination
- Deepfake detection
- Face reconstruction
- Face capture
- Head swapping
- Few-Shot Head Swapping in the Wild
:open_mouth:oral:star:code:house:project:tv:video:newspaper:Write-up
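- Portrait distortion correction
- 3D face modeling
- Face restoration
- Face alignment
- Speech-driven 3D facial animation
- 3D tongue reconstruction
- Forged image detection
- Face parsing
- Facial expressions
- Face detection
- Face reenactment
- Talking face generation
- Facial landmarks
- Face morphing
- 3D facial expression synthesis
- Speech-driven tongue animation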
- Speech Driven Tongue Animation
:star:code:house:project
- Text-to-face
- Facial action unit recognition
- Face verification
10.3D (3D Vision)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
- Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows
:star:code - 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
:open_mouth:oral:star:code:newspaper:Write-up - Physical Simulation Layer for Accurate 3D Modeling
- φ-SfT: Shape-from-Template with a Physics-Based Deformation Model
:house:project - ICON: Implicit Clothed Humans Obtained From Normals
:star:code:house:project - Representing 3D Shapes With Probabilistic Directed Distance Fields
- Improving Neural Implicit Surfaces Geometry With Patch Warping
:star:code - LOLNerf: Learn From One Look
:house:project - Neural Mesh Simplification
- Extracting Triangular 3D Models, Materials, and Lighting From Images
:open_mouth:oral:star:code:house:project - PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos
:star:code:house:project - The Wanderings of Odysseus in 3D Scenes
:star:code:house:project - Volumetric Bundle Adjustment for Online Photorealistic Scene Capture
- Stereo Merging
- PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation
- GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented Feature
:star:code - Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
- Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
:open_mouth:oral:star:code:newspaper:Write-up
- Stereo matching
- Chitransformer: Towards Reliable Stereo From Cues
:star:code - Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
- FoggyStereo: Stereo Matching With Fog Volume Representation
- ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
:star:code
- Depth estimation
- OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
:open_mouth:oral:star:code - NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
:star:code:house:project - 🐦️Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
:star:code - HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
- Multi-Frame Self-Supervised Depth with Transformers
- Layered Depth Refinement with Mask Guidance
:house:project - 360MonoDepth: High-Resolution 360° Monocular Depth Estimation
:star:code:house:project - Towards Multimodal Depth Estimation from Light Fields
- Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
- Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
:star:code - Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry
:open_mouth:oral:star:code - Toward Practical Monocular Indoor Depth Estimation
- Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data
- Stereo Depth From Events Cameras: Concentrate and Focus on the Future
:star:code - Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
:star:code - CroMo: Cross-Modal Learning for Monocular Depth Estimation
- Deep Depth From Focus With Differential Focus Volume
- Gated2Gated: Self-Supervised Depth Estimation From Gated Images
:star:code
- Room layout
- MVS
- RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
- TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers
:star:code:newspaper:Write-up - Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo
:star:code - IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
:star:code - Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
:star:code - Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering
:star:code:house:project - Efficient Multi-View Stereo by Iterative Dynamic Cost Volume
:star:code - MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions
:star:code - MVPS
- 3D reconstruction
- PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
- Self-supervised Neural Articulated Shape and Appearance Models
:house:project - BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
- Topologically-Aware Deformation Fields for Single-View 3D Reconstruction
:star:code:house:project - Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction
:star:code:house:project:newspaper:Write-up - What's in your hands? 3D Reconstruction of Generic Objects in Hands
:star:code:house:project:tv:video:newspaper:Write-up - Surface Reconstruction from Point Clouds by Learning Predictive Context Priors
:star:code - FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
:star:code
:newspaper:Write-up - KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos
- SPAMs: Structured Implicit Parametric Models
:house:project:tv:video - Enhancing Face Recognition With Self-Supervised 3D Reconstruction
- Neural Fields As Learnable Kernels for 3D Reconstruction
:house:project - Input-Level Inductive Biases for 3D Reconstruction
- Human-Aware Object Placement for Visual Environment Reconstruction
:star:code:house:project - Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
:star:code - OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction
:star:code:house:project - 3D scene reconstruction
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption
:open_mouth:oral:star:code:house:project:tv:video:newspaper:Write-up - StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
:star:code:house:project:tv:video - PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes
:star:code - Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image
:star:code:house:project - NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
:star:code:house:project
- Hand-object reconstruction
- 3D garment mesh reconstruction
- 3D mesh reconstruction
- 3D shape reconstruction
- 3D garment deformation
- SNUG: Self-Supervised Neural Dynamic Garments
:open_mouth:oral:star:code
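- Texture transfer and synthesis
- Shape matching
- Surface reconstruction
- Multi-view mesh reconstruction
- 3D shape analysis
- 3D completion
- Image reconstruction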
- PS
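- Predicting 3D object shape
- 3D shape
- Neural 3D content generation
- Depth completion
- Line segment reconstruction
- Shape reconstruction
- 3D shape generation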
- 3D Part Segmentation
- 3D semantic scene completion
9.Human Pose Estimation
- COAP: Compositional Articulated Occupancy of People
:star:code:house:project:tv:video:newspaper:Write-up - Context-Aware Sequence Alignment using 4D Skeletal Augmentation
:open_mouth:oral:star:code:house:project - Generalizable Human Pose Triangulation
- Location-Free Human Pose Estimation
:newspaper:Write-up - Meta Agent Teaming Active Learning for Pose Estimation
- Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
:star:code - Multi-person pose estimation
- Video-based HPE
- 3D pose
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
:star:code - PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision
:open_mouth:oral:star:code - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
:house:project - Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation
- Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
:newspaper:Accurate and efficient multi-person 3D pose estimation: a distribution-aware single-stage model from Meitu & Beihang - Forecasting Characteristic 3D Poses of Human Actions
:tv:video - Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
:star:code - Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision
:house:project - ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses
:star:code - MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
:star:code - PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
- Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video
:star:code:house:project - GraFormer: Graph-Oriented Transformer for 3D Pose Estimation
- AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation
- MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision
:star:code:house:project - Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
- 4D human capture
- Motion capture
- Arm-hand dynamics estimation
- 3D human shape
- OSSO: Obtaining Skeletal Shape from Outside
:star:code:house:project:tv:video:newspaper:Write-up
- Dense correspondence
- 3D human motion reconstruction
- 3D human pose reconstruction
- Human mesh recovery
- Human Mesh Recovery From Multiple Shots
:star:code:house:project - Occluded Human Mesh Recovery
:house:project - GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras
:open_mouth:oral:star:code:house:project
- Human motion description
- 3D human actions
- 3D human synthesis
- HSC
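- 3D human motion synthesis
- Human reconstruction
- Hand pose
- Hand mesh reconstruction
- 3D hand pose
- Audio-driven gesture reenactment
- 3D hand reconstruction
- Hand tracking
- Gesture generation
- 3D hand mesh estimation
- 3D human body
8.Action Detection (Human Action Detection and Recognition)
- Action detection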
- Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
:star:code - Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
- SPAct: Self-supervised Privacy Preservation for Action Recognition
:star:code - Temporal Alignment Networks for Long-term Video
:open_mouth:oral:star:code:house:project:newspaper:Brief write-up - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
- GateHUB: Gated History Unit With Background Suppression for Online Action Detection
- MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
:star:code
:newspaper:MS-TCT: Inria & SBU propose a multi-scale temporal Transformer for action detection, with SOTA results! Open-sourced! (CVPR 2022) - Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
:house:project - Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
- Learning From Temporal Gradient for Semi-Supervised Action Recognition
:star:code - DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
:star:code - Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
- Object-Relation Reasoning Graph for Action Recognition
- Revisiting Skeleton-Based Action Recognition
:open_mouth:oral:star:code - InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition
- E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
:star:code - End-to-End Semi-Supervised Learning for Video Action Detection
:star:code - Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
:open_mouth:oral - TubeR: Tubelet Transformer for Video Action Detection
:open_mouth:oral:house:project - Semi-supervised action recognition
- Zero-shot action recognition
- Cross-modal Representation Learning for Zero-shot Action Recognition
:star:code
Zero-shot action recognition: cross-modal representation learning
- Few-shot action recognition
- Temporal action detection
- Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
- Temporal action localization
- Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
:star:code:newspaper:Brief write-up - Unsupervised Pre-training for Temporal Action Localization Tasks
:star:code - ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
:star:code - Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
:star:code - Structured Attention Composition for Temporal Action Localization
:star:code - Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization
- Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization
- OpenTAL: Towards Open Set Temporal Action Localization
:star:code
- Repetitive action counting
- TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
:open_mouth:oral:star:code:house:project
- Group activity recognition
- Action quality assessment
- FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
:open_mouth:oral:star:code:house:project:newspaper:Write-up
- Activity recognition
7.Point Cloud
- Shape-invariant 3D Adversarial Point Clouds
:star:code - AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
:star:code - REGTR: End-to-end Point Cloud Correspondences with Transformers
:star:code - Equivariant Point Cloud Analysis via Learning Orientations for Message Passing
:star:code - Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
:star:code:house:project - Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
:star:code - Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation
:star:code:newspaper:Write-up - 3DeformRS: Certifying Spatial Deformations on Point Clouds
:star:code - Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors
:star:code:newspaper:Explainer - Density-preserving Deep Point Cloud Compression
:star:code:house:project:newspaper:Explainer - Surface Representation for Point Clouds
:open_mouth:oral:star:code
:newspaper:Explainer 1
:newspaper:Explainer 2 - Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
:star:code - Point Cloud Pre-Training With Natural 3D Structures
:star:code:house:project - Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
:star:code - Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders
- RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior
- PatchFormer: An Efficient Point Transformer With Patch Attention
- PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images
- Point Cloud Color Constancy
:star:code - Multimodal Colored Point Cloud to Image Alignment
- No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models by Fitting Feature-Level Space-Time Surfaces
:star:code - Domain Adaptation on Point Clouds via Geometry-Aware Implicits
:star:code - ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds
- 3DAC: Learning Attribute Compression for Point Clouds
- RCP: Recurrent Closest Point for Point Cloud
:star:code - Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels
- DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds
:star:code:house:project - The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution
- 3D Point Cloud
- Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling
:star:code - CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
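:star:code:newspaper:Brief explainer
CrossPoint is a simple self-supervised framework for 3D point cloud representation learning. Although it is trained on a synthetic 3D object dataset, experiments on downstream tasks such as 3D object classification and 3D object part segmentation, on both synthetic and real-world datasets, show that it learns transferable representations. - IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment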
:star:code - A Unified Query-based Paradigm for Point Cloud Understanding
:star:code - WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
:star:code - 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
- Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
:house:project - Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis
- Upright-Net: Learning Upright Orientation for 3D Point Cloud
- 3D Point Cloud Segmentation
- Point Cloud Classification
- Point Cloud Registration
- SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration
:star:code
:newspaper:A second-order similarity measure lets a traditional registration method outperform deep learning methods while matching their speed - Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering
:star:code - Deterministic Point Cloud Registration via Novel Transformation Decomposition
:newspaper:Explainer - Geometric Transformer for Fast and Robust Point Cloud Registration
:star:code
- Point Cloud Completion
- Learning a Structured Latent Space for Unsupervised Point Cloud Completion
- Learning Local Displacements for Point Cloud Completion
- LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
:newspaper:Brief explainer
- Point Cloud Segmentation
- Contrastive Boundary Learning for Point Cloud Segmentation
:star:code:newspaper:Explainer - SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
:star:code:newspaper:Explainer - An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation
:star:code - Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation
:star:code
- Point Cloud Matching
- Scene Flow Estimation
- Point Cloud Understanding
6. Object Tracking
- TCTrack: Temporal Contexts for Aerial Tracking
:star:code:newspaper:Brief explainer
:newspaper:TCTrack: a temporal-context framework for aerial tracking - Correlation-Aware Deep Tracking
- Global Tracking Transformers
:star:code - Unified Transformer Tracker for Object Tracking
:star:code - Global Tracking via Ensemble of Local Trackers
:star:code - Unsupervised Learning of Accurate Siamese Tracking
:star:code - Transformer Tracking with Cyclic Shifting Window Attention
:star:code
Transformer tracking with cyclic shifting window attention. The tracker sets new state-of-the-art results on all five benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k. - Tracking People by Predicting 3D Appearance, Location and Pose
:open_mouth:oral:star:code:house:project - Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos
:star:code - Opening Up Open World Tracking
:open_mouth:oral:star:code:house:project - Transforming Model Prediction for Tracking
:star:code - PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments
:star:code - Spiking Transformers for Event-Based Single Object Tracking
:star:code
- MixFormer: End-to-End Tracking With Iterative Mixed Attention
:open_mouth:oral:star:code - PTTR: Relational 3D Point Cloud Object Tracking With Transformer
:star:code - GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking
:star:code - 3D Object Tracking
- Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
:star:code:newspaper:Brief explainer - Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
:star:code - BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
:star:code
- Multi-Object Tracking
- Learning of Global Objective for Network Flow in Multi-Object Tracking
- MeMOT: Multi-Object Tracking with Memory
:open_mouth:oral - Multi-Object Tracking Meets Moving UAV
- Adiabatic Quantum Computing for Multi Object Tracking
- Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking
- LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking
:star:code - TrackFormer: Multi-Object Tracking With Transformers
:star:code - DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
:star:code
- RGB-T Tracking
- Visual Tracking
- Ranking-Based Siamese Visual Tracking
:star:code:newspaper:Explainer
- Nighttime Tracking
- Human Motion Tracking
- Multi-Person Pose Tracking
5. Object Detection
- DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
:star:code:newspaper:Brief explainer - Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation
:star:code - ESCNet: Gaze Target Detection with the Understanding of 3D Scenes
:star:code - Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection
:star:code - Interactron: Embodied Adaptive Object Detection
:star:code - Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
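Object detectors are usually trained with bounding-box annotations alone. The authors introduce language prompts and distill language knowledge into the detector, gaining 1.6 to 2.1% in performance. - Dynamic Sparse R-CNN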
- Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild
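:star:code:newspaper:Brief explainer - Focal and Global Knowledge Distillation for Detectors
:star:code:newspaper:Explainer
A knowledge distillation method for object detection that needs only 30 lines of code and consistently improves accuracy on anchor-based and anchor-free, one-stage and two-stage detectors. The code is open source. - Group R-CNN for Weakly Semi-supervised Object Detection with Points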
:star:code
:newspaper:Explainer - Real-time Object Detection for Streaming Perception
:star:code:newspaper:Explainer - Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
- Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
:star:code - Optimal Correction Cost for Object Detection Evaluation
- Expanding Low-Density Latent Regions for Open-Set Object Detection
:star:code
:newspaper:Explainer - SIOD: Single Instance Annotated Per Category Per Image for Object Detection
:star:code
:newspaper:Explainer - Task-specific Inconsistency Alignment for Domain Adaptive Object Detection
:star:code - Zero-Query Transfer Attacks on Context-Aware Object Detectors
- AdaMixer: A Fast-Converging Query-Based Object Detector
:open_mouth:oral:star:code - Learning to Detect Mobile Objects from LiDAR Scans Without Labels
:star:code - Forecasting from LiDAR via Future Object Detection
:star:code - Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection
:open_mouth:oral:star:code - Multi-Granularity Alignment Domain Adaptation for Object Detection
:star:code - Proper Reuse of Image Classification Features Improves Object Detection
:star:code - R(Det)^2: Randomized Decision Routing for Object Detection
- Towards Robust Adaptive Object Detection under Noisy Annotations
:star:code - Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint
- Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
- Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images
:star:code - Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
:star:code
Cross-domain object detection via target-perceived dual-branch distillation. - Progressive End-to-End Object Detection in Crowded Scenes
:star:code
:newspaper:Explainer - HCSC: Hierarchical Contrastive Selective Coding
:star:code
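:newspaper:New SOTA for CNN self-supervised pre-training: SJTU, Mila, and ByteDance jointly propose a new framework for self-supervised learning of hierarchically structured image representations - Recurrent Glimpse-based Decoder for Detection with Transformer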
:open_mouth:oral:star:code
:newspaper:Explainer - Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
:star:code - Balanced and Hierarchical Relation Learning for One-Shot Object Detection
:star:code - Accelerating DETR Convergence via Semantic-Aligned Matching
:star:code - DETReg: Unsupervised Pretraining With Region Priors for Object Detection
:star:code:house:project - Source-Free Object Detection by Learning To Overlook Domain Style
- DESTR: Object Detection With Split Transformer
- SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles
- Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
:star:code - Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network
- Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection
:star:code - Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer
- Sequential Voting With Relational Box Fields for Active Object Detection
:star:code:house:project - Simple Multi-dataset Detection
:star:code - ObjectFormer for Image Manipulation Detection and Localization
- A Dual Weighting Label Assignment Scheme for Object Detection
:star:code - Point-Level Region Contrast for Object Detection Pre-Training
:open_mouth:oral - Neural Volumetric Object Selection
:house:project - Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
- Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation
:star:code - DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors
:tv:video - Cross-Domain Adaptive Teacher for Object Detection
:star:code:house:project - End-to-End Human-Gaze-Target Detection With Transformers
- Small Object Detection
- Zero-Shot Object Detection
- Few-Shot Object Detection
- Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection
- Few-Shot Object Detection with Fully Cross-Transformer
- Kernelized Few-Shot Object Detection With Efficient Integral Aggregation
:star:code - Label, Verify, Correct: A Simple Few Shot Object Detection Method
:star:code:house:project
- Object Localization
- Weakly Supervised Object Localization as Domain Adaption
:star:code:newspaper:Brief explainer - Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization
- Object Localization under Single Coarse Point Supervision
:star:code
:newspaper:Explainer - CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
:star:code - Spatial Commonsense Graph for Object Localisation in Partial Scenes
:star:code:house:project
- 3D Object Detection
- Point Density-Aware Voxels for LiDAR 3D Object Detection
:star:code - A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds
:star:code - Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
:star:code:newspaper:Brief explainer - Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
:house:project - Point2Seq: Detecting 3D Objects as Sequences
:star:code - MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection
:star:code - Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
:star:code
:newspaper:Brief explainer - Exploring Geometric Consistency for Monocular 3D Object Detection
- LiDAR Snowfall Simulation for Robust 3D Object Detection
:open_mouth:oral:star:code - CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
- Homography Loss for Monocular 3D Object Detection
- HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
:star:code - OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
:star:code - Focal Sparse Convolutional Networks for 3D Object Detection
:open_mouth:oral:star:code:newspaper:Explainer:notebook: - Rotationally Equivariant 3D Object Detection
:house:project - Bridged Transformer for Vision and Point Cloud 3D Object Detection
:newspaper:Explainer - Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
:open_mouth:oral:star:code
:newspaper:Explainer - VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
:star:code
:newspaper:SCUT proposes VISTA: a plug-and-play dual cross-view spatial attention mechanism that achieves SOTA 3D object detection - Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
:open_mouth:oral - MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
:star:code - Voxel Field Fusion for 3D Object Detection
:star:code
:newspaper:Explainer - DisARM: Displacement Aware Relation Module for 3D Detection
:star:code - Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement
:star:code - Embracing Single Stride 3D Object Detector With Sparse Transformer
:star:code - 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
:house:project - Dimension Embeddings for Monocular 3D Object Detection
- MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
:star:code - RBGNet: Ray-Based Grouping for 3D Object Detection
:star:code - LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection
- SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
:star:code - MonoGround: Detecting Monocular 3D Objects From the Ground
:star:code - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers
:star:code - Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
- Camouflaged Object Detection
- Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
:star:code - Detecting Camouflaged Object in Frequency Domain
- Implicit Motion Handling for Video Camouflaged Object Detection
:house:project - Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way
:star:code
- Fully Supervised Object Detection
- Self-Supervised Object Detection
- Semi-Supervised Object Detection
- Dense Learning based Semi-Supervised Object Detection
:star:code:newspaper:Explainer - Label Matching Semi-Supervised Object Detection
:star:code - Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes
- Active Teacher for Semi-Supervised Object Detection
:star:code - Scale-Equivalent Distillation for Semi-Supervised Object Detection
- Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors
- MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
:star:code
- Weakly Supervised Object Detection
- Salient Object Detection
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
:star:code:newspaper:Explainer
:newspaper:Ultra-high-resolution salient object detection with PGNet, a novel and efficient staggered-layer grafting architecture (CVPR 2022) - Learning from Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
:star:code:newspaper:Explainer - Bi-directional Object-context Prioritization Learning for Saliency Ranking
:star:code - Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
- Dense Object Detection
- Co-Salient Object Detection
- Long-Tailed Object Detection
- Oriented Object Detection
- Keypoint Detection
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection
:star:code:house:project - UKPGAN: A General Self-Supervised Keypoint Detector
:star:code
:newspaper:Brief explainer - Contour-Hugging Heatmaps for Landmark Detection
:star:code - Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species
- Keypoint Discovery
- Object Discovery
- Affordance Grounding
- Image Alignment
- Unsupervised Homography Estimation with Coplanarity-Aware GAN
:star:code:newspaper:Explainer
- Object Attribute Recognition
- Disentangling Visual Embeddings for Attributes and Objects
:open_mouth:oral:star:code
- Vanishing Point Detection
- Infrared Detection
- OOD
- Deep Hybrid Models for Out-of-Distribution Detection
- Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
:sunflower:dataset - PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
:star:code - The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
- Out-of-Distribution Generalization With Causal Invariant Transformations
- ViM: Out-Of-Distribution with Virtual-logit Matching
:star:code - OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
:star:code - Neural Mean Discrepancy for Efficient Out-of-Distribution Detection
- Open-World Object Detection
- Domain-Adaptive Object Detection
- Dense Object Detection
- Image Copy Detection
- Change Detection
- Image Recognition
4. Image Captioning
- X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
:star:code - Quantifying Societal Bias Amplification in Image Captioning
- It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection
:star:code:house:project - Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
- DIFNet: Boosting Visual Information Flow for Image Captioning
:star:code
:newspaper:Explainer - VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
:star:code - Comprehending and Ordering Semantics for Image Captioning
:star:code
:newspaper:Explainer - DeeCap: Dynamic Early Exiting for Efficient Image Captioning
:star:code - Show, Deconfound and Tell: Image Captioning With Causal Inference
:star:code - Scaling Up Vision-Language Pre-Training for Image Captioning
:sunflower:dataset - NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
:star:code - Injecting Semantic Concepts Into End-to-End Image Captioning
- Novel Object Captioning
3. Image Processing
- Image Restoration
- Attentive Fine-Grained Structured Sparsity for Image Restoration
:star:code:newspaper:Explainer - Uformer: A General U-Shaped Transformer for Image Restoration
:star:code - Burst Image Restoration and Enhancement
:open_mouth:oral:star:code - BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras
- Restormer: Efficient Transformer for High-Resolution Image Restoration
:open_mouth:oral:star:code - TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions
:star:code - Deep Generalized Unfolding Networks for Image Restoration
:star:code - Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics
:star:code - All-in-One Image Restoration for Unknown Corruption
:star:code - Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
:star:code - KNN Local Attention for Image Restoration
- Image Inpainting
- Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
:star:code:newspaper:Brief explainer - MAT: Mask-Aware Transformer for Large Hole Image Inpainting
:star:code - Reduce Information Loss in Transformers for Pluralistic Image Inpainting
:star:code - UniCoRN: A Unified Conditional Image Repainting Network
- Dual-Path Image Inpainting With Auxiliary GAN Inversion
- MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting
:star:code
- Image Stitching
- Motion Deblurring
- Image Outpainting
- Image Aesthetics Assessment
- Image Quality Assessment
- Image Deraining
- Image Deblurring
- Learning to Deblur using Light Field Generated and Real Defocus Images
:star:code:house:project - Pixel Screening Based Intermediate Correction for Blind Deblurring
- Deblurring via Stochastic Refinement
- XYDeblur: Divide and Conquer for Single Image Deblurring
- Towards Multi-Domain Single Image Dehazing via Test-Time Training
- Image Compression
- SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention
:star:code - Global Sensing and Measurements Reuse for Image Compressed Sensing
:star:code - DPICT: Deep Progressive Image Compression Using Trit-Planes
:open_mouth:oral:star:code - Joint Global and Local Hierarchical Priors for Learned Image Compression
:star:code - Neural Data-Dependent Transform for Learned Image Compression
:star:code:house:project - LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network
:star:code - ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding
:open_mouth:oral - Deep Stereo Image Compression via Bi-Directional Coding
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
:star:code - The Devil Is in the Details: Window-Based Attention for Image Compression
:star:code
- Lossless Image Compression
- Image Denoising
- CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image
:star:code - NAN: Noise-Aware NeRFs for Burst-Denoising
- Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
:star:code - AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
:star:code - RePaint: Inpainting Using Denoising Diffusion Probabilistic Models
:star:code - Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching
- Image Dehazing
- De-rendering
- Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata
:star:code:newspaper:Explainer - De-Rendering 3D Objects in the Wild
:star:code - IDR: Self-Supervised Image Denoising via Iterative Data Refinement
:star:code - RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising
:star:code - Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition
:star:code
:newspaper:Explainer
:newspaper:D4: self-augmented unpaired image dehazing via density and depth decomposition (CVPR 2022)
- Image Enhancement
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement
:open_mouth:oral:star:code:newspaper:Explainer
:newspaper:SCI: a fast, flexible, and robust low-light image enhancement method (CVPR 2022 Oral) - AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image Enhancement
:star:code - Directional Self-supervised Learning for Heavy Image Augmentations
:star:code
:newspaper:Explainer - Abandoning the Bayer-Filter To See in the Dark
:star:code - URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement
:star:code - GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation
- Deep Color Consistent Network for Low-Light Image Enhancement
- SNR-Aware Low-Light Image Enhancement
:star:code
- Image Harmonization
- Image Super-Completion
- Scene Graph Expansion for Semantics-Guided Image Outpainting
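This paper tackles an interesting problem: by expanding an image's scene graph, it generates semantics-guided content beyond the image borders, helping designers quickly produce natural, coherent image extensions.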
- Semantic Image Matching
- TransforMatcher: Match-to-Match Attention for Semantic Correspondence
:star:code:house:project
:newspaper:Explainer
- Image Retouching
- Image Colorization
- Image Rectification
- Image Decomposition
- Image Reconstruction
- Image Registration
- A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
:star:code - NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration
- RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion
- Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment
:star:code
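- Image Editing
- Image Rescaling
- Image Color Editing
- Image Jigsaw Puzzles
- Image Cropping
- Image Completion
- Text-Guided Image Manipulation
- Image Dewarping
- Adverse Weather Removal
- Image Outpainting
- Shadow Removal
- Image Steganography
- Sound-Guided Semantic Image Manipulation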
- Sound-Guided Semantic Image Manipulation
:star:code:house:project
- Text-Driven Natural Image Editing
- Artifact Removal
2. Image Segmentation
- FocalClick: Towards Practical Interactive Image Segmentation
:star:code:newspaper:Brief explainer - Multimodal Material Segmentation
- Semantic-Aware Domain Generalized Segmentation
:open_mouth:oral:star:code - ReSTR: Convolution-free Referring Image Segmentation Using Transformers
:star:code:house:project - CRIS: CLIP-Driven Referring Image Segmentation
- Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
:house:project
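Panoptic Neural Fields: a semantic, object-aware neural scene representation recently proposed by Google. It can be used for novel view synthesis, 2D panoptic segmentation, 3D scene editing, and multi-view depth prediction, and may well become an influential new research direction. - FocusCut: Diving Into a Focus View in Interactive Segmentation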
:house:project - Hyperbolic Image Segmentation
:star:code - Clustering Plotted Data by Image Segmentation
:star:code - Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization
:star:code - Image Segmentation Using Text and Image Prompts
:star:code
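:newspaper:Can CLIP do segmentation too? The University of Göttingen proposes CLIPSeg, a model that uses text and image prompts to handle three segmentation tasks at once, getting the most out of CLIP - ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation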
:star:code
:newspaper:Explainer - Adaptive Early-Learning Correction for Segmentation From Noisy Annotations
:star:code - Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation
- Masked-Attention Mask Transformer for Universal Image Segmentation
:star:code:house:project
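:newspaper:One model for three segmentation tasks, beating MaskFormer in accuracy and efficiency: Meta and UIUC propose a universal segmentation model that outperforms task-specific models; code is open source - High Quality Segmentation for Ultra High-Resolution Images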
:star:code - LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
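:newspaper:Oxford, Shanghai AI Lab, HKU, SenseTime, and Tsinghua jointly propose a high-performing language-aware vision Transformer for referring image segmentation; code is open source - Instance Segmentation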
- E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
:star:code:newspaper:Brief explainer - Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
:star:code - Sparse Instance Activation for Real-Time Instance Segmentation
:star:code - SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
:house:project - Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
:star:code:house:project - DArch: Dental Arch Prior-assisted 3D Tooth Instance Segmentation
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance
:star:code:newspaper:Explainer - ContrastMask: Contrastive Learning to Segment Every Thing
:newspaper:Explainer
An instance segmentation algorithm for incomplete supervision based on pixel-level contrastive learning. - GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation
:star:code - TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
:star:code - Pointly-Supervised Instance Segmentation
:open_mouth:oral:star:code:house:project - Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers
:star:code - Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
:star:code - Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings
:star:code - Mask Transfiner for High-Quality Instance Segmentation
:star:code - Semi-Supervised Instance Segmentation
- 3D Instance Segmentation
- SoftGroup for 3D Instance Segmentation on Point Clouds
:star:code:newspaper:Brief explainer
- 🐦️FreeSOLO: Learning to Segment Objects without Annotations
:star:code - Few-Shot Segmentation
- Semantic Segmentation
- Generalized Few-Shot Semantic Segmentation
:star:code - Scribble-Supervised LiDAR Semantic Segmentation
:open_mouth:oral:star:code - Novel Class Discovery in Semantic Segmentation
:star:code:house:project - Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
:star:code - Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
:star:code - Pin the Memory: Learning to Generalize Semantic Segmentation
:star:code:newspaper:Explainer - Representation Compensation Networks for Continual Semantic Segmentation
:star:code - Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
:star:code:newspaper:Explainer - GroupViT: Semantic Segmentation Emerges from Text Supervision
:star:code:house:project:tv:video
:newspaper:Semantic segmentation without any pixel labels: UCSD and NVIDIA add a grouping module to ViT - Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
:star:code:newspaper:Brief explainer - Deep Hierarchical Semantic Segmentation
:star:code - Semantic Segmentation by Early Region Proxy
:star:code:newspaper:Brief explainer - SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation
:star:code - Rethinking Semantic Segmentation: A Prototype View
:open_mouth:oral:star:code - On the Road to Online Adaptation for Semantic Image Segmentation
:star:code - Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds
:star:code - NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night
:star:code:newspaper:Explainer - TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
:star:code - Cross-Image Relational Knowledge Distillation for Semantic Segmentation
:star:code:newspaper:Explainer - Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
- Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
:star:code - Self-Supervised Learning of Object Parts for Semantic Segmentation
:star:code - Cross-view Transformers for real-time Map-view Semantic Segmentation
:open_mouth:oral:star:code - Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
:house:project - Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
:star:code:newspaper:Analysis - Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device
- Partial Class Activation Attention for Semantic Segmentation
:star:code - Incremental Learning in Semantic Segmentation From Image Labels
:star:code - HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization
:newspaper:Analysis - ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation
- Domain-Agnostic Prior for Transfer Semantic Segmentation
- Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation
- Sparse and Complete Latent Organization for Geospatial Semantic Segmentation
- 3D Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
:star:code:newspaper:Brief analysis - Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
:star:code - Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
:star:code - Cross Language Image Matching for Weakly Supervised Semantic Segmentation
:star:code - Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
:star:code - Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
:star:code:newspaper:Analysis - Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
:star:code:newspaper:Brief analysis - L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
:star:code - Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
- CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
:star:code - Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
:star:code - C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
:star:code
- Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation
:star:code - Unsupervised Semantic Segmentation
- Semi-Supervised Semantic Segmentation
- Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
:star:code:house:project - Semi-supervised Semantic Segmentation with Error Localization Network
:star:code:house:project:newspaper:Brief analysis - UCC: Uncertainty guided Cross-head Co-training for Semi-Supervised Semantic Segmentation
- Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation
:star:code - Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation
:star:code - ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation
:star:code
- Domain-Adaptive Semantic Segmentation
- Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
:star:code - ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation
- Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
:star:code - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
:star:code
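- Domain-Generalized Semantic Segmentation
- Zero-Shot Semantic Segmentation
- Few-Shot Semantic Segmentation
- Cross-Domain Semantic Segmentation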
- Action Segmentation
- Scene Parsing
- Foggy Scene Segmentation
- FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation
:open_mouth:oral:star:code:house:project
- Panoptic Segmentation
- Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation
- Joint Forecasting of Panoptic Segmentations with Difference Attention
:star:code:newspaper:Analysis - PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
:star:code:newspaper:Analysis - Amodal Panoptic Segmentation
:house:project - Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap
- CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
- Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
:star:code
- Image Matting
- Glass Segmentation
- Amodal Segmentation
- Scene Understanding
- Human Parsing
- Part Segmentation
- Few-Shot Segmentation
- 3D Segmentation
- Part Segmentation
- PartGlot: Learning Shape Part Segmentation From Language Reference Games
:open_mouth:oral:star:code
Others
- Learning to Anticipate Future with Dynamic Context Removal
:star:code:newspaper:Brief analysis - Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks
- Instance-wise Occlusion and Depth Orders in Natural Scenes
:star:code - IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
:house:project - PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
:star:code:house:project:tv:video:newspaper:Brief analysis - LiT: Zero-Shot Transfer with Locked-image text Tuning
- CAFE: Learning to Condense Dataset by Aligning Features
:star:code:newspaper:Brief analysis - BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
:star:code:newspaper:Brief analysis:notebook: - ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
:star:code:newspaper:Brief analysis - Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
:star:code - Do Explanations Explain? Model Knows Best
:star:code - HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
:star:code - E-CIR: Event-Enhanced Continuous Intensity Recovery
:star:code - Transferability Estimation using Bhattacharyya Class Separability
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks
:star:code - GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction
:star:code - Differentially Private Federated Learning with Local Regularization and Sparsification
- Towards Efficient and Scalable Sharpness-Aware Minimization
:star:code - DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
- Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences
:star:code:newspaper:Brief analysis - Dynamic Dual-Output Diffusion Models
- Moving Window Regression: A Novel Approach to Ordinal Regression
- Egocentric Prediction of Action Target in 3D
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
:star:code - Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
:star:code - Neural Reflectance for Shape Recovery with Shadow Handling
:star:code - DyRep: Bootstrapping Training with Dynamic Re-parameterization
:star:code - Enhancing Classifier Conservativeness and Robustness by Polynomiality
- Versatile Multi-Modal Pre-Training for Human-Centric Perception
:star:code - Attributable Visual Similarity Learning
:star:code - Optimizing Elimination Templates by Greedy Parameter Search
- Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
- Bi-level Doubly Variational Learning for Energy-based Latent Variable Models
- Brain-inspired Multilayer Perceptron with Spiking Neurons
- ARCS: Accurate Rotation and Correspondence Search
:star:code - iPLAN: Interactive and Procedural Layout Planning
- HINT: Hierarchical Neuron Concept Explainer
:star:code - Visual Abductive Reasoning
:star:code - A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
:star:code - Learning Structured Gaussians to Approximate Deep Ensembles
- Self-Supervised Image Representation Learning with Geometric Set Consistency
- Balanced Multimodal Learning via On-the-fly Gradient Modulation
:open_mouth:oral:star:code - CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters
:star:code - Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation
:open_mouth:oral - Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian
- Long-term Visual Map Sparsification with Heterogeneous GNN
- Clean Implicit 3D Structure from Noisy 2D STEM Images
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets
:star:code - CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
:star:code:house:project - Fast Light-Weight Near-Field Photometric Stereo
- Fast, Accurate and Memory-Efficient Partial Permutation Synchronization
- Multi-Robot Active Mapping via Neural Bipartite Graph Matching
- Learning Program Representations for Food Images and Cooking Recipes
:open_mouth:oral:star:code:house:project - Iterative Deep Homography Estimation
:star:code - Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain
- Generating High Fidelity Data from Low-density Regions using Diffusion Models
- Continuous Scene Representations for Embodied AI
:star:code:house:project - It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
- End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps
- Reflection and Rotation Symmetry Detection via Equivariant Learning
:star:code:house:project - Exploiting Explainable Metrics for Augmented SGD
:star:code - On the Importance of Asymmetry for Siamese Representation Learning
:star:code - Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression
- Perception Prioritized Training of Diffusion Models
:star:code - LASER: LAtent SpacE Rendering for 2D Visual Localization
:open_mouth:oral - Efficient Maximal Coding Rate Reduction by Variational Forms
- Exemplar-based Pattern Synthesis with Implicit Periodic Field Network
- Progressive Minimal Path Method with Embedded CNN
- Online Convolutional Re-parameterization
:star:code - Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes
:star:code - Leveraging Equivariant Features for Absolute Pose Regression
- Neural Convolutional Surfaces
:house:project - GLASS: Geometric Latent Augmentation for Shape Spaces
:star:code:house:project - Total Variation Optimization Layers for Computer Vision
- Identifying Ambiguous Similarity Conditions via Semantic Matching
:star:code:newspaper:Analysis - TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates
- Gravitationally Lensed Black Hole Emission Tomography
:star:code:house:project:tv:video - Robust and Accurate Superquadric Recovery: a Probabilistic Approach
:star:code - Projective Manifold Gradient Layer for Deep Rotation Regression
:star:code - Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
:star:code - Single-Photon Structured Light
- Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention
:open_mouth:oral:star:code - Defensive Patches for Robust Recognition in the Physical World
:star:code:newspaper:Analysis - Event-aided Direct Sparse Odometry
:open_mouth:oral:star:code:house:project:tv:video - Deep Unlearning via Randomized Conditionally Independent Hessians
:star:code - Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data
- Towards Data-Free Model Stealing in a Hard Label Setting
:star:code:house:project - Proto2Proto: Can you recognize the car, the way I do?
:star:code - Balanced MSE for Imbalanced Visual Regression
:open_mouth:oral:star:code
:newspaper:CVPR 2022 (Oral) | Imbalanced regression labels? Try Balanced MSE - Leveraging Unlabeled Data for Sketch-based Understanding
- Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
:star:code:house:project - Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
:star:code:newspaper:Analysis - RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
:star:code:newspaper:Analysis - An Image Patch is a Wave: Quantum Inspired Vision MLP
:open_mouth:oral:star:code - A ConvNet for the 2020s
:star:code - NeuralHDHair: Automatic High-fidelity Hair Modeling from a Single Image Using Implicit Neural Representations
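Hair modeling: builds a high-fidelity hair model from a single image using implicit neural representations. From the CAD&CG group at Zhejiang University, ETH Zurich, and City University of Hong Kong. - A Unified Framework for Implicit Sinkhorn Differentiation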
:star:code
:newspaper:Analysis - Towards Better Understanding Attribution Methods
:star:code - Universal Photometric Stereo Network using Global Lighting Contexts
:star:code:house:project:tv:video:newspaper:Analysis - Estimating Example Difficulty Using Variance of Gradients
- One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching
- Holocurtains: Programming Light Curtains via Binary Holography
- Do Learned Representations Respect Causal Relationships?
- CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly
- Mixed Differential Privacy in Computer Vision
- Which Model To Transfer? Finding the Needle in the Growing Haystack
- Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss
:star:code - RAGO: Recurrent Graph Optimizer For Multiple Rotation Averaging
:star:code - Virtual Elastic Objects
:house:project - Bayesian Invariant Risk Minimization
:star:code - Shape From Polarization for Complex Scenes in the Wild
:star:code - Non-Iterative Recovery from Nonlinear Observations using Generative Models
- Moving Window Regression: A Novel Approach to Ordinal Regression
:star:code - Generative Flows With Invertible Attentions
- Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers
- The Flag Median and FlagIRLS
- Implicit Feature Decoupling With Depthwise Quantization
:star:code - UNIST: Unpaired Neural Implicit Shape Translation Network
:star:code:house:project - Mutual Information-Driven Pan-Sharpening
- A Framework for Learning Ante-Hoc Explainable Models via Concepts
- SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information
- Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision
:star:code - Convolutions for Spatial Interaction Modeling
- FastDOG: Fast Discrete Optimization on GPU
:star:code - Convolution of Convolution: Let Kernels Spatially Collaborate
:star:code - Generalized Category Discovery
:star:code:house:project - Maximum Consensus by Weighted Influences of Monotone Boolean Functions
- Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery
:star:code - Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space
- Less Is More: Generating Grounded Navigation Instructions From Landmarks
- HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
:star:code:house:project - Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation
- Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
:star:code - How Well Do Sparse ImageNet Models Transfer?
:star:code - REX: Reasoning-Aware and Grounded Explanation
:star:code - Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration
- Hire-MLP: Vision MLP via Hierarchical Rearrangement
:star:code - One-Bit Active Query With Contrastive Pairs
- Sparse Non-Local CRF
- Dataset Distillation by Matching Training Trajectories
:star:code:house:project - Deep Decomposition for Stochastic Normal-Abnormal Transport
:open_mouth:oral - Parametric Scattering Networks
:star:code - ScaleNet: A Shallow Architecture for Scale Estimation
:star:code - Learning To Solve Hard Minimal Problems
- Learning Canonical F-Correlation Projection for Compact Multiview Representation
- CellTypeGraph: A New Geometric Computer Vision Benchmark
:star:code - RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding
- HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks
- Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique
- Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography
- Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data
:star:code:house:project - Neural Prior for Trajectory Estimation
- ActiveZero: Mixed Domain Learning for Active Stereovision with Zero Annotation
:star:code - Global-Aware Registration of Less-Overlap RGB-D Scans
- Efficient Deep Embedded Subspace Clustering
- Rep-Net: Efficient On-Device Learning via Feature Reprogramming
:star:code - WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery
- FLAVA: A Foundational Language and Vision Alignment Model
:star:code:house:project - Scanline Homographies for Rolling-Shutter Plane Absolute Pose
:star:code - Exemplar-based Pattern Synthesis with Implicit Periodic Field Network
- Understanding Uncertainty Maps in Vision With Statistical Testing
:star:code - B-Cos Networks: Alignment Is All We Need for Interpretability
- Learning to Collaborate in Decentralized Learning of Personalized Models
:newspaper:Analysis - 360-Attack: Distortion-Aware Perturbations From Perspective-Views
- A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors
- A Hybrid Quantum-Classical Algorithm for Robust Fitting
:star:code - Topology Preserving Local Road Network Estimation From Single Onboard Camera Image
:star:code - RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
- Towards Real-World Navigation With Deep Differentiable Planners
:star:code - An Iterative Quantum Approach for Transformation Estimation From Point Sets
- UnweaveNet: Unweaving Activity Stories
:star:code - Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations
:star:code - Learning Video Representations of Human Motion From Synthetic Data
- TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
:star:code - The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions
:star:code:house:project - Simple but Effective: CLIP Embeddings for Embodied AI
:star:code - Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations
:star:code - Recall@k Surrogate Loss With Large Batches and Similarity Mixup
:star:code - Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport
:star:code - Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
- HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
:star:code - Replacing Labeled Real-Image Datasets With Auto-Generated Contours
:star:code:house:project - Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees
- Omnivore: A Single Model for Many Visual Modalities
:house:project - Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers
- Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources
:star:code:house:project - Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening
:star:code - HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture
:house:project - Deep Image-based Illumination Harmonization
:star:code - Ditto: Building Digital Twins of Articulated Objects From Interaction
:open_mouth:oral:star:code:house:project - TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed
- Masked Autoencoders Are Scalable Vision Learners
- Neural Inertial Localization
:star:code:house:project - Neural Recognition of Dashed Curves With Gestalt Law of Continuity
- BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation
:house:project - Merry Go Round: Rotate a Frame and Fool a DNN
- Modeling sRGB Camera Noise With Normalizing Flows
:house:project - Co-Advise: Cross Inductive Bias Distillation
:star:code - Automatic Relation-Aware Graph Network Proliferation
:star:code - Stereo Magnification With Multi-Layer Images
:star:code:house:project - CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data
- Rethinking Controllable Variational Autoencoders
- BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster
- HARA: A Hierarchical Approach for Robust Rotation Averaging
:star:code - Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
:open_mouth:oral:star:code:house:project - Learning Fair Classifiers with Partially Annotated Group Labels
:star:code - Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction
- High-Fidelity Human Avatars From a Single RGB Camera
:star:code:house:project - RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry
- How Good Is Aesthetic Ability of a Fashion Model?
:star:code - Learning With Neighbor Consistency for Noisy Labels
- GeoEngine: A Platform for Production-Ready Geospatial Research
- Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction
- On the Integration of Self-Attention and Convolution
:star:code - Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector
:star:code - MAXIM: Multi-Axis MLP for Image Processing
:open_mouth:oral:star:code - Delving Into the Estimation Shift of Batch Normalization in a Network
:star:code - Learning Object Context for Novel-View Scene Layout Generation
- Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective
- Relative Pose From a Calibrated and an Uncalibrated Smartphone Image
- The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration
:star:code - The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference
:star:code - AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
- Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels
:star:code - Parameter-Free Online Test-Time Adaptation
:open_mouth:oral:star:code - AlignMixup: Improving Representations by Interpolating Aligned Features
:star:code - HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging
:star:code - Brain-inspired Multilayer Perceptron with Spiking Neurons
- SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
- Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
- Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm
:star:code - Split Hierarchical Variational Compression
- Privacy Preserving Partial Localization
- Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective
:star:code - Frame Averaging for Equivariant Shape Space Learning
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation
:star:code - Co-domain Symmetry for Complex-Valued Deep Learning
- DeepCurrents: Learning Implicit Representations of Shapes With Boundaries
- Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention
- Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture
:star:code - Cycle-Consistent Counterfactuals by Latent Transformations
- FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks
- Local Texture Estimator for Implicit Representation Function
:star:code - Degree-of-Linear-Polarization-Based Color Constancy
- Learning To Learn by Jointly Optimizing Neural Architecture and Weights
- Discrete Time Convolution for Fast Event-Based Stereo
- SelfD: Self-Learning Large-Scale Driving Policies From the Web
- Autofocus for Event Cameras
:star:code:house:project - Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3)
- 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies
- PNP: Robust Learning from Noisy Labels by Probabilistic Noise Prediction
- Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
- PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions
:star:code - Contrastive Conditional Neural Processes
- Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video
:open_mouth:oral:star:code:house:project - Scenic: A JAX Library for Computer Vision Research and Beyond
:star:code - Calibrating Deep Neural Networks by Pairwise Constraints
- Deep Saliency Prior for Reducing Visual Distraction
:house:project - Efficient Large-Scale Localization by Global Instance Recognition
- VisualHow: Multimodal Problem Solving
:star:code - Learning To Generate Line Drawings That Convey Geometry and Semantics
:star:code:house:project:tv:video - On Guiding Visual Attention With Language Specification
:star:code - Learning To Align Sequential Actions in the Wild
:star:code - A Sampling-Based Approach for Efficient Clustering in Large Datasets
:star:code - AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks
- Pooling Revisited: Your Receptive Field Is Suboptimal
- Learning to Find Good Models in RANSAC
:star:code - Image Disentanglement Autoencoder for Steganography Without Embedding
:star:code - Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models
- Globetrotter: Connecting Languages by Connecting Images
:open_mouth:oral:star:code:house:project - Symmetry-Aware Neural Architecture for Embodied Visual Exploration
- Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings
- Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders
- HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
:star:code - Stereoscopic Universal Perturbations Across Different Architectures and Datasets
:star:code - Learned Queries for Efficient Local Attention
:open_mouth:oral:star:code - Structure-Aware Flow Generation for Human Body Reshaping
:star:code - A Structured Dictionary Perspective on Implicit Neural Representations
- The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement
:open_mouth:oral:star:code:house:project - How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
- GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision
- Enabling Equivariance for Arbitrary Lie Groups
:star:code - Robust Fine-Tuning of Zero-Shot Models
- SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images
:star:code:house:project - Compressing Models With Few Samples: Mimicking Then Replacing
:star:code - Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts
- Exposure Normalization and Compensation for Multiple-Exposure Correction
:star:code - Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching
:star:code - Optimal LED Spectral Multiplexing for NIR2RGB Translation
:star:code - Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
:star:code:house:project - Transferability Metrics for Selecting Source Model Ensembles
- Adversarial Parametric Pose Prior
:star:code - RAMA: A Rapid Multicut Algorithm on GPU
:star:code - RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks
- Complex Backdoor Detection by Symmetric Feature Differencing
- Bilateral Video Magnification Filter
- Disentangling Visual and Written Concepts in CLIP
- Image Animation With Perturbed Masks
:star:code - Hyperspherical Consistency Regularization