CVPR-2022-Papers

July 28, 2022

Official website: https://cvpr2022.thecvf.com/

Conference dates: June 19-24, 2022

❣❣❣ The list of papers accepted to CVPR 2022 has been announced: 2,067 in total! All papers are now available, so stay tuned!

❣❣❣ To download all papers as a single bundle, reply "paper" to the 【我爱计算机视觉】 WeChat official account.

Survey papers from previous years, organized by category: ↘️CV-Surveys (under construction)

2022 papers organized by category:

↘️CVPR-2022-Papers ↘️WACV-2022-Papers

2021 papers organized by category:

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

2020 papers organized by category:

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

Contents

| :cat: | :dog: | :tiger: | :wolf: |
|---|---|---|---|
| 1.Other | 2.Image Segmentation | 3.Image Processing | 4.Image Captioning |
| 5.Object Detection | 6.Object Tracking | 7.Point Cloud | 8.Action Detection (human action detection and recognition) |
| 9.Human Pose Estimation | 10.3D (3D vision) | 11.Face | 12.Image-to-Image Translation |
| 13.GAN | 14.Video | 15.Transformer | 16.Semi/self-supervised learning |
| 17.Medical Image | 18.Person Re-Identification | 19.Neural Architecture Search | 20.Autonomous vehicles |
| 21.UAV/Remote Sensing/Satellite Image | 22.Image Synthesis/Generation | 23.Image Retrieval | 24.Super-Resolution |
| 25.Fine-Grained/Image Classification | 26.GCN/GNN | 27.Pose Estimation (object pose estimation) | 28.Style Transfer |
| 29.Augmented Reality/Virtual Reality/Robotics | 30.Visual Question Answering | 31.Vision-Language | 32.Data Augmentation |
| 33.Human-Object Interaction | 34.Model Compression/Knowledge Distillation/Pruning | 35.OCR | 36.Optical Flow |
| 37.Contrastive Learning | 38.Meta-Learning | 39.Continual Learning | 40.Adversarial Learning |
| 41.Incremental Learning | 42.Metric Learning | 43.Multi-Task Learning | 44.Federated Learning |
| 45.Dense Prediction | 46.Scene Graph Generation | 47.Few/Zero-Shot Learning/Domain Generalization/Adaptation | 48.Visual Grounding |
| 49.Image Geo-localization | 50.Anomaly Detection | 51.Optics, Geometry, and Light Field Imaging | 52.Human Motion Forecasting |
| 53.Sign Language Translation | 54.Dataset | 55.Novel View Synthesis | 56.Sound |
| 57.Gaze Estimation | 58.Neural rendering | 59.Animation | 60.Visual Emotion Analysis |

Clustering
- DeepDPM: Deep Clustering With an Unknown Number of Clusters
  :star:code
Scene Flow
- Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation
  :star:code
Graph Recognition
- Improving Subgraph Recognition With Variational Graph Information Bottleneck
  :star:code
Motion Blur
- Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos
Portrait Eyeglasses and Shadow Removal
- Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
  :star:code
Lip Reading
- Sub-Word Level Lip Reading With Visual Attention
Analog Clock Reading
- It's About Time: Analog Clock Reading in the Wild
  :star:code:house:project
Fingerprinting
- Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
  :open_mouth:oral
Sketch-Based Image Manipulation
- SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches
  :star:code:house:project
Sketch Recognition
- Finding Badly Drawn Bunnies
Debiasing
- Debiased Learning From Naturally Imbalanced Pseudo-Labels
  :star:code
Line Segment Classification
- Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World
Interactive object understanding
- Human Hands as Probes for Interactive Object Understanding
  :star:code:house:project
Digital Humans
- GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
  :star:code:house:project
Reinforcement Learning
- DECORE: Deep Compression With Reinforcement Learning
Visual Relationship Detection
- A Probabilistic Graphical Model Based on Neural-symbolic Reasoning for Visual Relationship Detection
Crack Recognition
- Geometry-Aware Guided Loss for Deep Crack Recognition
Eye Authentication
- EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images
Audio-Visual Event Localization
- Cross-Modal Background Suppression for Audio-Visual Event Localization
  :star:code
Unbiased Learning
- A Conservative Approach for Unbiased Learning on Unknown Biases
  :star:code
Object Proposal Generation
- ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Lip Reading
- Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading
  :star:code:house:project
Correspondence Learning
- MS2DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph
  :star:code
Visual Localization
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
  :star:code:house:project
Visual Recognition
- Causal Transportability for Visual Recognition
  :star:code
- A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
- Contextual Debiasing for Visual Recognition With Causal Mechanisms
Long-term action quality assessment
- Likert Scoring With Grade Decoupling for Long-Term Action Assessment
Motion Recognition
- Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition
  :star:code
CNN
- An Image Patch Is a Wave: Phase-Aware Vision MLP
  :star:code
Volume Rendering
- DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering
  :star:code
virtual correspondences
- Virtual Correspondence: Humans as a Cue for Extreme-View Geometry
  :house:project
Infrared Measurement
- Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements
4D Scene Capture
- HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR
  :star:code:house:project
Morphable Head Avatars
- I M Avatar: Implicit Morphable Head Avatars From Videos
  :star:code:house:project
Activity Anticipation
- A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting
Mirror Detection
- Learning Semantic Associations for Mirror Detection
  :star:code
Two-Hand Reconstruction
- Interacting Attention Graph for Single Image Two-Hand Reconstruction
  :star:code
Image Vectorization
- Towards Layer-wise Image Vectorization
  :star:code
Action Learning
- Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency
  :star:code
BNN
- PokeBNN: A Binary Pursuit of Lightweight Accuracy
  :star:code
CNN
- Condensing CNNs With Partial Differential Equations
  :star:code
Place Recognition
- TransVPR: Transformer-based place recognition with multi-level attention aggregation
  :open_mouth:oral
Object Recognition
- AirObject: A Temporally Evolving Graph Embedding for Object Identification
  :star:code
Edge Detection
- EDTER: Edge Detection with Transformer
  :star:code
Defect Detection
- Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning

Open-Set Recognition

Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition
:star:code

Active Learning

Backdoor Attacks

Multi-view Clustering

Machine Translation

VALHALLA: Visual Hallucination for Machine Translation
:house:project

Object Counting

Computer-Aided Design (CAD)

Transfer Learning

Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning

Graph Matching

Noise Modeling

Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images
:house:project

60.Visual Emotion Analysis

MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis

59.Animation

APES: Articulated Part Extraction From Sprite Sheets
:house:project
BANMo: Building Animatable 3D Neural Models From Many Casual Videos
:open_mouth:oral:house:project
Neural Head Avatars From Monocular RGB Videos
:star:code:house:project
FLAG: Flow-Based 3D Avatar Generation From Sparse Observations
:house:project
Image Animation
- Thin-Plate Spline Motion Model for Image Animation
  :star:code
Human Animation
- Structured Local Radiance Fields for Human Avatar Modeling
3D character animation
- Skinning Prediction
  - SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters
    :house:project
3D Dance Generation
- Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
  :star:code
- A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres
Still Image to Animation
- Controllable Animation of Fluid Elements in Still Images
  :house:project
3D human avatars
- gDNA: Towards Generative Detailed Neural Avatars
  :star:code:house:project

58.Neural rendering

Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images
:open_mouth:oral:house:project
SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
:star:code
Modeling Indirect Illumination for Inverse Rendering
:star:code:house:project
GenDR: A Generalized Differentiable Renderer
:star:code
A generalized differentiable renderer
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
:star:code:house:project
NeRF-Editing: Geometry Editing of Neural Radiance Fields
AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields
:house:project
Neural Rays for Occlusion-Aware Image-Based Rendering
:star:code:house:project
EfficientNeRF: Efficient Neural Radiance Fields
:star:code
CoNeRF: Controllable Neural Radiance Fields
:star:code:house:project
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
:house:project
Hallucinated Neural Radiance Fields in the Wild
:star:code:house:project
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
:open_mouth:oral:star:code:house:project:tv:video
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Deblur-NeRF: Neural Radiance Fields From Blurry Images
:star:code:house:project
NeRFReN: Neural Radiance Fields With Reflections
:house:project
Depth-Supervised NeRF: Fewer Views and Faster Training for Free
:star:code:house:project
Dense Depth Priors for Neural Radiance Fields From Sparse Input Views
:star:code:house:project:tv:video
Light Field Neural Rendering
:star:code:house:project
InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering
:star:code:house:project
BokehMe: When Neural Rendering Meets Classical Rendering
:open_mouth:oral:star:code
Plenoxels: Radiance Fields Without Neural Networks
:star:code:house:project
HDR-NeRF: High Dynamic Range Neural Radiance Fields
Urban Radiance Fields
:house:project
Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations
:star:code
Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time
:star:code:house:project
Point-NeRF: Point-Based Neural Radiance Fields
HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs
:house:project
Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

57.Gaze Estimation

56.Sound

Finding Fallen Objects via Asynchronous Audio-Visual Integration
:house:project
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
:star:code:house:project
Visual Acoustic Matching
:open_mouth:oral:house:project
Sound Source Localization
Audio Pairing
- It's Time for Artistic Correspondence in Music and Video
  :house:project
Voice Cloning
- V2C: Visual Voice Cloning
  :star:code
Audio-Visual Speech Enhancement
- Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
  :tv:video
Text-to-Speech
- More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
  :star:code
Voice to Face Image
- Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
  :star:code:house:project
Speech Separation
- Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation
  :house:project
Co-Speech Gesture Generation
- Low-Resource Adaptation for Personalized Co-Speech Gesture Generation
  :house:project
Speaker Localization
- Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Co-Speech Gesture Generation
- SEEG: Semantic Energized Co-Speech Gesture Generation
  :star:code

55.Novel View Synthesis

NPBG++: Accelerating Neural Point-Based Graphics
:house:project
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
:house:project
AutoRF: Learning 3D Object Radiance Fields from Single View Observations
:house:project
NeurMiPs: Neural Mixture of Planar Experts for View Synthesis
:star:code:house:project:tv:video:newspaper:explainer
GeoNeRF: Generalizing NeRF with Geometry Priors
:star:code:house:project:tv:video
FWD: Real-Time Novel View Synthesis With Forward Warping and Depth
:star:code
Block-NeRF: Scalable Large Scene Neural View Synthesis
Boosting View Synthesis With Residual Transfer
:star:code:house:project
NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
:open_mouth:oral:star:code:house:project:tv:video
View Connection
- Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association
  :star:code

54.Dataset

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
:star:code:house:project:newspaper:brief explainer
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
:star:code:house:project
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
:sunflower:dataset
Hephaestus: A large scale multitask dataset towards InSAR understanding
SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis
:sunflower:dataset
AKB-48: A Real-World Articulated Object Knowledge Base
:star:code
:newspaper:brief explainer
Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives
ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
:star:code:house:project
ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection
MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
:sunflower:dataset
DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation
:sunflower:dataset
DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image
:sunflower:dataset
Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
:sunflower:dataset
Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions
:sunflower:dataset
Open Challenges in Deep Stereo: The Booster Dataset
:sunflower:dataset
RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
:house:project
Satellite Dataset
- DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
Animal Behavior Understanding Dataset
- Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
  :open_mouth:oral:house:project:sunflower:dataset
Dataset (Forest Monitoring)
- The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift
  :sunflower:dataset
3D Object Understanding
- ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
  :sunflower:dataset
Dataset (AutoMine)
- AutoMine: An Unmanned Mine Dataset
  :sunflower:dataset
Dataset (Facial Expression Recognition)
- FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
  :star:code
Dataset (Gesture Recognition)
- LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition
  :star:code
Dataset (Grain Recognition)
- GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains
  :sunflower:dataset
Dataset (Spatio-Temporal Action, Social Group, and Activity Detection)
- JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection
  :sunflower:dataset

53.Sign Language Translation

52.Human Motion Forecasting

51.Optics, Geometry, and Light Field Imaging

Compressive Single-Photon 3D Cameras
Fisher Information Guidance for Learned Time-of-Flight Imaging
Light Field
- Occlusion-Aware Cost Constructor for Light Field Depth Estimation
  :star:code:newspaper:brief explainer
- Neural Point Light Fields
  :star:code:house:project
- Acquiring a Dynamic Light Field Through a Single-Shot Coded Image
- Learning Neural Light Fields With Ray-Space Embedding
  :star:code:house:project
Depth Reconstruction
- Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection
  :star:code:house:project:tv:video
Rolling Shutter Correction
- Learning Adaptive Warping for Real-World Rolling Shutter Correction
  :star:code
Thermal Infrared Imaging
- Infrared Invisible Clothing: Hiding from Infrared Detectors at Multiple Angles in Real World
  :open_mouth:oral
Camera Pose Estimation
- DiffPoseNet: Direct Differentiable Camera Pose Estimation
Camera Relocalization
- SceneSqueezer: Learning to Compress Scene for Camera Relocalization
  :open_mouth:oral
Imaging
Optics
Quantization-aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging
:star:code
Dual-Shutter Optical Vibration Sensing
Camera Pose
- Camera Pose Estimation Using Implicit Distortion Models
Camera Imaging
- Learning to Zoom Inside Camera Imaging Pipeline
Camera Localization
- Learning To Detect Scene Landmarks for Camera Localization
  :star:code
Aperture Imaging
- Synthetic Aperture Imaging With Events and Frames
  :star:code
Hyperspectral Imaging
- Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders
  :star:code

50.Anomaly Detection

49.Image Geo-localization

TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
:star:code
Visual Geo-localization
- Rethinking Visual Geo-localization for Large-Scale Applications
  :star:code
- Deep Visual Geo-localization Benchmark
  :open_mouth:oral:house:project
Trajectory Reconstruction
- MonoTrack: Shuttle trajectory reconstruction from monocular badminton video

48.Visual Grounding

Multi-View Transformer for 3D Visual Grounding
:star:code
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
:star:code
Visual grounding: locating a target from a natural-language description (a very interesting study)
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
:star:code
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
:star:code
Multi-Modal Dynamic Graph Transformer for Visual Grounding
:star:code

47.Few/Zero-Shot Learning/Domain Generalization/Adaptation

Few-Shot
Zero-Shot
Domain Generalization
- Compound Domain Generalization via Meta-Knowledge Encoding
- Causality Inspired Representation Learning for Domain Generalization
  :star:code
- Towards Unsupervised Domain Generalization
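  :newspaper:CVPR 2022 | Tsinghua University proposes unsupervised domain generalization (UDG)
  This work targets domain generalization (DG). It is the first paper to extend DG to unsupervised learning, and it proposes a new research area, unsupervised domain generalization (UDG).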
- Towards Principled Disentanglement for Domain Generalization
  :open_mouth:oral:star:code
- Meta Convolutional Neural Networks for Single Domain Generalization
- PCL: Proxy-Based Contrastive Learning for Domain Generalization
- Localized Adversarial Domain Generalization
- Unsupervised Domain Generalization by Learning a Bridge Across Domains
- Style Neophile: Constantly Seeking Novel Styles for Domain Generalization
- BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features
- Failure Modes of Domain Generalization Algorithms
- Geometric and Textural Augmentation for Domain Gap Reduction
  :star:code
- Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective
  :star:code
- Out-of-Domain Generalization
  - The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization
Domain Adaptation
No-Reference Point Cloud Quality Assessment via Domain Adaptation
:star:code
Slimmable Domain Adaptation
:star:code
SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation
Unsupervised Domain Adaptation

46.Scene Graph Generation

PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation
Fine-Grained Predicates Learning for Scene Graph Generation
:star:code
HL-Net: Heterophily Learning Network for Scene Graph Generation
:star:code
Scene graph generation: a heterophily learning network
:newspaper:explainer
RU-Net: Regularized Unrolling Network for Scene Graph Generation
:star:code
Scene graph generation: a regularized unrolling network
:newspaper:explainer
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
:star:code
Dynamic Scene Graph Generation via Anticipatory Pre-Training
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
:star:code
Structured Sparse R-CNN for Direct Scene Graph Generation
:star:code
Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
:star:code
Video Scene Graph Generation
- Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs
  :star:code

45.Dense Prediction

44.Federated Learning

43.Multi-Task Learning

42.Metric Learning

41.Incremental Learning

40.Adversarial Learning

39.Continual Learning

38.Meta-Learning

What Matters For Meta-Learning Vision Regression Tasks?
:star:code
Multidimensional Belief Quantification for Label-Efficient Meta-Learning
Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning
:star:code
Learning to Learn and Remember Super Long Multi-Domain Task Sequence
:open_mouth:oral:star:code
:newspaper:explainer

37.Contrastive Learning

36.Optical Flow

CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
:star:code
DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow
:star:code
Imposing Consistency for Optical Flow Estimation
Deep Equilibrium Optical Flow Estimation
:star:code:newspaper:explainer
GMFlow: Learning Optical Flow via Global Matching
:open_mouth:oral:star:code:newspaper:explainer
Optical Flow Estimation for Spiking Camera
:star:code
Learning Optical Flow with Kernel Patch Attention
:star:code:newspaper:explainer
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
:star:code
Global Matching With Overlapping Attention for Optical Flow Estimation
:star:code
Towards Understanding Adversarial Robustness of Optical Flow Networks
:star:code

35.OCR

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
:star:code
Scene Text Detection
- Towards End-to-End Unified Scene Text Detection and Layout Analysis
  :star:code
- Pushing the Performance Limit of Scene Text Recognizer without Human Annotation
- Vision-Language Pre-Training for Boosting Scene Text Detectors
  :star:code
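  Vision-language pre-training for scene text detection. The code will be open-sourced, but the repository address has not yet been announced.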
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
Scene Text Recognition
- SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
  :star:code
Text Spotting
- Text Spotting Transformers
  :star:code:newspaper:brief explainer
- Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer
Logo Design
- Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
  :star:code
  :newspaper:CVPR 2022 | Peking University and Tencent propose a text logo generation model with creativity rivaling a designer's
Font Generation
- XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation
- Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator
  :open_mouth:oral
  Font generation (a direction with strong commercial value)
- Few-Shot Font Generation by Learning Fine-Grained Local Styles
Text Recognition
- Open-set Text Recognition via Character-Context Decoupling
Table Structure Recognition
- Neural Collaborative Graph Machines for Table Structure Recognition
  :newspaper:explainer
Text Aesthetics Prediction and Evaluation
- Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method
  :star:code
Table Structure Understanding
- TableFormer: Table Structure Understanding with Transformers
Text Segmentation
- BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild
Table Detection
- PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
  :star:code
Text Restoration
- Fourier Document Restoration for Robust Document Dewarping and Recognition
  :house:project
Handwritten Mathematical Expression Recognition
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition

34.Model Compression/Knowledge Distillation/Pruning

33.Human-Object Interaction

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
:star:code
MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
:star:code
Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection
:star:code
OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
:star:code
:newspaper:brief explainer
D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
:house:code
Learning Transferable Human-Object Interaction Detector With Natural Language Supervision
:star:code
What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions
:open_mouth:oral
Human-Object Interaction Detection via Disentangled Transformer
Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
:star:code:newspaper:explainer
Stability-driven Contact Reconstruction From Monocular Color Images
:star:code
Hand-object interaction reconstruction from monocular color images; human-computer interaction
Interactiveness Field in Human-Object Interactions
:star:code
:newspaper:brief explainer
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection
:star:code
:newspaper:explainer 1
:newspaper:explainer 2
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
:open_mouth:oral:star:code
Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer
:house:project
NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
Category-Aware Transformer Network for Better Human-Object Interaction Detection
HOI Tracking
- BEHAVE: Dataset and Method for Tracking Human Object Interactions
  :house:project

32.Data Augmentation

🐦️AlignMix: Improving representation by interpolating aligned features
3D Common Corruptions and Data Augmentation
:star:code:house:project:tv:video:newspaper:brief explainer
Kubric: A scalable dataset generator
:star:code
Robust Optimization As Data Augmentation for Large-Scale Graphs
:star:code
AIM: an Auto-Augmenter for Images and Meshes
:star:code
Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation
:house:project
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
:open_mouth:oral:star:code

31.Vision-Language

30.Visual Question Answering

29.SLAM/Augmented Reality/Virtual Reality/Robotics

SLAM
- NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
  :star:code:house:project:tv:video
Goal Navigation
- Online Learning of Reusable Abstract Models for Object Goal Navigation
- Is Mapping Necessary for Realistic PointGoal Navigation?
  :star:code:house:project
try-on
- Dressing in the Wild by Watching Dance Videos
  :house:project
- Style-Based Global Appearance Flow for Virtual Try-On
  :star:code
- ClothFormer: Taming Video Virtual Try-on in All Module
  :open_mouth:oral:star:code:house:project:newspaper:explainer
- Weakly Supervised High-Fidelity Clothing Model Generation
- Full-Range Virtual Try-On With Recurrent Tri-Level Transform
  :house:project
AR
- Episodic Memory Question Answering
  :open_mouth:oral:star:code
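  AI assistant: episodic memory question answering (a new augmented reality task; both the data and the code will be open-sourced)
Robotics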
- Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
- Hand-Object Pose Estimation
  - ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
    :star:code
    :newspaper:brief explainer
Robot Navigation
- Coupling Vision and Proprioception for Navigation of Legged Robots
  :star:code:house:project:tv:video

28.Style Transfer

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
:star:code
Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation
:star:code
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
:open_mouth:oral:star:code
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
:star:code
StyTr2: Image Style Transfer With Transformers
:star:code
CLIPstyler: Image Style Transfer With a Single Text Condition
:star:code
Motion Style Transfer
- Style-ERD: Responsive and Coherent Online Motion Style Transfer
Motion Transfer
- Structure-Aware Motion Transfer with Deformable Anchor Model
  :star:code:newspaper:explainer
Scene Stylization
- StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning
Appearance Transfer
- Splicing ViT Features for Semantic Appearance Transfer
  :open_mouth:oral:star:code:house:project
Stylization
- Text2Mesh: Text-Driven Neural Stylization for Meshes
  :star:code:house:project
- 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image
  :star:code:house:project

27.Pose Estimation (object pose estimation)

OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
OnePose: One-Shot Object Pose Estimation without CAD Models
:star:code:house:project:newspaper:explainer
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
On the Instability of Relative Pose Estimation and RANSAC's Role
SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings
:star:code:house:project
ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
:star:code:house:project:tv:video
GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting
UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation
4D
- Revealing Occlusions with 4D Neural Fields
  :open_mouth:oral:star:code:house:project
- Ego4D: Around the World in 3,000 Hours of Egocentric Video
  :star:code
9D
- CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild
  :star:code:newspaper:brief explainer :notebook:
Monocular Object Pose Estimation
- EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
  :star:code
6D
3D Object Articulation
- Understanding 3D Object Articulation in Internet Videos
  :house:project
3D Object Pose Estimation
- Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions
  :star:code

26.GCN/GNN

GNN

25.Fine-Grained/Image Classification

Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification
A Voxel Graph CNN for Object Classification with Event Cameras
Multi-Modal Extreme Classification
:star:code
Fine-Grained Classification
- Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
  :star:code:newspaper:brief explainer:notebook:brief explainer
- Fine-Grained Object Classification via Self-Supervised Pose Alignment
  :star:code
Image Classification
Few-Shot Classification
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification
- Matching Feature Sets for Few-Shot Image Classification
  :star:code:house:project:tv:video
- Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
  :open_mouth:oral:star:code:house:project:newspaper:explainer
- Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
  :newspaper:explainer
- Generating Representative Samples for Few-Shot Classification
  :star:code
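  :newspaper:brief explainer
  For few-shot classification, generating more representative samples and removing non-representative ones improves classification and achieves state-of-the-art results.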
- Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations
- Task Discrepancy Maximization for Fine-Grained Few-Shot Classification
- Few-Shot Classification and Segmentation (FS-CS)
  - Integrative Few-Shot Learning for Classification and Segmentation
    :star:code
Long-Tailed Recognition
Fine-Grained Recognition
- Knowledge Mining with Scene Text for Fine-Grained Recognition
  :star:code:newspaper:explainer
Multi-Label Classification
- Large Loss Matters in Weakly Supervised Multi-Label Classification
  :star:code:house:project
- Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss
  :star:code
Class-Imbalanced Classification
- A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty
Image-Text Multimodal Classification
- Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification

24.Super-Resolution

23.Image Retrieval

22.Image Synthesis/Generation

Interactive Image Synthesis with Panoptic Layout Generation
:star:code
Autoregressive Image Generation using Residual Quantization
:star:code:newspaper:brief summary
GIRAFFE HD: A High-Resolution 3D-aware Generative Model
Arbitrary-Scale Image Synthesis
:star:code:newspaper:brief summary
Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
:star:code:newspaper:explainer
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
:star:code
Unpaired Cartoon Image Synthesis via Gated Cycle Mapping
3D Scene Painting via Semantic Image Synthesis
3D-Aware Image Synthesis via Learning Structural and Textural Representations
:star:code:house:project:tv:video
High-Resolution Image Synthesis With Latent Diffusion Models
:star:code
Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis
:star:code
DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis
:star:code
Cluster-Guided Image Synthesis With Unconditional Models
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
:open_mouth:oral:star:code
Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis
:star:code
Modulated Contrast for Versatile Image Synthesis
:star:code
Text-Guided Image Manipulation
- ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
  :open_mouth:oral:house:project
Pose-Guided Image Synthesis
- Exploring Dual-task Correlation for Pose Guided Person Image Generation
  :star:code:newspaper:brief summary
Text-to-Image Synthesis
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
- Text-to-Image Synthesis based on Object-Guided Joint-Decoding Transformer
  :newspaper:explainer
- LAFITE: Towards Language-Free Training for Text-to-Image Generation
  :star:code
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
  :open_mouth:oral:star:code
- Text to Image Generation With Semantic-Spatial Aware GAN
  :star:code
- Vector Quantized Diffusion Model for Text-to-Image Synthesis
  :star:code
Image Translation
- FlexIT: Towards Flexible Semantic Image Translation
  :star:code
- A Style-aware Discriminator for Controllable Image Translation
Image Generation
- Marginal Contrastive Correspondence for Guided Image Generation
  :open_mouth:oral
- OSSGAN: Open-Set Semi-Supervised Image Generation
  :star:code
- A Closer Look at Few-shot Image Generation
- Modeling Image Composition for Complex Scene Generation
  :star:code
  :newspaper:explainer
- Local Attention Pyramid for Scene Image Generation
- GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation
  :house:project
- MaskGIT: Masked Generative Image Transformer
- Attribute Group Editing for Reliable Few-Shot Image Generation
  :star:code
- Learning to Memorize Feature Hallucination for One-Shot Image Generation
  :newspaper:explainer
- StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
  :star:code
- Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation
Image-to-Text
- ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
  :star:code
Text-to-Shape Generation
- CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation
  :star:code
Image-to-Video Generation
- Make It Move: Controllable Image-to-Video Generation With Text Descriptions
  :star:code
Text-Guided Object Generation
- Zero-Shot Text-Guided Object Generation With Dream Fields
  :star:code:house:project
Person Image Generation
- Self-supervised Correlation Mining Network for Person Image Generation
Image-Text Matching
- Negative-Aware Attention Framework for Image-Text Matching
  :star:code
Bidirectional Image-Text Generation
- L-Verse: Bidirectional Generation Between Image and Text
  :star:code

21.UAV/Remote Sensing/Satellite Image

CVNet: Contour Vibration Network for Building Extraction
:star:code
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data
:house:project
Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
:star:code
Remote Sensing Image Fusion
- HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
  :star:code:newspaper:brief summary
Aerial Image Segmentation
- Revisiting Near/Remote Sensing with Geospatial Attention
Aerial Image Detection
- Oriented RepPoints for Aerial Object Detection
  :star:code
Satellite Imagery
- PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images
  :star:code

20.Autonomous vehicles

Autonomous Driving
Lane Detection
- Rethinking Efficient Lane Detection via Curve Modeling
  :star:code:newspaper:brief summary:notebook:
- Towards Driving-Oriented Metric for Lane Detection Models
- A Keypoint-based Global Association Network for Lane Detection
  :star:code:newspaper:explainer
- Monocular 3D Lane Detection
  - ONCE-3DLanes: Building Monocular 3D Lane Detection
    :star:code
    Another step forward for lane detection technology
Lane Descriptors
- Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
  :star:code
- CLRNet: Cross Layer Refinement Network for Lane Detection
  :star:code:newspaper:explainer
Scene Relighting for Autonomous Driving
- SIMBAR: Single Image-Based Scene Relighting For Effective Data Augmentation For Automated Driving Vision Tasks
  :house:project
Pedestrian Trajectory Prediction
Trajectory Prediction
Vehicle Detection
- Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection

19.Neural Architecture Search

🐦️ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
:star:code
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
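:star:code:newspaper:explainer
GPUNet: Searching the Deployable Convolution Neural Networks for GPUs
Neural architecture search for lightweight networks targeting GPU deployment (faster than Google's EfficientNet-X series and Meta's FBNetV3, and even more accurate; the authors are from NVIDIA)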
Distribution Consistent Neural Architecture Search
Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
GreedyNASv2: Greedier Search With a Greedy Path Filter
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning
:star:code
Neural Architecture Search with Representation Mutual Information
:star:code
Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?
:star:code
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
:star:code
Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search
:star:code

18.Person Re-Identification

17.Medical Image

Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
:open_mouth:oral
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
:star:code
DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
:star:code:newspaper:explainer
Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning
:star:code:house:project
What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements
:star:code
ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics
:star:code
3D Bioprinting
- Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
SR (MRI)
- Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution
  :star:code
Medical Image Segmentation
- CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
  :star:code
- C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
  :star:code
- HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet
- Closing the Generalization Gap of Cross-Silo Federated Medical Image Segmentation
  :star:code
- Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
  :star:code
Medical Image Registration
- Affine Medical Image Registration with Coarse-to-Fine Vision Transformer
  :star:code
Medical Image Analysis
- FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis
  :star:code:newspaper:explainer
Automatic Report Generation
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Medical Image Classification
- ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification
  :star:code
- M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer
CT Synthesis
- Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis
Medical Landmark Detection
- Which Images To Label for Few-Shot Medical Landmark Detection?
MRI
- Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks
  :star:code
- Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction
  :star:code
Histopathology
- Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images
  :star:code
Dental
- Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation
  :house:project
3D Medical Image Analysis
- Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
  :star:code
3D Tooth Instance Segmentation
- DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations
Malaria Detection
- Towards Low-Cost and Efficient Malaria Detection
  :sunflower:dataset

16.Semi/self-supervised learning

15.Transformer

Vision Transformer With Deformable Attention
:star:code
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
:star:code
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
:star:code
BoxeR: Box-Attention for 2D and 3D Transformers
:star:code
Video Swin Transformer
:star:code
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers
Fast Point Transformer
:star:code
ChiTransformer: Towards Reliable Stereo from Cues
Beyond Fixation: Dynamic Window Visual Transformer
:star:code
Training-free Transformer Architecture Search
:newspaper:explainer
Automated Progressive Learning for Efficient Training of Vision Transformers
:star:code
Collaborative Transformers for Grounded Situation Recognition
:star:code
TubeDETR: Spatio-Temporal Video Grounding with Transformers
:open_mouth:oral:star:code:house:project
Deformable Video Transformer
MixFormer: Mixing Features across Windows and Dimensions
:open_mouth:oral:star:code:newspaper:brief summary
Are Multimodal Transformers Robust to Missing Modality?
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Multimodal Token Fusion for Vision Transformers
:star:code
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
:open_mouth:oral:star:code:newspaper:explainer
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
A unified Transformer architecture that uses contrastive learning for visual dialog
Patch Slimming for Efficient Vision Transformers
:newspaper:explainer
Swin Transformer V2: Scaling Up Capacity and Resolution
:star:code
:newspaper:Records smashed! Swin Transformer v2.0 is here, with 3 billion parameters!
SimMIM: A Simple Framework for Masked Image Modeling
:star:code
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
:star:code
:newspaper:explainer
Mobile-Former: Bridging MobileNet and Transformer
:star:code
MulT: An End-to-End Multitask Learning Transformer
:star:code:house:project
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
:open_mouth:oral:star:code:newspaper:explainer
CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
:star:code
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
:star:code
Reversible Vision Transformers
:star:code
MetaFormer Is Actually What You Need for Vision
:open_mouth:oral:star:code
GradViT: Gradient Inversion of Vision Transformers
:house:project
CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows
:star:code
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
:star:code
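:newspaper:Meta and Berkeley propose a general multiscale vision Transformer based on pooling self-attention, reaching 88.8% classification accuracy on ImageNet! Open source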
A-ViT: Adaptive Tokens for Efficient Vision Transformer
:open_mouth:oral:house:project
:newspaper:Unimportant tokens can stop computing early! NVIDIA proposes A-ViT, an efficient vision Transformer with adaptive tokens that improves model throughput!
Certified Patch Robustness via Smoothed Vision Transformers
:star:code
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
:star:code
Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training
:star:code
Object-Region Video Transformers
:star:code:house:project
Shunted Self-Attention via Multi-Scale Token Aggregation
:open_mouth:oral:star:code
Towards Robust Vision Transformer
:star:code
Fine-tuning Image Transformers using Learnable Memory
Lite Vision Transformer With Enhanced Self-Attention
:star:code
Self-Supervised Video Transformer
:star:code
TransMix: Attend To Mix for Vision Transformers
:star:code
CMT: Convolutional Neural Networks Meet Vision Transformers
:star:code
Shape Completion
- ShapeFormer: Transformer-based Shape Completion via Sparse Representation
  :star:code:house:project

14.Video

Improving Video Model Transfer With Dynamic Representation Learning
Action Segmentation
- Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
  :tv:video
- Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
- Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Action Understanding
- How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
  :star:code
- Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
  :star:code
Video Copy Detection
- A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection
  :star:code
Video Synthesis
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
  :star:code
- Playable Environments: Video Manipulation in Space and Time
  :star:code:house:project
- 3D Moments from Near-Duplicate Photos
  :house:project
- Neural 3D Video Synthesis From Multi-View Video
  :open_mouth:oral:house:project
Video Anomaly Detection
- Generative Cooperative Learning for Unsupervised Video Anomaly Detection
- Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
- Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement
- UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
  :star:code
Video Surveillance
- Trajectory Prediction
Video Moment Retrieval and Highlight Detection
- UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
  :star:code
- Learning Pixel-Level Distinctions for Video Highlight Detection
- Contrastive Learning for Unsupervised Video Highlight Detection
  :star:code
Video Moment Retrieval
- AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Video Prediction
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction
- Continual Predictive Learning from Videos
  :open_mouth:oral:star:code
- SimVP: Simpler yet Better Video Prediction
  :star:code:newspaper:explainer
- Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
  :star:code:house:project
Video Individual Counting
- DR.VIC: Decomposition and Reasoning for Video Individual Counting
  :star:code
Video Interpolation
- Many-to-many Splatting for Efficient Video Frame Interpolation
  :star:code
- TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
  :house:project
- Long-term Video Frame Interpolation via Feature Propagation
- Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
  :house:project
Visual Correspondence (Video)
- Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
  :star:code
Video Recognition
- BEVT: BERT Pretraining of Video Transformers
  :star:code
  :newspaper:A new paradigm for self-supervised pretraining of video Transformers: Fudan and Microsoft Cloud AI set a new SOTA in video recognition
- MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
  :star:code:newspaper:explainer
- MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
  :newspaper:Keep the model's memory! Meta and UC Berkeley propose MeMViT, with temporal support 30x longer than existing models for only 4.5% more compute
- Multiview Transformers for Video Recognition
  :star:code
- Group Contextualization for Video Recognition
  :star:code
- AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
  :star:code
Video Classification
- Zero-Shot Video Classification
  - Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
    :star:code
- Video Action Classification
  - Learning To Recognize Procedural Activities With Distant Supervision
    :star:code
Video Prediction
- Modular Action Concept Grounding in Semantic Video Prediction
  :house:project
- Hand Motion Prediction
  - Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
    :house:project:tv:video
Video Segmentation
- Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
  :star:code
- VSS
  - Scene Consistency Representation Learning for Video Scene Segmentation
    :star:code
    :newspaper:explainer 1
    :newspaper:explainer 2
- VOS
- Video Instance Segmentation (VIS)
  - Efficient Video Instance Segmentation via Tracklet Query and Proposal
    :house:project:tv:video:newspaper:brief summary
  - Temporally Efficient Vision Transformer for Video Instance Segmentation
    :open_mouth:oral:star:code:newspaper:explainer
  - VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
    :star:code
  - Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
- Video Semantic Segmentation
  - Coarse-to-Fine Feature Mining for Video Semantic Segmentation
    :star:code
- Video Panoptic Segmentation
  - Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
    :open_mouth:oral:star:code:newspaper:explainer
  - Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation
    :star:code
  - Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark
    :star:code
Video Processing
- Video Restoration
  - Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature
    :star:code
  - Neural Compression-Based Feature Learning for Video Restoration
- Video Inpainting
- Video Demoireing
  - Video Demoireing with Relation-Based Temporal Consistency
    :house:project:tv:video
- Video Deblurring
  - Multi-Scale Memory-Based Video Deblurring
    :star:code
- Video Denoising
  - Dancing under the stars: video denoising in starlight
    :star:code
- Film Restoration
  - Bringing Old Films Back to Life
    :star:code
Video Representation Learning
- TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
  :open_mouth:oral:star:code:newspaper:explainer
- Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging
  :star:code
- Motion-Adjustable Neural Implicit Video Representation
- Self-Supervised Video Representation Learning
- Video Contrastive Learning
  - Probabilistic Representations for Video Contrastive Learning
Video Decomposition
- Deformable Sprites for Unsupervised Video Decomposition
  :open_mouth:oral:house:project
Video Shadow Detection
- Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
  :star:code
Video Frame Interpolation
- IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
  :star:code
  :newspaper:explainer
- Video Frame Interpolation with Transformer
  :star:code
  :newspaper:explainer
- Video Frame Interpolation Transformer
  :star:code
- Optimizing Video Prediction via Video Frame Interpolation
- ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
  :star:code
Video Understanding
- Revisiting the "Video" in Video-Language Understanding
  :open_mouth:oral:star:code
- Long-Short Temporal Contrastive Learning of Video Transformers
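- Generic Event Boundary Detection (Video Understanding)
Video Captioning
- End-to-End Generative Pretraining for Multimodal Video Captioning
  :newspaper:Google's multimodal pretraining framework: SOTA on video captioning, action classification, and question answering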
- Hierarchical Modular Network for Video Captioning
  :star:code
- SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
  :star:code
- EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
  :star:code
Video Reconstruction
- E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations
- Context-Aware Video Reconstruction for Rolling Shutter Cameras
  :star:code:newspaper:explainer
Video Similarity Evaluation
- Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
  :house:project
Video Summarization
- Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer
  :house:project
- IntentVizor: Towards Generic Query Guided Interactive Video Summarization
  :star:code
Video Coding
- OCSampler: Compressing Videos to One Clip With Single-Step Sampling
- Learning Based Multi-Modality Image and Video Compression
- Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction
- LSVC: A Learning-Based Stereo Video Compression Framework
Video Modeling
- Stand-Alone Inter-Frame Attention in Video Models
  :star:code
  :newspaper:explainer
Video Paragraph Grounding
- Semi-Supervised Video Paragraph Grounding With Contrastive Encoder
Sentence Grounding
- Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning
  :star:code
Sequence Verification
- SVIP: Sequence VerIfication for Procedures in Videos
  :house:project
Video Editing
- M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers
Video Visual Relation Detection
- VRDFormer: End-to-End Video Visual Relation Detection With Transformers
  :star:code
Video Action Reasoning
- Complex Video Action Reasoning via Learnable Markov Logic Network
Video Reconstruction
- Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
  :star:code:house:project

13.GAN

12.Image-to-Image Translation

11.Face

Synthetic Generation of Face Videos With Plethysmograph Physiology
:house:project
Protecting Celebrities with Identity Consistency Transformer
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
:star:code
How Much Does Input Data Type Impact Final Face Model Accuracy?
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
:star:code:house:project
Estimating Structural Disparities for Face Models
General Facial Representation Learning in a Visual-Linguistic Manner
:open_mouth:oral:star:code
Deepfake
- Voice-Face Homogeneity Tells Deepfake
  :star:code:newspaper:brief summary
Makeup Transfer
- Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer
  :star:code
Face Recognition
- Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
- AdaFace: Quality Adaptive Margin for Face Recognition
  :open_mouth:oral:star:code
- Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
  :star:code
- Learning To Learn Across Diverse Data Biases in Deep Face Recognition
- Simulated Adversarial Testing of Face Recognition Models
- Privacy-Preserving Online AutoML for Domain-Specific Face Detection
- An Efficient Training Approach for Very Large Scale Face Recognition
  :star:code
Facial Expression Recognition
- Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin
  :star:code
- Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions
- Face2Exp: Combating Data Biases for Facial Expression Recognition
  :star:code
- Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in "In-the-Wild" Videos
  :open_mouth:oral:star:code:house:project
3D Portraits
- RigNeRF: Fully Controllable Neural 3D Portraits
3D Face
- ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
- Learning to Restore 3D Face from In-the-Wild Degraded Images
  :newspaper:explainer
Face Anti-Spoofing
- PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
- Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing
  :star:code
Face Forgery Detection
- Exploring Frequency Adversarial Attacks for Face Forgery Detection
  :newspaper:brief summary
- Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
- End-to-End Reconstruction-Classification Learning for Face Forgery Detection
  :newspaper:explainer
- Learning Second Order Local Anomaly for General Face Forgery Detection
- Protecting Celebrities From DeepFake With Identity Consistency Transformer
  :star:code
Face Swapping
- High-resolution Face Swapping via Latent Semantics Disentanglement
  :star:code
- Region-Aware Face Swapping
- Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness
Facial Attribute Classification
- Fair Contrastive Learning for Facial Attribute Classification
  :star:code
Face Relighting
- Face Relighting with Geometrically Consistent Shadows
  :star:code
Face Editing
- TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
  :star:code:house:project
- FENeRF: Face Editing in Neural Radiance Fields
  :star:code:house:project
Face Hallucination
- Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination
Deepfake Detection
- Detecting Deepfakes with Self-Blended Images
  :open_mouth:oral:star:code
- DeepFake Disrupter: The Detector of DeepFake Is My Friend
Face Reconstruction
- JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction
  :star:code:house:project:newspaper:explainer
- 3D Face Reconstruction
  - Generating Diverse 3D Reconstructions From a Single Occluded Face Image
    :star:code
Face Capture
- EMOCA: Emotion Driven Monocular Face Capture and Animation
  :house:project
Head Swapping
- Few-Shot Head Swapping in the Wild
  :open_mouth:oral:star:code:house:project:tv:video:newspaper:explainer
Portrait Distortion Correction
- Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer
  :star:code:newspaper:explainer
3D Face Modeling
- Physically-guided Disentangled Implicit Rendering for 3D Face Modeling
  :newspaper:explainer
Face Restoration
- Blind Face Restoration via Integrating Face Shape and Generative Priors
  :star:code
  :newspaper:explainer
- Rethinking Deep Face Restoration
- RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
  :star:code
- Learning to Restore 3D Face from In-the-Wild Degraded Images
Face Alignment
- Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
  :star:code
- Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture
  :star:code
Speech-Driven 3D Facial Animation
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers
  :star:code:house:project
3D Tongue Reconstruction
- 3D Human Tongue Reconstruction From Single "In-the-Wild" Images
  :star:code
Image Forgery Detection
- Robust Image Forgery Detection Over Online Social Network Shared Images
  :star:code
Face Parsing
- Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
  :star:code
Facial Expression
- Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality
Face Detection
- MogFace: Towards a Deeper Appreciation on Face Detection
  :star:code
Face Reenactment
- Dual-Generator Face Reenactment
  :star:code
Talking Face Generation
- Talking Face Generation With Multilingual TTS
  :house:project
- Expressive Talking Head Generation With Granular Audio-Visual Control
Facial Landmarks
- Towards Accurate Facial Landmark Detection via Cascaded Transformers
Face Morphable Model
- FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset
  :star:code
3D Facial Expression Synthesis
- Sparse to Dense Dynamic 3D Facial Expression Generation
  :star:code
Speech-Driven Tongue Animation
- Speech Driven Tongue Animation
  :star:code:house:project
Text-to-Face
- AnyFace: Free-Style Text-To-Face Synthesis and Manipulation
Facial Action Unit Recognition
- Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition
Face Verification
- DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification
  :star:code

10.3D

Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows
:star:code
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
:open_mouth:oral:star:code:newspaper:explainer
Physical Simulation Layer for Accurate 3D Modeling
φ-SfT: Shape-from-Template with a Physics-Based Deformation Model
:house:project
ICON: Implicit Clothed Humans Obtained From Normals
:star:code:house:project
Representing 3D Shapes With Probabilistic Directed Distance Fields
Improving Neural Implicit Surfaces Geometry With Patch Warping
:star:code
LOLNeRF: Learn From One Look
:house:project
Neural Mesh Simplification
Extracting Triangular 3D Models, Materials, and Lighting From Images
:open_mouth:oral:star:code:house:project
PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos
:star:code:house:project
The Wanderings of Odysseus in 3D Scenes
:star:code:house:project
Volumetric Bundle Adjustment for Online Photorealistic Scene Capture
Stereo Merging
- PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation
- GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented Feature
  :star:code
- Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
- Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
  :open_mouth:oral:star:code:newspaper:explainer
Stereo Matching
- Chitransformer: Towards Reliable Stereo From Cues
  :star:code
- Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
- FoggyStereo: Stereo Matching With Fog Volume Representation
- ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
  :star:code
Depth Estimation
- OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
  :open_mouth:oral:star:code
- NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
  :star:code:house:project
- 🐦️Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
  :star:code
- HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
- Multi-Frame Self-Supervised Depth with Transformers
- Layered Depth Refinement with Mask Guidance
  :house:project
- 360MonoDepth: High-Resolution 360° Monocular Depth Estimation
  :star:code:house:project
- Towards Multimodal Depth Estimation from Light Fields
- Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
- Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
  :star:code
- Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry
  :open_mouth:oral:star:code
- Toward Practical Monocular Indoor Depth Estimation
- Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data
- Stereo Depth From Events Cameras: Concentrate and Focus on the Future
  :star:code
- Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
  :star:code
- CroMo: Cross-Modal Learning for Monocular Depth Estimation
- Deep Depth From Focus With Differential Focus Volume
- Gated2Gated: Self-Supervised Depth Estimation From Gated Images
  :star:code
Room Layout
- LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
  :star:code:newspaper:brief explainer
MVS
- RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
- TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers
  :star:code:newspaper:explainer
- Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo
  :star:code
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
  :star:code
- Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
  :star:code
- Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering
  :star:code:house:project
- Efficient Multi-View Stereo by Iterative Dynamic Cost Volume
  :star:code
- MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions
  :star:code
- MVPS
  - Uncertainty-Aware Deep Multi-View Photometric Stereo
3D Reconstruction
- PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
- Self-supervised Neural Articulated Shape and Appearance Models
  :house:project
- BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
- Topologically-Aware Deformation Fields for Single-View 3D Reconstruction
  :star:code:house:project
- Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction
  :star:code:house:project:newspaper:explainer
- What's in your hands? 3D Reconstruction of Generic Objects in Hands
  :star:code:house:project:tv:video:newspaper:explainer
- Surface Reconstruction from Point Clouds by Learning Predictive Context Priors
  :star:code
- FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
  :star:code
  :newspaper:explainer
- KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos
- SPAMs: Structured Implicit Parametric Models
  :house:project:tv:video
- Enhancing Face Recognition With Self-Supervised 3D Reconstruction
- Neural Fields As Learnable Kernels for 3D Reconstruction
  :house:project
- Input-Level Inductive Biases for 3D Reconstruction
- Human-Aware Object Placement for Visual Environment Reconstruction
  :star:code:house:project
- Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction
  :star:code
- OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction
  :star:code:house:project
- 3D Scene Reconstruction
  - Neural 3D Scene Reconstruction with the Manhattan-world Assumption
    :open_mouth:oral:star:code:house:project:tv:video:newspaper:explainer
  - StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
    :star:code:house:project:tv:video
  - PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes
    :star:code
  - Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image
    :star:code:house:project
  - NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
    :star:code:house:project
- Hand-Object Reconstruction
  - Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution
- 3D Garment Mesh Reconstruction
  - Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images
    :house:project
  - Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
    :house:project
- 3D Mesh Reconstruction
  - Neural Template: Topology-aware Reconstruction and Disentangled Generation of 3D Meshes
    :star:code:newspaper:explainer
3D Shape Reconstruction
- 3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow
  :star:code
- GIFS: Neural Implicit Function for General Shape Representation
  :house:project
3D Garment Deformation
- SNUG: Self-Supervised Neural Dynamic Garments
  :open_mouth:oral:star:code
Texture Transfer and Synthesis
- AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis
  :star:code:house:project:tv:video
Shape Matching
- A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching
  :star:code
- Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching
  :star:code
Surface Reconstruction
- Critical Regularizations for Neural Surface Reconstruction in the Wild
- POCO: Point Convolution for Surface Reconstruction
  :star:code
- Neural RGB-D Surface Reconstruction
Multi-View Mesh Reconstruction
- Multi-View Mesh Reconstruction With Neural Deferred Shading
3D Shape Analysis
- Medial Spectral Coordinates for 3D Shape Analysis
- Learning Deep Implicit Functions for 3D Shapes with Dynamic Code Clouds
  :star:code
3D Completion
- AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
  :star:code:house:project
Image-Based Reconstruction
- Image Based Reconstruction of Liquids from 2D Surface Detections
  :star:code
PS
- Fast Light-Weight Near-Field Photometric Stereo
3D Object Shape Prediction
- Learning 3D Object Shape and Layout Without 3D Supervision
  :house:project
3D Shape
- 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
  :star:code
Neural 3D Content Generation
- StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
  :house:project
Depth Completion
- RGB-Depth Fusion GAN for Indoor Depth Completion
- GuideFormer: Transformers for Image Guided Depth Completion
- Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion
Line Segment Reconstruction
- ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance
Shape Reconstruction
- Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow
  :star:code
3D Shape Generation
- Towards Implicit Text-Guided 3D Shape Generation
  :star:code
- 3D Dog Shape
  - BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information
    :house:project
3D Part Segmentation
- AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation
3D Semantic Scene Completion
- MonoScene: Monocular 3D Semantic Scene Completion
  :star:code

9.Human Pose Estimation

COAP: Compositional Articulated Occupancy of People
:star:code:house:project:tv:video:newspaper:explainer
Context-Aware Sequence Alignment using 4D Skeletal Augmentation
:open_mouth:oral:star:code:house:project
Generalizable Human Pose Triangulation
Location-Free Human Pose Estimation
:newspaper:explainer
Meta Agent Teaming Active Learning for Pose Estimation
Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
:star:code
Multi-Person Pose Estimation
- Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
  :star:code
- End-to-End Multi-Person Pose Estimation With Transformers
  :star:code
- Contextual Instance Decoupling for Robust Multi-Person Pose Estimation
  :star:code
Video-Based HPE
- Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
  :open_mouth:oral:star:code
3D pose
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
  :star:code
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision
  :open_mouth:oral:star:code
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
  :house:project
- Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation
- Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
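  :newspaper:Accurate and efficient multi-person 3D pose estimation: a distribution-aware single-stage model from Meitu and Beihang University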
- Forecasting Characteristic 3D Poses of Human Actions
  :tv:video
- Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
  :star:code
- Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision
  :house:project
- ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses
  :star:code
- MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
  :star:code
- PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
- Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video
  :star:code:house:project
- GraFormer: Graph-Oriented Transformer for 3D Pose Estimation
- AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation
- MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision
  :star:code:house:project
- Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
4D Human Capture
- H4D: Human 4D Modeling by Learning Neural Compositional Representation
  :star:code:house:project
Motion Capture
- Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
  :house:project
- A Low-Cost & Real-Time Motion Capture System
  :tv:video
- LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds
Arm-Hand Dynamic Estimation
- Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
3D Human Shape
- OSSO: Obtaining Skeletal Shape from Outside
  :star:code:house:project:tv:video:newspaper:explainer
Dense correspondence
- BodyMap: Learning Full-Body Dense Correspondence Map
  :house:project
3D Human Motion Reconstruction
- Differentiable Dynamics for Articulated 3d Human Motion Reconstruction
3D Human Pose Reconstruction
- Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video
- Putting People in their Place: Monocular Regression of 3D People in Depth
  :star:code:newspaper:explainer
Human Mesh Recovery
- Human Mesh Recovery From Multiple Shots
  :star:code:house:project
- Occluded Human Mesh Recovery
  :house:project
- GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras
  :open_mouth:oral:star:code:house:project
Human Motion Description
- Programmatic Concept Learning for Human Motion Description and Synthesis
  :house:project
3D Human Motion
- Generating Diverse and Natural 3D Human Motions From Text
  :star:code:house:project
3D Human Synthesis
- Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
  :star:code:house:project
HSC
- Capturing and Inferring Dense Full-Body Human-Scene Contact
  :star:code:house:project:tv:video
3D Human Motion Synthesis
- Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis
Human Reconstruction
- DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering
  :house:project
- SMPL-A: Modeling Person-Specific Deformable Anatomy
- SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
  :open_mouth:oral:star:code
Hand Pose
- Hand Mesh Reconstruction
  - MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image
    :star:code
- 3D Hand Pose
  - Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation
    :star:code
- Audio-Driven Gesture Reenactment
  - Audio-driven Neural Gesture Reenactment with Video Motion Graphs
    :star:code
- 3D Hand Reconstruction
  - LISA: Learning Implicit Shape and Appearance of Hands
    :house:project
- Hand Tracking
  - Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild
  - Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild
- Gesture Generation
  - Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
    :star:code:house:project
- 3D Hand Mesh Estimation
  - HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network
    :star:code
3D Human Body
- Accurate 3D Body Shape Regression Using Metric and Semantic Attributes

8.Action Detection

Action Detection
- Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
  :star:code
- Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
- SPAct: Self-supervised Privacy Preservation for Action Recognition
  :star:code
- Temporal Alignment Networks for Long-term Video
  :open_mouth:oral:star:code:house:project:newspaper:brief explainer
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
- GateHUB: Gated History Unit With Background Suppression for Online Action Detection
- MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
  :star:code
  :newspaper:MS-TCT: Inria and SBU propose a multi-scale temporal Transformer for action detection, with SOTA results and open-source code (CVPR 2022)
- Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
  :house:project
- Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
- Learning From Temporal Gradient for Semi-Supervised Action Recognition
  :star:code
- DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
  :star:code
- Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
- Object-Relation Reasoning Graph for Action Recognition
- Revisiting Skeleton-Based Action Recognition
  :open_mouth:oral:star:code
- InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition
- E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
  :star:code
- End-to-End Semi-Supervised Learning for Video Action Detection
  :star:code
- Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
  :open_mouth:oral
- TubeR: Tubelet Transformer for Video Action Detection
  :open_mouth:oral:house:project
- Semi-Supervised Action Recognition
  - Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
    :house:project
- Zero-Shot Action Recognition
  - Cross-modal Representation Learning for Zero-shot Action Recognition
    :star:code
    Zero-shot action recognition via cross-modal representation learning.
- Few-Shot Action Recognition
  - Hybrid Relation Guided Set Matching for Few-shot Action Recognition
    :star:code:newspaper:explainer
  - Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition
  - Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
    :star:code
- Temporal Action Detection
  - An Empirical Study of End-to-End Temporal Action Detection
    :star:code:newspaper:brief explainer
  - RCL: Recurrent Continuous Localization for Temporal Action Detection
Temporal Action Localization
Repetitive Action Counting
- TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
  :open_mouth:oral:star:code:house:project
Group Action Recognition
- Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
  :open_mouth:oral
- Detector-Free Weakly Supervised Group Activity Recognition
  :star:code:house:project
Action Quality Assessment
- FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
  :open_mouth:oral:star:code:house:project:newspaper:explainer
Activity Recognition
- Audio-Adaptive Activity Recognition Across Video Domains
  :star:code:house:project
- Group Activity Recognition
  - Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
    :house:project

7.Point Cloud

Shape-invariant 3D Adversarial Point Clouds
:star:code
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
:star:code
REGTR: End-to-end Point Cloud Correspondences with Transformers
:star:code
Equivariant Point Cloud Analysis via Learning Orientations for Message Passing
:star:code
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
:star:code:house:project
Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
:star:code
Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation
:star:code:newspaper:explainer
3DeformRS: Certifying Spatial Deformations on Point Clouds
:star:code
Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors
:star:code:newspaper:explainer
Density-preserving Deep Point Cloud Compression
:star:code:house:project:newspaper:explainer
Surface Representation for Point Clouds
:open_mouth:oral:star:code
:newspaper:explainer 1
:newspaper:explainer 2
Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
:star:code
Point Cloud Pre-Training With Natural 3D Structures
:star:code:house:project
Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
:star:code
Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders
RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior
PatchFormer: An Efficient Point Transformer With Patch Attention
PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images
Point Cloud Color Constancy
:star:code
Multimodal Colored Point Cloud to Image Alignment
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models by Fitting Feature-Level Space-Time Surfaces
:star:code
Domain Adaptation on Point Clouds via Geometry-Aware Implicits
:star:code
ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds
3DAC: Learning Attribute Compression for Point Clouds
RCP: Recurrent Closest Point for Point Cloud
:star:code
Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels
DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds
:star:code:house:project
The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution
3D Point Cloud
- Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling
  :star:code
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
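  :star:code:newspaper:brief explainer
  CrossPoint is a simple self-supervised learning framework for 3D point cloud representation learning. Although it is trained on a synthetic 3D object dataset, experiments on downstream tasks such as 3D object classification and 3D object part segmentation, on both synthetic and real-world datasets, show that it learns transferable representations.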
- IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
  :star:code
- A Unified Query-based Paradigm for Point Cloud Understanding
  :star:code
- WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
  :star:code
- 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
- Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
  :house:project
- Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis
- Upright-Net: Learning Upright Orientation for 3D Point Cloud
3D Point Cloud Segmentation
- Stratified Transformer for 3D Point Cloud Segmentation
  :star:code
Point Cloud Classification
- ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation
  :star:code:newspaper:brief explainer :notebook:
Point Cloud Registration
- SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration
  :star:code
  :newspaper:A second-order similarity measure that lets traditional registration methods outperform deep learning while matching its speed
- Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering
  :star:code
- Deterministic Point Cloud Registration via Novel Transformation Decomposition
  :newspaper:explainer
- Geometric Transformer for Fast and Robust Point Cloud Registration
  :star:code
Point Cloud Completion
- Learning a Structured Latent Space for Unsupervised Point Cloud Completion
- Learning Local Displacements for Point Cloud Completion
- LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
  :newspaper:brief explainer
Point Cloud Segmentation
- Contrastive Boundary Learning for Point Cloud Segmentation
  :star:code:newspaper:explainer
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
  :star:code:newspaper:explainer
- An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation
  :star:code
- Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation
  :star:code
Point Cloud Matching
- Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes
  :star:code
Scene Flow Estimation
- RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds
Point Cloud Understanding
- PointCLIP: Point Cloud Understanding by CLIP
  :star:code

6.Object Tracking

TCTrack: Temporal Contexts for Aerial Tracking
:star:code:newspaper:brief explainer
:newspaper:TCTrack: a temporal-context framework for aerial tracking
Correlation-Aware Deep Tracking
Global Tracking Transformers
:star:code
Unified Transformer Tracker for Object Tracking
:star:code
Global Tracking via Ensemble of Local Trackers
:star:code
Unsupervised Learning of Accurate Siamese Tracking
:star:code
Transformer Tracking with Cyclic Shifting Window Attention
:star:code
Transformer tracking with cyclic shifting window attention. The method sets a new state of the art on five datasets: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.
Tracking People by Predicting 3D Appearance, Location and Pose
:open_mouth:oral:star:code:house:project
Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos
:star:code
Opening Up Open World Tracking
:open_mouth:oral:star:code:house:project
Transforming Model Prediction for Tracking
:star:code
PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments
:star:code
Spiking Transformers for Event-Based Single Object Tracking
:star:code
MixFormer: End-to-End Tracking With Iterative Mixed Attention
:open_mouth:oral:star:code
PTTR: Relational 3D Point Cloud Object Tracking With Transformer
:star:code
GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking
:star:code
3D Object Tracking
- Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
  :star:code:newspaper:brief explainer
- Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
  :star:code
- BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
  :star:code
Multi-Object Tracking
- Learning of Global Objective for Network Flow in Multi-Object Tracking
- MeMOT: Multi-Object Tracking with Memory
  :open_mouth:oral
- Multi-Object Tracking Meets Moving UAV
- Adiabatic Quantum Computing for Multi Object Tracking
- Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking
- LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking
  :star:code
- TrackFormer: Multi-Object Tracking With Transformers
  :star:code
- DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
  :star:code
RGB-T Tracking
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
  :house:project:newspaper:explainer
Visual Tracking
- Ranking-Based Siamese Visual Tracking
  :star:code:newspaper:explainer
Nighttime Tracking
- Unsupervised Domain Adaptation for Nighttime Aerial Tracking
  :star:code
Human Motion Tracking
- Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors
  :star:code:house:project
Multi-Person Pose Tracking
- PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking
  :star:code

5.Object Detection

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
:star:code:newspaper:brief explainer
Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation
:star:code
ESCNet: Gaze Target Detection with the Understanding of 3D Scenes
:star:code
Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection
:star:code
Interactron: Embodied Adaptive Object Detection
:star:code
Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
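Object detectors are usually trained with bounding-box annotations alone. The authors introduce language prompts and distill linguistic knowledge into the detection model, yielding a performance gain of 1.6% to 2.1%.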
Dynamic Sparse R-CNN
Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild
:star:code:newspaper:brief explainer
Focal and Global Knowledge Distillation for Detectors
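:star:code:newspaper:explainer
A knowledge distillation method for object detection: about 30 lines of code are enough to yield stable gains across anchor-based and anchor-free, single-stage and two-stage detectors. The code has been open-sourced.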
Group R-CNN for Weakly Semi-supervised Object Detection with Points
:star:code
:newspaper:explainer
Real-time Object Detection for Streaming Perception
:star:code:newspaper:explainer
Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
:star:code
Optimal Correction Cost for Object Detection Evaluation
Expanding Low-Density Latent Regions for Open-Set Object Detection
:star:code
:newspaper:explainer
SIOD: Single Instance Annotated Per Category Per Image for Object Detection
:star:code
:newspaper:explainer
Task-specific Inconsistency Alignment for Domain Adaptive Object Detection
:star:code
Zero-Query Transfer Attacks on Context-Aware Object Detectors
AdaMixer: A Fast-Converging Query-Based Object Detector
:open_mouth:oral:star:code
Learning to Detect Mobile Objects from LiDAR Scans Without Labels
:star:code
Forecasting from LiDAR via Future Object Detection
:star:code
Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection
:open_mouth:oral:star:code
Multi-Granularity Alignment Domain Adaptation for Object Detection
:star:code
Proper Reuse of Image Classification Features Improves Object Detection
:star:code
R(Det)^2: Randomized Decision Routing for Object Detection
Towards Robust Adaptive Object Detection under Noisy Annotations
:star:code
Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint
Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images
:star:code
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
:star:code
Cross-domain object detection with target-perceived dual-branch distillation.
Progressive End-to-End Object Detection in Crowded Scenes
:star:code
:newspaper:explainer
HCSC: Hierarchical Contrastive Selective Coding
:star:code
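:newspaper:New SOTA for self-supervised CNN pre-training: SJTU, Mila, and ByteDance jointly propose a new hierarchically structured framework for self-supervised image representation learning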
Recurrent Glimpse-based Decoder for Detection with Transformer
:open_mouth:oral:star:code
:newspaper:explainer
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
:star:code
Balanced and Hierarchical Relation Learning for One-Shot Object Detection
:star:code
Accelerating DETR Convergence via Semantic-Aligned Matching
:star:code
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
:star:code:house:project
Source-Free Object Detection by Learning To Overlook Domain Style
DESTR: Object Detection With Split Transformer
SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles
Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
:star:code
Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network
Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection
:star:code
Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer
Sequential Voting With Relational Box Fields for Active Object Detection
:star:code:house:project
Simple Multi-dataset Detection
:star:code
ObjectFormer for Image Manipulation Detection and Localization
A Dual Weighting Label Assignment Scheme for Object Detection
:star:code
Point-Level Region Contrast for Object Detection Pre-Training
:open_mouth:oral
Neural Volumetric Object Selection
:house:project
Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation
:star:code
DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors
:tv:video
Cross-Domain Adaptive Teacher for Object Detection
:star:code:house:project
End-to-End Human-Gaze-Target Detection With Transformers
Small Object Detection
- QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
  :star:code
- Interactive Multi-Class Tiny-Object Detection
  :star:code
- ISNet: Shape Matters for Infrared Small Target Detection
  :star:code
  :newspaper:explainer
Zero-Shot Object Detection
- Robust Region Feature Synthesizer for Zero-Shot Object Detection
  :star:code
Few-Shot Object Detection
- Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection
- Few-Shot Object Detection with Fully Cross-Transformer
- Kernelized Few-Shot Object Detection With Efficient Integral Aggregation
  :star:code
- Label, Verify, Correct: A Simple Few Shot Object Detection Method
  :star:code:house:project
Object Localization
- Weakly Supervised Object Localization as Domain Adaption
  :star:code:newspaper:brief explainer
- Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization
- Object Localization under Single Coarse Point Supervision
  :star:code
  :newspaper:explainer
- CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
  :star:code
- Spatial Commonsense Graph for Object Localisation in Partial Scenes
  :star:code:house:project
3D Object Detection
- Point Density-Aware Voxels for LiDAR 3D Object Detection
  :star:code
- A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds
  :star:code
- Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
  :star:code:newspaper:brief explainer
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
  :house:project
- Point2Seq: Detecting 3D Objects as Sequences
  :star:code
- MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection
  :star:code
- Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
  :star:code
  :newspaper:brief explainer
- Exploring Geometric Consistency for Monocular 3D Object Detection
- LiDAR Snowfall Simulation for Robust 3D Object Detection
  :open_mouth:oral:star:code
- CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
- Homography Loss for Monocular 3D Object Detection
- HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
  :star:code
- OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
  :star:code
- Focal Sparse Convolutional Networks for 3D Object Detection
  :open_mouth:oral:star:code:newspaper:explainer :notebook:
- Rotationally Equivariant 3D Object Detection
  :house:project
- Bridged Transformer for Vision and Point Cloud 3D Object Detection
  :newspaper:explainer
- Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
  :open_mouth:oral:star:code
  :newspaper:explainer
- VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
  :star:code
  :newspaper:SCUT proposes VISTA: a plug-and-play dual cross-view spatial attention mechanism that achieves SOTA 3D object detection
- Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
  :open_mouth:oral
- MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
  :star:code
- Voxel Field Fusion for 3D Object Detection
  :star:code
  :newspaper:explainer
- DisARM: Displacement Aware Relation Module for 3D Detection
  :star:code
- Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement
  :star:code
- Embracing Single Stride 3D Object Detector With Sparse Transformer
  :star:code
- 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
  :house:project
- Dimension Embeddings for Monocular 3D Object Detection
- MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
  :star:code
- RBGNet: Ray-Based Grouping for 3D Object Detection
  :star:code
- LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection
- SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
  :star:code
- MonoGround: Detecting Monocular 3D Objects From the Ground
  :star:code
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers
  :star:code
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
Camouflaged Object Detection
- Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
  :star:code
- Detecting Camouflaged Object in Frequency Domain
- Implicit Motion Handling for Video Camouflaged Object Detection
  :house:project
- Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way
  :star:code
Omni-Supervised Object Detection
- Omni-DETR: Omni-Supervised Object Detection with Transformers
  :star:code
Self-Supervised Object Detection
- Self-Supervised Object Detection From Audio-Visual Correspondence
Semi-Supervised Object Detection
- Dense Learning based Semi-Supervised Object Detection
  :star:code:newspaper:explainer
- Label Matching Semi-Supervised Object Detection
  :star:code
- Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes
- Active Teacher for Semi-Supervised Object Detection
  :star:code
- Scale-Equivalent Distillation for Semi-Supervised Object Detection
- Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors
- MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
  :star:code
Weakly Supervised Object Detection
- Salvage of Supervision in Weakly Supervised Object Detection
- Background Activation Suppression for Weakly Supervised Object Localization
  :star:code
- H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection
  :star:code
Salient Object Detection
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
  :star:code:newspaper:explainer
  :newspaper:Ultra-high-resolution salient object detection: PGNet, a novel and efficient staggered grafting architecture (CVPR 2022)
- Learning from Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
  :star:code:newspaper:explainer
- Bi-directional Object-context Prioritization Learning for Saliency Ranking
  :star:code
- Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
Dense Object Detection
- Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection
  :star:code
Co-Salient Object Detection
- Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection
  :star:code
- Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection
  :star:code
Long-Tailed Object Detection
- C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection
- Equalized Focal Loss for Dense Long-Tailed Object Detection
  :star:code
- Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection
Rotated Object Detection
- OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection
Keypoint Detection
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection
  :star:code:house:project
- UKPGAN: A General Self-Supervised Keypoint Detector
  :star:code
  :newspaper:overview
- Contour-Hugging Heatmaps for Landmark Detection
  :star:code
- Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species
- Keypoint Discovery
  - Self-Supervised Keypoint Discovery in Behavioral Videos
    :star:code:house:project
object discovery
- Discovering Objects that Can Move
- Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
  :star:code:house:project
Affordance grounding
- Learning Affordance Grounding from Exocentric Images
  :star:code:newspaper:explainer
Image Alignment
- Unsupervised Homography Estimation with Coplanarity-Aware GAN
  :star:code:newspaper:explainer
Object Attribute Recognition
- Disentangling Visual Embeddings for Attributes and Objects
  :open_mouth:oral:star:code
Vanishing Point Detection
- Deep vanishing point detection: Geometric priors make dataset variations vanish
  :star:code
Infrared Detection
- Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World
  :open_mouth:oral
OOD
- Deep Hybrid Models for Out-of-Distribution Detection
- Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
  :sunflower:dataset
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
  :star:code
- The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
- Out-of-Distribution Generalization With Causal Invariant Transformations
- ViM: Out-Of-Distribution with Virtual-logit Matching
  :star:code
- OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
  :star:code
- Neural Mean Discrepancy for Efficient Out-of-Distribution Detection
Open-World Object Detection
- OW-DETR: Open-world Detection Transformer
  :star:code
Domain Adaptive Object Detection
- SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection
  :star:code
Dense Object Detection
- Localization Distillation for Dense Object Detection
  :star:code
Image Copy Detection
- A Self-Supervised Descriptor for Image Copy Detection
  :star:code
Change Detection
- Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches
  :star:code
Image Recognition
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
  :star:code

4.Image Captioning

3.Image Processing

Image Restoration
- Attentive Fine-Grained Structured Sparsity for Image Restoration
  :star:code:newspaper:explainer
- Uformer: A General U-Shaped Transformer for Image Restoration
  :star:code
- Burst Image Restoration and Enhancement
  :open_mouth:oral:star:code
- BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras
- Restormer: Efficient Transformer for High-Resolution Image Restoration
  :open_mouth:oral:star:code
- TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions
  :star:code
- Deep Generalized Unfolding Networks for Image Restoration
  :star:code
- Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics
  :star:code
- All-in-One Image Restoration for Unknown Corruption
  :star:code
- Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
  :star:code
- KNN Local Attention for Image Restoration
Image Inpainting
- Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
  :star:code:newspaper:overview
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting
  :star:code
- Reduce Information Loss in Transformers for Pluralistic Image Inpainting
  :star:code
- UniCoRN: A Unified Conditional Image Repainting Network
- Dual-Path Image Inpainting With Auxiliary GAN Inversion
- MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting
  :star:code
Image Stitching
- Deep Rectangling for Image Stitching: A Learning Baseline
  :open_mouth:oral:star:code:newspaper:overview
- Automatic Color Image Stitching Using Quaternion Rank-1 Alignment
- Geometric Structure Preserving Warp for Natural Image Stitching
  :star:code
Motion Deblurring
- Unifying Motion Deblurring and Frame Interpolation with Events
  :star:code
image outpainting
- Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
  :house:project
Image Aesthetics Assessment
- Personalized Image Aesthetics Assessment with Rich Attributes
  :house:project
Image Quality Assessment
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment
  :star:code:newspaper:explainer
Image Deraining
- Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
  :star:code
- Dreaming To Prune Image Deraining Networks
Image Deblurring
- Learning to Deblur using Light Field Generated and Real Defocus Images
  :star:code:house:project
- Pixel Screening Based Intermediate Correction for Blind Deblurring
- Deblurring via Stochastic Refinement
- XYDeblur: Divide and Conquer for Single Image Deblurring
- Towards Multi-Domain Single Image Dehazing via Test-Time Training
Image Compression
- SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention
  :star:code
- Global Sensing and Measurements Reuse for Image Compressed Sensing
  :star:code
- DPICT: Deep Progressive Image Compression Using Trit-Planes
  :open_mouth:oral:star:code
- Joint Global and Local Hierarchical Priors for Learned Image Compression
  :star:code
- Neural Data-Dependent Transform for Learned Image Compression
  :star:code:house:project
- LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network
  :star:code
- ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding
  :open_mouth:oral
- Deep Stereo Image Compression via Bi-Directional Coding
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
  :star:code
- The Devil Is in the Details: Window-Based Attention for Image Compression
  :star:code
Lossless Image Compression
- PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework
Image Denoising
- CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image
  :star:code
- NAN: Noise-Aware NeRFs for Burst-Denoising
- Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
  :star:code
- AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
  :star:code
- RePaint: Inpainting Using Denoising Diffusion Probabilistic Models
  :star:code
- Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching
Image Dehazing
- Image Dehazing Transformer with Transmission-Aware 3D Position Embedding
  :house:project
De-rendering
- Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata
  :star:code:newspaper:explainer
- De-Rendering 3D Objects in the Wild
  :star:code
- IDR: Self-Supervised Image Denoising via Iterative Data Refinement
  :star:code
- RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising
  :star:code
- Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition
  :star:code
  :newspaper:explainer
  :newspaper:D4: unpaired image dehazing, a self-augmented method based on density and depth decomposition (CVPR 2022)
Image Enhancement
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement
  :open_mouth:oral:star:code:newspaper:explainer
  :newspaper:SCI: a fast, flexible, and robust low-light image enhancement method (CVPR 2022 Oral)
- AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image Enhancement
  :star:code
- Directional Self-supervised Learning for Heavy Image Augmentations
  :star:code
  :newspaper:explainer
- Abandoning the Bayer-Filter To See in the Dark
  :star:code
- URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement
  :star:code
- GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation
- Deep Color Consistent Network for Low-Light Image Enhancement
- SNR-Aware Low-Light Image Enhancement
  :star:code
Image Harmonization
- SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization
  :star:code
- High-Resolution Image Harmonization via Collaborative Dual Transformations
  :star:code
Image Outpainting
- Scene Graph Expansion for Semantics-Guided Image Outpainting
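  This paper tackles an interesting problem: by expanding the image's scene graph, it generates semantically guided content beyond the image borders, which can help designers quickly produce natural, harmonious image extensions.
Semantic Image Matching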
- TransforMatcher: Match-to-Match Attention for Semantic Correspondence
  :star:code:house:project
  :newspaper:explainer
Image Retouching
- ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
  :star:code
Image Colorization
- Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization
Image Correction
- EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction
  :star:code
Image Decomposition
- PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
  :star:code:house:project
Image Reconstruction
- Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
  :star:code
- A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift
  :star:code
Image Registration
- A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
  :star:code
- NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration
- RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion
- Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment
  :star:code
Image Editing
- Brain-Supervised Image Editing
Image Rescaling
- Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Image Color Editing
- SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
  :star:code:house:project
Image Collage
- SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
  :star:code
Image Cropping
- Rethinking Image Cropping: Exploring Diverse Compositions From Global Views
Image Completion
- Bridging Global Context Interactions for High-Fidelity Image Completion
  :star:code
Text-Guided Image Manipulation
- DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
  :star:code
Image Dewarping
- Revisiting Document Image Dewarping by Grid Regularization
Adverse Weather Removal
- Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model
  :star:code
Image Outpainting
- InOut: Diverse Image Outpainting via GAN Inversion
  :star:code:house:project
Shadow Removal
- Bijective Mapping Network for Shadow Removal
Image Steganography
- Robust Invertible Image Steganography
Sound-Guided Semantic Image Manipulation
- Sound-Guided Semantic Image Manipulation
  :star:code:house:project
Text-Driven Natural Image Editing
- Blended Diffusion for Text-driven Editing of Natural Images
  :star:code:house:project
Artifact Removal
- Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography

2.Image Segmentation

- FocalClick: Towards Practical Interactive Image Segmentation
  :star:code:newspaper:overview
- Multimodal Material Segmentation
- Semantic-Aware Domain Generalized Segmentation
  :open_mouth:oral:star:code
- ReSTR: Convolution-free Referring Image Segmentation Using Transformers
  :star:code:house:project
- CRIS: CLIP-Driven Referring Image Segmentation
- Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
  :house:project
  Panoptic Neural Fields: a semantic, object-aware neural scene representation newly proposed by Google. The representation can be used effectively for novel view synthesis, 2D panoptic segmentation, 3D scene editing, multi-view depth prediction, and other tasks. It looks likely to become another trend-setting direction.
- FocusCut: Diving Into a Focus View in Interactive Segmentation
  :house:project
- Hyperbolic Image Segmentation
  :star:code
- Clustering Plotted Data by Image Segmentation
  :star:code
- Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization
  :star:code
- Image Segmentation Using Text and Image Prompts
  :star:code
  :newspaper:Can CLIP also do segmentation? The University of Göttingen proposes CLIPSeg, a model that uses text and image prompts to handle three segmentation tasks at once, squeezing the most out of CLIP
- ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation
  :star:code
  :newspaper:explainer
- Adaptive Early-Learning Correction for Segmentation From Noisy Annotations
  :star:code
- Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation
- Masked-Attention Mask Transformer for Universal Image Segmentation
  :star:code:house:project
  :newspaper:One model for three segmentation tasks, outperforming MaskFormer in accuracy and efficiency! Meta & UIUC propose a universal segmentation model that beats task-specific models! Open source!
- High Quality Segmentation for Ultra High-Resolution Images
  :star:code
- LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
  :newspaper:Outstanding performance! Oxford, Shanghai AI Lab, HKU, SenseTime, and Tsinghua join forces to propose a language-aware vision Transformer for referring image segmentation! Code is open source
Instance Segmentation
- E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
  :star:code:newspaper:overview
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
  :star:code
- Sparse Instance Activation for Real-Time Instance Segmentation
  :star:code
- SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
  :house:project
- Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
  :star:code:house:project
- DArch: Dental Arch Prior-assisted 3D Tooth Instance Segmentation
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance
  :star:code:newspaper:explainer
- ContrastMask: Contrastive Learning to Segment Every Thing
  :newspaper:explainer
  A partially supervised instance segmentation algorithm based on pixel-level contrastive learning
- GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation
  :star:code
- TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
  :star:code
- Pointly-Supervised Instance Segmentation
  :open_mouth:oral:star:code:house:project
- Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers
  :star:code
- Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
  :star:code
- Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings
  :star:code
- Mask Transfiner for High-Quality Instance Segmentation
  :star:code
- Semi-Supervised Instance Segmentation
  - Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?
    :star:code
- 3D Instance Segmentation
  - SoftGroup for 3D Instance Segmentation on Point Clouds
    :star:code:newspaper:overview
- 🐦️FreeSOLO: Learning to Segment Objects without Annotations
  :star:code
- Few-Shot Segmentation
  - iFS-RCNN: An Incremental Few-Shot Instance Segmenter
Semantic Segmentation
- Generalized Few-Shot Semantic Segmentation
  :star:code
- Scribble-Supervised LiDAR Semantic Segmentation
  :open_mouth:oral:star:code
- Novel Class Discovery in Semantic Segmentation
  :star:code:house:project
- Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
  :star:code
- Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
  :star:code
- Pin the Memory: Learning to Generalize Semantic Segmentation
  :star:code:newspaper:explainer
- Representation Compensation Networks for Continual Semantic Segmentation
  :star:code
- Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
  :star:code:newspaper:explainer
- GroupViT: Semantic Segmentation Emerges from Text Supervision
  :star:code:house:project:tv:video
  :newspaper:Semantic segmentation without any pixel labels: UCSD and NVIDIA add a grouping module to ViT
- Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
  :star:code:newspaper:overview
- Deep Hierarchical Semantic Segmentation
  :star:code
- Semantic Segmentation by Early Region Proxy
  :star:code:newspaper:overview
- SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation
  :star:code
- Rethinking Semantic Segmentation: A Prototype View
  :open_mouth:oral:star:code
- On the Road to Online Adaptation for Semantic Image Segmentation
  :star:code
- Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds
  :star:code
- NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night
  :star:code:newspaper:explainer
- TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
  :star:code
- Cross-Image Relational Knowledge Distillation for Semantic Segmentation
  :star:code:newspaper:explainer
- Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
- Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
  :star:code
- Self-Supervised Learning of Object Parts for Semantic Segmentation
  :star:code
- Cross-view Transformers for real-time Map-view Semantic Segmentation
  :open_mouth:oral:star:code
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
  :house:project
- Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
  :star:code:newspaper:explainer
- Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device
- Partial Class Activation Attention for Semantic Segmentation
  :star:code
- Incremental Learning in Semantic Segmentation From Image Labels
  :star:code
- HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization
  :newspaper:explainer
- ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation
- Domain-Agnostic Prior for Transfer Semantic Segmentation
- Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation
- Sparse and Complete Latent Organization for Geospatial Semantic Segmentation
- 3D Semantic Segmentation
  - MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
    :house:project
  - Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
    :open_mouth:oral:star:code:newspaper:explainer
  - Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation
- Weakly Supervised Semantic Segmentation
  - Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
    :star:code:newspaper:overview
  - Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
    :star:code
  - Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
    :star:code
  - Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
    :star:code:newspaper:explainer
  - Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
    :star:code:newspaper:overview
  - L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
    :star:code
  - Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
  - CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
    :star:code
  - Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
    :star:code
  - C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
    :star:code
  - Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation
    :star:code
- Unsupervised Semantic Segmentation
  - Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation
    :star:code
- Semi-Supervised Semantic Segmentation
  - Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
    :star:code:house:project
  - Semi-supervised Semantic Segmentation with Error Localization Network
    :star:code:house:project:newspaper:overview
  - UCC: Uncertainty guided Cross-head Co-training for Semi-Supervised Semantic Segmentation
  - Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation
    :star:code
  - Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation
    :star:code
  - ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation
    :star:code
- Domain Adaptive Semantic Segmentation
  - Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
    :star:code
  - ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation
  - Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
    :star:code
  - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
    :star:code
- Domain Generalized Semantic Segmentation
  - WildNet: Learning Domain Generalized Semantic Segmentation from the Wild
    :star:code
- Zero-Shot Semantic Segmentation
  - Decoupling Zero-Shot Semantic Segmentation
    :star:code
- Few-Shot Semantic Segmentation
  - Learning Non-target Knowledge for Few-shot Semantic Segmentation
    :star:code
    :newspaper:explainer
  - Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer
- Cross-Domain Semantic Segmentation
  - Undoing the Damage of Label Shift for Cross-Domain Semantic Segmentation
    :star:code
Action Segmentation
- Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
- Fast and Unsupervised Action Boundary Detection for Action Segmentation
Scene Parsing
- FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing
  :star:code
- Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
  :star:code
Foggy Scene Segmentation
- FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation
  :open_mouth:oral:star:code:house:project
Panoptic Segmentation
- Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation
- Joint Forecasting of Panoptic Segmentations with Difference Attention
  :star:code:newspaper:explainer
- PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
  :star:code:newspaper:explainer
- Amodal Panoptic Segmentation
  :house:project
- Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap
- CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
- Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
  :star:code
Image Matting
- Human Instance Matting via Mutual Guidance and Multi-Instance Refinement
  :open_mouth:oral:star:code
- MatteFormer: Transformer-Based Image Matting via Prior-Tokens
  :star:code
Glass Segmentation
- Glass Segmentation Using Intensity and Spectral Polarization Cues
  :house:project
Amodal Segmentation
- Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
  :star:code
Scene Understanding
- Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding
- ScanQA: 3D Question Answering for Spatial Scene Understanding
  :star:code
- Egocentric Scene Understanding via Multimodal Spatial Rectifier
  :star:code
Human Parsing
- CDGNet: Class Distribution Guided Network for Human Parsing
  :star:code
Part Segmentation
- Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles
  :house:project
Few-Shot Segmentation
- GanOrCon: Are Generative Models Useful for Few-Shot Segmentation?
  :star:code:house:project
- Learning What Not To Segment: A New Perspective on Few-Shot Segmentation
  :open_mouth:oral:star:code
3D Segmentation
- INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation
  :star:code
Part Segmentation
- PartGlot: Learning Shape Part Segmentation From Language Reference Games
  :open_mouth:oral:star:code

1.Others
