WACV-2025-Papers
June 30, 2025

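Conference dates: February 28 to March 4, 2025
Conference website: https://wacv2025.thecvf.com/
For 2025 survey papers, see here ↘️2025-CV-Surveys
Click here for the categorized 2025 paper lists
↘️WACV-2025-Papers ↘️CVPR-2025-Papers ↘️ICCV-2025-Papers
Click here for the categorized 2024 paper lists
↘️WACV-2024-Papers ↘️CVPR-2024-Papers ↘️ECCV-2024-Papers
Click here for the categorized 2023 paper lists
Click here for the categorized 2022 paper lists
Click here for the categorized 2021 paper lists
Click here for the categorized 2020 paper lists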
❣❣❣ WACV 2025 paper categorization is complete
:loudspeaker::loudspeaker::loudspeaker:Award-Winning Papers
:trophy:Best Paper (Algorithms)
- RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis
:star:code
:house:project
:trophy:Best Paper (Applications)
:trophy:Best Student Paper
:trophy:Best Student Paper Honorable Mention
:trophy:Test of Time Award (tie)
- Deeply-Learned Feature for Age Estimation
- Bayesian Multi-object Tracking Using Motion Context from Multiple Objects
Contents
49.Computational Imaging
- FaVoR: Features via Voxel Rendering for Camera Relocalization
- Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter (light-field reconstruction)
- PrivateEye: In-Sensor Privacy Preservation Through Optical Feature Separation (optics)
- Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
- TaCOS: Task-Specific Camera Optimization with Simulation
48.Protecting Copyright
47.Sketch
- 3D Edge Sketch from Multiview Images
- PICASSO: A Feed-Forward Framework for Parametric Inference of CAD Sketches via Rendering Self-Supervision
- ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model
46.Sound
- SoundSil-DS: Deep Denoising and Segmentation of Sound-Field Images with Silhouettes (sound-field image denoising and segmentation)
- NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context
- NowYouSee Me: Context-Aware Automatic Audio Description
- EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
:house:project
- SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
- Temporally Streaming Audio-Visual Synchronization for Real-World Videos
- Multimodal Interpretable Depression Analysis using Visual Physiological Audio and Textual Data
- Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence
- VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference
- VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
45.Transformer
- LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones
- Bandit Based Attention Mechanism in Vision Transformers
- AMP-ViT: Optimizing Vision Transformer Efficiency with Adaptive Mixed-Precision Post-Training Quantization
- Channel Propagation Networks for Refreshable Vision Transformer
- Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
- SpectFormer: Frequency and Attention is What You Need in a Vision Transformer
- Image Adaptation for Colour Vision Deficient Viewers using Vision Transformers
- QuantAttack: Exploiting Quantization Techniques to Attack Vision Transformers
- TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration
- Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches
- Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers
- Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
44.Dense Prediction
- Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
:star:code
- Cross-Task Affinity Learning for Multitask Dense Scene Predictions (dense scene prediction)
43.Neural Radiance Fields
- Radiance Field-Based Pose Estimation via Decoupled Optimization Under Challenging Initial Conditions
- MFNeRF: Memory Efficient NeRF with Mixed-Feature Hash Table
- GANESH: Generalizable NeRF for Lensless Imaging
:star:code
- TRNeRF: Restoring Blurry Rolling Shutter and Noisy Thermal Images with Neural Radiance Fields
- BASED: Bundle-Adjusting Surgical Endoscopic Dynamic Video Reconstruction using Neural Radiance Fields
- ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization
- Self-Aligning Depth-Regularized Radiance Fields for Asynchronous RGB-D Sequences
- Novel View Synthesis
- RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation
- VaLID: Variable-Length Input Diffusion for Novel View Synthesis
- RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis
- GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis
- MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image Aided Generalizable Neural Radiance Field
- FluoNeRF: Fluorescent Novel-View Synthesis under Novel Light Source Colors
- SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
- Rendering
- Global-Guided Focal Neural Radiance Field for Large-Scale Scene Rendering
- OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow
- PRoGS: Progressive Rendering of Gaussian Splats (rendering)
- NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support
42.Industrial Anomaly Detection
- SPACE: SPAtial-aware Consistency rEgularization for anomaly detection in Industrial applications
- Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination
- Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera
- ROADS: Robust Prompt-driven Multi-Class Anomaly Detection under Domain Shift
- FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data
:star:code
- Single-Layer Distillation with Fourier Convolutions for Texture Anomaly Detection
- Looking at Model Debiasing through the Lens of Anomaly Detection
- AnomalyDINO: Boosting Patch-Based Few-Shot Anomaly Detection with DINOv2
- Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation
- Image Anomaly Detection
- Anomaly Localization
- Anomaly Segmentation
41.Anomaly Detection
- Outlier Detection
- OOD
- Exploiting Inter-Sample Information for Long-Tailed Out-of-Distribution Detection
- Identity Curvature Laplace Approximation for Improved Out-of-Distribution Detection
- CRAFT: Class Ranking Aware Fine-Tuning for Enhanced Out-of-Distribution Detection
- Finding Dino: A Plug-and-Play Framework for Zero-Shot Detection of Out-of-Distribution Objects using Prototypes
- CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring
40.Deepfake
- DeCLIP: Decoding CLIP representations for deepfake localization
:star:code
- Texture Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection
- AI-Generated Image Detection
- Misinformation Detection
39.Robots
- Transferring Foundation Models for Generalizable Robotic Manipulation (robotic manipulation)
- Avatar
- Try-On
- SLAM
- Indoor Localization
- Visual Place Recognition
38.HOI Detection
37.Scene
- LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
:star:code
- DDS: Decoupled Dynamic Scene-Graph Generation Network
- Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
- Effective Scene Graph Generation by Statistical Relation Distillation
36.Object Pose Estimation
35.Dataset/Benchmark
- SynDRA: Synthetic Dataset for Railway Applications
- High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer
- Needles & Haystacks: Dataset and Benchmark for Domain-Agnostic Image-Based Rigid Slice-to-Volume Registration
- The FineView Dataset: A 3D Scanned Multi-View Object Dataset of Fine-Grained Category Instances
- PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests
- IRIS-VIS: A New Dataset for Visibility Estimation in an Industrial Environment
- GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction
- CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach
:star:code
- PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation
- CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition
- SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection
- SEED4D: A Synthetic Ego--Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark
:star:code
- DrIFT: Autonomous Drone Dataset with Integrated Real and Synthetic Data, Flexible Views, and Transformed Domains
:star:code
- TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations
:star:code
- A Pipeline and NIR-Enhanced Dataset for Parking Lot Segmentation
- 3D Understanding of Deformable Linear Objects: Datasets and Transferability Benchmark
- CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry
- SANPO: A Scene Understanding Accessibility and Human Navigation Dataset
- Sign Language Recognition: A Large-Scale Multi-View Dataset and Comprehensive Evaluation
- CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
- A Semantically Impactful Image Manipulation Dataset: Characterizing Image Manipulations using Semantic Significance
- Benchmarks
- GazeSearch: Radiology Findings Search Benchmark
- ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage
:star:code
- CardioSyntax: End-to-End SYNTAX Score Prediction - Dataset Benchmark and Method
- Oriented Cell Dataset: A Dataset and Benchmark for Oriented Cell Detection and Applications
- ANTHROPOS-V: Benchmarking the Novel Task of Crowd Volume Estimation
- SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-Grade Videos
- Mind the Prompt: A Novel Benchmark for Prompt-Based Class-Agnostic Counting
- VG-SSL: Benchmarking Self-Supervised Representation Learning Approaches for Visual Geo-Localization
- Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark
- OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics
- UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark
- MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding
34.Vision-Language
- Active Learning for Vision-Language Models
- Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models
- LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology
- @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
:star:code
:house:project
- Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models
- Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
- Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling
- Enhancing Vision-Language Few-Shot Adaptation with Negative Learning
- DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
- Optimizing Vision-Language Model for Road Crossing Intention Estimation
- OpenCity3D: What do Vision-Language Models Know About Urban Environments?
- Automated Evaluation of Large Vision-Language Models on Self-Driving Corner Cases
- Video-Language
- VLN
- To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation
- Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks
- GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation
- ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion
- LLM
- MLLM
- Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
- User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance
- Multi-Modal Large Language Model with RAG Strategies in Soccer Commentary Generation
- MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning
- Multi-Modal Large Language Models are Effective Vision Learners
- Visual Grounding
- Agriculture + Vision-Language
33.Semi/Self-Supervised Learning
- Self-Supervised
- Semi-Supervised
- Novel Class Discovery
32.MC/KD/Pruning(Model Compression/Knowledge Distillation/Pruning)
- A Multi-Task Supervised Compression Model for Split Computing
- KD
- On Explaining Knowledge Distillation: Measuring and Visualising the Knowledge Transfer Process
- KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
- Dropout Connects Transformers and CNNs: Transfer General Knowledge for Knowledge Distillation
- InDistill: Information Flow-Preserving Knowledge Distillation for Model Compression
- Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation
- EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data
- Comparative Knowledge Distillation
- SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation
- ChromaDistill : Colorizing Monochrome Radiance Fields with Knowledge Distillation
- Pruning
- VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
- Shapley Consensus Deep Learning for Ensemble Pruning
- Patch Ranking: Token Pruning as Ranking Prediction for Efficient CLIP
- Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
- Information Theoretic Pruning of Coupled Channels in Deep Neural Networks
- Quantization
- PTQ4VM: Post-Training Quantization for Visual Mamba
:star:code
- Dequantization and Color Transfer with Diffusion Models
- Data Generation for Hardware-Friendly Post-Training Quantization
- Difficulty Diversity and Plausibility: Dynamic Data-Free Quantization
- Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation
31.Neural Architecture Search
- MONAS-ESNN: Multi-Objective Neural Architecture Search for Efficient Spiking Neural Networks
- Delta-NAS: Difference of Architecture Encoding for Predictor-Based Evolutionary Neural Architecture Search
30.Few/Zero-Shot Learning/DG/A(Few-Shot/Zero-Shot/Domain Generalization/Domain Adaptation)
- Domain Generalization
- ERM++: An Improved Baseline for Domain Generalization
- Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization
- FRAUD-Net: Fraud News Detection using Sample Uncertainty & Domain Aware Generalized Network
- Domain-Generalized Object Anti-Spoofing: Bridging Gaps and Patch Selection for Robust Detection Across Domains
- Fair Domain Generalization with Heterogeneous Sensitive Attributes Across Domains
- Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization
- FDS: Feedback-Guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization
- Domain Generalization using Large Pretrained Models with Mixture-of-Adapters
- ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization
- Domain Adaptation
- Label Calibration in Source Free Domain Adaptation
- AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation
- Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation
- Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation
- Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model
- Combining Inherent Knowledge of Vision-Language Models with Unsupervised Domain Adaptation through Strong-Weak Guidance
- Transferable-Guided Attention is All You Need for Video Domain Adaptation
- When Cars Meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather
- Ad^2mix: Adversarial and Adaptive Mixup for Unsupervised Domain Adaptation
- Zero-Shot
- Unified Framework for Open-World Compositional Zero-Shot Learning
- Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
- Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models
- SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting
- HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
- Learning to Identify Seen Unseen and Unknown in the Open World: A Practical Setting for Zero-Shot Learning
- PC-GZSL: Prior Correction for Generalized Zero Shot Learning
29.Deep Learning
- Prior2Posterior: Model Prior Correction for Long-Tailed Learning
- SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
- DNN
28.GNN/GCN
- WiGNet: Windowed Vision Graph Neural Network
- SIGNN - Star Identification using Graph Neural Networks
27.Machine Learning
- Metric Learning
- Transfer Learning
- Machine Unlearning
- Incremental Learning
- Class-Incremental
- Covariance-based Space Regularization for Few-shot Class Incremental Learning
- A Reality Check on Pre-training for Exemplar-free Class-Incremental Learning
- Dynamic Adapter Tuning for Long-Tailed Class-Incremental Learning
- ReFu: Recursive Fusion for Exemplar-Free 3D Class-Incremental Learning
- Strategic Base Representation Learning via Feature Augmentations for Few-Shot Class Incremental Learning
- Are Exemplar-Based Class Incremental Learning Models Victim of Black-Box Poison Attacks?
- TACLE: Task and Class-Aware Exemplar-Free Semi-Supervised Class Incremental Learning
- Active Learning
- Federated Learning
- Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models
- Predicting Event Memorability using Personalized Federated Learning
- Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation
- Identify Backdoored Model in Federated Learning via Individual Unlearning
- MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning
- Contrastive Learning
- MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning
- Tuned Contrastive Learning
- Contrastive Learning of Image Representations Guided by Spatial Relations
- CATALOG: A Camera Trap Language-Guided Contrastive Learning Model
- PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning
- Continual Learning
- Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation
:star:code
- Memory-efficient Continual Learning with Neural Collapse Contrastive
- Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
- Semantic Prompting with Image Token for Continual Learning
- Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations
- EvoCL: Continual Learning over Evolving Domains
- AdaPrefix++: Integrating Adapters Prefixes and Hypernetwork for Continual Learning
- Multi-Task Learning
- Adversarial
- FAIR-TAT: Improving Model Fairness using Targeted Adversarial Training
- PoolAtnRes: Towards Generalisable Differential Morphing Attack Detection (morphing attack detection)
- Knockoff Branch: Model Stealing Attack via Adding Neurons in the Pre-Trained Model
- Low-Frequency Black-Box Backdoor Attack via Evolutionary Algorithm
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information?
- When Visual State Space Model Meets Backdoor Attacks (backdoor attacks)
- Pre-Trained Multiple Latent Variable Generative Models are Good Defenders Against Adversarial Attacks
26.Human Motion Generation
- SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
:star:code
- GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts
- Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models
- MoRAG - Multi-Fusion Retrieval Augmented Generation for Human Motion
- UniTMGE: Uniform Text-Motion Generation and Editing Model via Diffusion
- Skeleton-Based Motion Prediction
25.Style Transfer
- Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer
- D-LUT: Photorealistic Style Transfer via Diffusion Process
- Mamba-ST: State Space Model for Efficient Style Transfer
24.GAN/Image Synthesis
- Unsupervised Single-Image Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training (image decomposition)
- ZeroComp: Zero-Shot Object Compositing from Image Intrinsics via Diffusion
- MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations (synthetic images)
- 3D Synthesis for Architectural Design
- ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models
- 360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation (text-driven panorama translation)
- Clarity Amidst Blur: A Deterministic Method for Synthetic Generation of Water Droplets on Camera Lenses (water droplet synthesis)
- SpotDiffusion: A Fast Approach for Seamless Panorama Generation Over Time (panorama generation)
- Attribute Diffusion: Diffusion Driven Diverse Attribute Editing (diverse attribute editing)
- DiffQRCoder: Diffusion-Based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement (aesthetic QR code generation)
- GAN
- Image Synthesis
- Texture Generation
- Image Generation
- RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation (https://github.com/SonyResearch/RAW-Diffusion)
- MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
:star:code
:house:project
- Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis
- Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
- Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects
- FineControlNet: Fine-Level Text Control for Image Generation with Spatially Aligned Text Control Injection
- Recipe Generation
- Image Editing
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing
:star:code
- Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance
- Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
- LIME: Localized Image Editing via Attention Regularization in Diffusion Models
- ReEdit: Multimodal Exemplar-Based Image Editing
- GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
- DragText: Rethinking Text Embedding in Point-Based Image Editing
- Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits
- Text-to-Image
- DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
- Disentangling Subject-Irrelevant Elements in Personalized Text-to-Image Diffusion via Filtered Self-Distillation
- Counting Guidance for High Fidelity Text-to-Image Synthesis
- Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
- Detecting Origin Attribution for Text-to-Image Diffusion Models
- AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
- Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
- Improving Faithfulness of Text-to-Image Diffusion Models through Inference Intervention
- Structured Human Assessment of Text-to-Image Generative Models
- Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation
- An Image is Worth Multiple Words: Multi-Attribute Inversion for Constrained Text-to-Image Synthesis
- Layout-to-Image Generation
- 3D Generation
- Text-to-3D
- Image-to-Image Translation
- Video Editing
- Video Generation
- Fine-Grained Controllable Video Generation via Object Appearance and Context
- Generating Long-Take Videos via Effective Keyframes and Guidance (long-take video generation)
- Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
- Corgi: Cached Memory Guided Video Generation
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
- Video Synthesis
- Tire Footprint Generation
- Diffusion Models
- Enhancing Image Layout Control with Loss-Guided Diffusion Models
- GeoGuide: Geometric Guidance of Diffusion Models
- Inverse Problems with Diffusion Models: A MAP Estimation Perspective
- Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation
- SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models
:star:code
- Negative-Prompt Inversion: Fast Image Inversion for Editing with Text-Guided Diffusion Models
- SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models
- Improving Conditional Diffusion Models through Re-Noising from Unconditional Diffusion Priors
- MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection
- Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models
- Elucidating the Solution Space of Extended Reverse-Time SDE for Diffusion Models
- DiffuseKronA: A Parameter Efficient Fine-Tuning Method for Personalized Diffusion Models
- CusConcept: Customized Visual Concept Decomposition with Diffusion Models
- CharDiff: Improving Sampling Convergence via Characteristic Function Consistency in Diffusion Models
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
23.Visual Question Answering
- Video Question Answering
- Visual Question Answering
- CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
- AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements
- One VLM to Keep it Learning: Generation and Balancing for Data-Free Continual Visual Question Answering
- Unsupervised Domain Adaptive Visual Question Answering in the Era of Multi-Modal Large Language Models
- Visual Robustness Benchmark for Visual Question Answering (VQA)
- Chart Question Answering
- Table Question Answering
22.OCR
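- Handwritten Document Recognition
- Scene Text Recognition
- Scene Text Editing
- Text Change Detection
- Text Polygon Detection
- Table Structure Recognition
21.3D(3D Reconstruction/3D Vision)
- NPL-MVPS: Neural Point-Light Multi-View Photometric Stereo (multi-view photometric stereo)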
- HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors
- LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo
- Instructive3D: Editing Large Reconstruction Models with Text Instructions
- Scene-LLM: Extending Language Model for 3D Visual Reasoning
- Towards a Training Free Approach for 3D Scene Editing
- CRAFT: Designing Creative and Functional 3D Objects
- NeRFs are Mirror Detectors: using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives
- VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field
- EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
- 3DGS
- Planar Gaussian Splatting
- UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction
- EdgeGaussians - 3D Edge Mapping via Gaussian Splatting
- Localized Gaussian Splatting Editing with Contextual Awareness
- DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
- ELMGS: Enhancing Memory and Computation Scalability through Compression for 3D Gaussian Splatting
- OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting
- 3D Reconstruction
- Assessing the Quality of 3D Reconstruction in the Absence of Ground Truth: Application to a Multimodal Archaeological Dataset
- Multi-HexPlanes: A Lightweight Map Representation for Rendering and 3D Reconstruction
- Semantic Segmentation Method for Automated Indoor 3D Reconstruction Based on Architectural-Knowledge-Aware Features
- DreaMo: Articulated 3D Reconstruction from a Single Casual Video
- Sparse-View 3D Reconstruction of Clothed Humans via Normal Maps
- Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction
- Surface Reconstruction
- Depth Estimation
- GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling
- MDCN-PS: Monocular-Depth-Guided Coarse Normal Attention for Robust Photometric Stereo
- Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation
- MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications
:star:code
- OmniDiffusion: Reformulating 360 Monocular Depth Estimation using Semantic and Surface Normal Conditioned Diffusion
- Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
- CabNIR: A Benchmark for In-Vehicle Infrared Monocular Depth Estimation
- Room Layout Estimation
- 3D Scene Understanding
- 3D Semantic Scene Completion
- 3D Shape Completion
20.Point Cloud
- BioNet and NeFF: Crop Biomass Prediction from Point Clouds to Drone Imagery
- Point Cloud Color Upsampling with Attention-Based Coarse Colorization and Refinement
- On-the-Fly Object-aware Representative Point Selection in Point Cloud
- PocoLoco: A Point Cloud Diffusion Model of Human Shape in Loose Clothing
:star:code
- Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging
- Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
- 3D Point Clouds
- Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models
:star:code
- Learning under Noisy Labels Spurious Points and Diverse Structures: TS40K a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission Systems
- Learning Semantic Part-Based Graph Structure for 3D Point Cloud Domain Generalization
- Adversarial Learning Based Knowledge Distillation on 3D Point Clouds
- RGB2Point: 3D Point Cloud Generation from Single RGB Images
- Continual Learning in 3D Point Clouds: Employing Spectral Techniques for Exemplar Selection
- Point Cloud Classification
- Point Cloud Segmentation
- Point Cloud Registration
19.Video
- NeuroViG - Integrating Event Cameras for Resource-Efficient Video Grounding
- MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence
- GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-Grained Video-Language Learning (video-language learning)
- Video Surveillance
- DashCop: Automated E-Ticket Generation for Two-Wheeler Traffic Violations using Dashcam Videos (dashcam-based traffic violation e-tickets)
- Video Understanding
- Video Temporal Grounding
- Video Anomaly Detection
- Graph-Jigsaw Conditioned Diffusion Model for Skeleton-Based Video Anomaly Detection
- Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos
- Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection
- Discriminative Score Suppression for Weakly Supervised Video Anomaly Detection
- MissionGNN: Hierarchical Multimodal GNN-Based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation
- Video Mirror Detection
- Video Moment Retrieval
- Video Frame Interpolation
- Video Stabilization
18.Person Re-id
- Re-Identification
- ReMix: Training Generalized Person Re-identification on a Mixture of Data
- Re-Identifying People in Video via Learned Temporal Attention and Multi-Modal Foundation Models (re-identification)
- VILLS : Video-Image Learning to Learn Semantics for Person Re-Identification
- AnonyNoise: Anonymizing Event Data with Smart Noise to Outsmart Re-Identification and Preserve Privacy
:star:code
- Clothes-Changing Re-Identification
- Visible-Infrared Re-Identification
- Person Search
- Gait Recognition
- VM-Gait: Multi-Modal 3D Representation Based on Virtual Marker for Gait Recognition
- GaitCloud: Leveraging Spatial-Temporal Information for LiDAR-Base Gait Recognition with A True-3D Gait Representation
- GaitContour: Efficient Gait Recognition Based on a Contour-Pose Representation
- MimicGait: A Model Agnostic Approach for Occluded Gait Recognition using Correlational Knowledge Distillation
- Crowd Density Completion
17.Action Detection
- Learning to Visually Connect Actions and their Effects
- ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos
- Inferring Past Human Actions in Homes with Abductive Reasoning
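- Action Detection
- Action Recognition
- Group Activity Recognition
- Temporal Action Localization
- Action Quality Assessment
- Social Interaction Recognition
16.Human Pose Estimation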
- EgoCast: Forecasting Egocentric Human Pose in the Wild
- Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach
- Human parsing
- Human reshaping
- Human reconstruction
- Human mesh recovery
- 3D pose estimation
- ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening
:star:code
- LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-Timestamp 3D Human Pose Estimation
- Event-Guided Video Transformer for End-to-End 3D Human Pose Estimation
- Event-Guided Fusion-Mamba for Context-Aware 3D Human Pose Estimation
- STRIDE: Single-Video Based Temporally Continuous Occlusion-Robust 3D Pose Estimation
- BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos
- Human motion recovery
- Gesture generation
- Gesture recognition
- Hand pose estimation
- Infant motion generation
- Sign language translation
15.Medical Image Processing
- Survival Prediction in Lung Cancer through Multi-Modal Representation Learning
- Physiology-Aware PolySnake for Coronary Vessel Segmentation (vessel segmentation)
- TRUST: Time-Domain Residual Unsupervised Stability Technique for Improved Heart Rate Estimation (heart rate estimation)
- Attention-Guided Masked Autoencoders for Learning Image Representations
- Multi-Aperture Transformers for 3D (MAT3D) Segmentation of Clinical and Microscopic Images (3D segmentation)
- Tumor Synthesis Conditioned on Radiomics (tumor synthesis)
- Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models (surgical scene segmentation)
- MFTrans: A Multi-Resolution Fusion Transformer for Robust Tumor Segmentation in Whole Slide Images (tumor segmentation)
- AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation (3D multi-organ segmentation)
- Learning Anatomy-Disease Entangled Representation
- GAUDA: Generative Adaptive Uncertainty-Guided Diffusion-Based Augmentation for Surgical Segmentation (surgical segmentation)
- Endoscopic Scoring and Localization in Unconstrained Clinical Trial Videos (endoscopic scoring and localization)
- SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation (polyp segmentation)
- Uncertainty Awareness Enables Efficient Labeling for Cancer Subtyping in Digital Pathology (digital pathology)
- Federated-Continual Dynamic Segmentation of Histopathology Guided by Barlow Continuity (histopathology)
- F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation
- Investigating Imaging Annotation and Self-Supervision for the Classification of Continuously Developing Cells in Histological Whole Slide Images
- PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices
:star:code
- Volumetric Conditioning Module to Control Pretrained Diffusion Models for 3D Medical Images
:star:code
- AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation
:star:code
- SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation
- Multimodal Fusion Learning with Dual Attention for Medical Imaging
:star:code
- LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image
:star:code
- Reviving Poor Object Segmentations in OOD Medical Images using Variational-Deep-PCA Modeling on Segmentation Maps with Sampling-Free Learning
- Training-Free Medical Image Inverses via Bi-Level Guided Diffusion Models
- Sli2Vol+: Segmenting 3D Medical Images Based on an Object Estimation Guided Correspondence Flow Network
- Data-Efficient Alignment in Medical Imaging via Reconfigurable Generative Networks
- Multi-Resolution Guided 3D GANs for Medical Image Translation
- Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning
- MAISI: Medical AI for Synthetic Imaging
- FMD: Comprehensive Data Compression in Medical Domain via Fused Matching Distillation
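- Skin disease classification
- Glaucoma detection
- Anomaly segmentation
- Cardiac image synthesis and segmentation
- Retinal fundus image enhancement
- Medical image super-resolution
- Medical image registration
- Medical image classification
- Medical image segmentation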
- Generalizable Single-Source Cross-modality Medical Image Segmentation via Invariant Causal Mechanisms
:star:code
- MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training
:star:code
- Effective and Efficient Medical Image Segmentation with Hierarchical Context Interaction
- Personalized Mixture of Experts for Multi-Site Medical Image Segmentation
- Frequency-Domain Refinement of Vision Transformers for Robust Medical Image Segmentation under Degradation
- Semi-supervised medical image segmentation
- Radiology report generation
- MRI
- Continuous Spatio-Temporal Memory Networks for 4D Cardiac Cine MRI Segmentation
- Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities
- DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET
- MRI Reconstruction with Regularized 3D Diffusion Model (R3DM)
- MambaRecon: MRI Reconstruction with Structured State Space Models
- McCaD: Multi-Contrast MRI Conditioned Adaptive Adversarial Diffusion Model for High-Fidelity MRI Synthesis
- X-ray
- DeepCA: Deep Learning-Based 3D Coronary Artery Tree Reconstruction from Two 2D Non-Simultaneous X-ray Angiography Projections
- Self-Supervised Pre-Training with Diffusion Model for Few-Shot Landmark Detection in X-Ray Images
- Foundation X: Integrating Classification Localization and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
- TempA-VLP: Temporal-Aware Vision-Language Pretraining for Longitudinal Exploration in Chest X-ray Image
- OTCXR: Rethinking Self-Supervised Alignment using Optimal Transport for Chest X-ray Analysis
- CT
- Hippocampus segmentation
- Pulmonary airway segmentation
14.Autonomous Driving
- A Generic Vehicle-to-Sensor Calibration Framework
- CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
:house:project
- S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
- LORD: Large Models Based Opposite Reward Design for Autonomous Driving
- DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception
- Robust Long-Range Perception Against Sensor Misalignment in Autonomous Vehicles
- Bandwidth-Efficient Communication Modelling for Autonomous Vehicle Collaborative Perception
- Trajectory prediction
- Lane detection
- 3D occupancy prediction
13.Biometrics
- On Which Data Distribution (Synthetic or Real) We Should Rely for Soft Biometric Classification (soft biometric classification)
- Fingerprint detection
- Iris detection
- Post-mortem interval estimation from iris images
12.UAV/Remote Sensing/Satellite Images
- PGRID: Power Grid Reconstruction in Informal Developments Using High-Resolution Aerial Imagery
- Pix2Poly: A Sequence Prediction Method for End-to-end Polygonal Building Footprint Extraction from Remote Sensing Imagery
:star:code
- Skyeyes: Ground Roaming using Aerial View Images
- Aerial Mirage: Unmasking Hallucinations in Large Vision Language Models
- OPTIMUS: Observing Persistent Transformations in Multi-Temporal Unlabeled Satellite-Data
- UAV
- Segmentation
- Super-resolution
- Tracking/detection
- Aerial image synthesis
- Remote sensing change detection
- Deep Metric Learning for Unsupervised Remote Sensing Change Detection
- DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors for Remote Sensing Change Detection
- A Mamba-Based Siamese Network for Remote Sensing Change Detection
- Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence (change detection)
- Rare animal behavior
11.Object Tracking
- BroadTrack: Broadcast Camera Tracking for Soccer
- MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation
:star:code
- Improving Accuracy and Generalization for Efficient Visual Tracking
- Vision-Based Landing Guidance through Tracking and Orientation Estimation
- TAM-VT: Transformation-Aware Multi-Scale Video Transformer for Segmentation and Tracking
- Multi-person tracking
- 3D tracking
- Point tracking
10.Object Detection
- Uncertainty Aware Interest Point Detection and Description (interest point detection and description)
- DT-LSD: Deformable Transformer-Based Line Segment Detection (line segment detection)
- EDMB: Edge Detector with Mamba (edge detection)
- MVMD: A Multi-View Approach for Enhanced Mirror Detection (mirror detection)
- Recurrence-based Vanishing Point Detection
- No Annotations for Object Detection in Art through Stable Diffusion
:star:code
- Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations
- Bit-Flip Induced Latency Attacks in Object Detection
- Multispectral Object Detection Enhanced by Cross-Modal Information Complementary and Cosine Similarity Channel Resampling Modules
- DiL: An Explainable and Practical Metric for Abnormal Uncertainty in Object Detection
- Enhancing Novel Object Detection via Cooperative Foundational Models
- ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing
- Mixed Patch Visible-Infrared Modality Agnostic Object Detection
:star:code
:house:project
- Enhancing Embodied Object Detection with Spatial Feature Memory
- Interactive Object Detection for Tiny Objects in Large Remotely Sensed Images
- Shape-Biased Texture Agnostic Representations for Improved Textureless and Metallic Object Detection and 6D Pose Estimation
- ECF-YOLOv7-Tiny: Improving Feature Fusion and the Receptive Field for Lightweight Object Detectors
- Noise-Aware Evaluation of Object Detectors
- Attention-Based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors
- 3D OD
- VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation
- V-MIND: Building Versatile Monocular Indoor 3D Detector with Diverse 2D Annotations
- DSTR: Dual Scenes Transformer for Cross-Modal Fusion in 3D Object Detection
- ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only
- AIC3DOD: Advancing Indoor Class-Incremental 3D Object Detection with Point Transformer Architecture and Room Layout Constraints
- Reflective Teacher: Semi-Supervised Multimodal 3D Object Detection in Bird's-Eye-View via Uncertainty Measure
- VOD
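- Camouflaged object detection
- Camouflaged object discovery
- Domain-generalized object detection
- Few-shot object detection
- Zero-shot object detection
- End-to-end object detection
- Open-world object detection
- Crack detection
- Snow detection
- Object localization
- Visual artifact detection
9.Super-Resolution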
- Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution
:star:code
- ENAF: A Multi-Exit Network with an Adaptive Patch Fusion for Large Image Super Resolution
- Partial Filter-Sharing: Improved Parameter-Sharing Method for Single Image Super-Resolution Networks
- Dynamic Attention-Guided Diffusion for Image Super-Resolution
- Scene text image super-resolution
- Video super-resolution
8.Image/Video Retrieval
- Cross-domain retrieval
- Image retrieval
- Video retrieval
- Information retrieval
7.Image Captioning
- Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
- Image-Caption Encoding for Improving Zero-Shot Generalization
- Dense video captioning
6.Image/Video Compression
- All-in-One Image Compression and Restoration
- Efficient Progressive Image Compression with Variance-aware Masking
5.Image Classification
- Federated Source-Free Domain Adaptation for Classification: Weighted Cluster Aggregation for Unlabeled Data
- Multi-Task Learning of Classification and Generation for Set-Structured Data
- Image classification
- Invariant Shape Representation Learning for Image Classification
- TLDR: Text Based Last-Layer Retraining for Debiasing Image Classifiers
- Pixel-Wise Shuffling with Collaborative Sparsity for Melanoma Hyperspectral Image Classification
- Data Augmentation for Image Classification using Generative AI
- CEMIL: Contextual Attention Based Efficient Weakly Supervised Approach for Histopathology Image Classification
- Visual classification
- Few-shot classification
- Multi-label classification
- Open-set recognition
4.Image/Video Processing
- Denoising
- DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination
- Focusing on What to Decode and What to Train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor
- Inverting the Generation Process of Denoising Diffusion Implicit Models: Empirical Evaluation and a Novel Method
- VISIONARY: Novel Spatial-Spectral Attention Mechanism for Hyperspectral Image Denoising
- Unsupervised Denoising for Signal-Dependent and Row-Correlated Imaging Noise
- J-Invariant Volume Shuffle for Self-Supervised Cryo-Electron Tomogram Denoising on Single Noisy Volume
- SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions
- Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer
- Design Principles of Multi-Scale J-Invariant Networks for Self-Supervised Image Denoising
- Deblurring
- Shadow removal
- Image restoration
- Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration
- Swin-∇: Gradient-Based Image Restoration from Image Sequences using Video Swin-Transformers
- Bayesian Optimal Latent Projection for Noisy Image Restoration
- Denoising Diffusion Models for High-Resolution Microscopy Image Restoration
- Underwater image restoration
- Image inpainting
- SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM
:star:code
- I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
:star:code
- A New Benchmark and Baseline for Real-Time High-Resolution Image Inpainting on Edge Devices
- Improving Detail in Pluralistic Image Inpainting with Feature Dequantization
- Image enhancement
- Image quality assessment
- Video restoration
- Video inpainting
- Video enhancement
- Video deblurring
- Image reconstruction
- MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction
:star:code
- MS-Glance: Bio-Inspired Non-Semantic Context Vectors and their Applications in Supervising Image Reconstruction
- Spk2ImgMamba: Spiking Camera Image Reconstruction with Multi-Scale State Space Models
- Self-Supervised Learning with Spectral Low-Rank Prior for Hyperspectral Image Reconstruction
- Colorization
- Image Dewarping
3.Image Segmentation
- Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models (3D segmentation)
- GaussianBeV : 3D Gaussian Representation Meets Perception Models for BeV Segmentation (BeV segmentation)
- HSDA: High-Frequency Shuffle Data Augmentation for Bird's-Eye-View Map Segmentation (bird's-eye-view map segmentation)
:star:code
- EfficientCrackNet: A Lightweight Model for Crack Segmentation (crack segmentation)
- CAMS: Convolution and Attention-Free Mamba-Based Cardiac Image Segmentation
- Image-Level Regression for Uncertainty-Aware Retinal Image Segmentation
- Active Learning for Image Segmentation with Binary User Feedback
- Task Configuration Impacts Annotation Quality and Model Training Performance in Crowdsourced Image Segmentation
- Referring image segmentation
- Part segmentation
- Panoptic segmentation
- Instance segmentation
- Semantic segmentation
- COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes
- A Conflict-Guided Evidential Multimodal Fusion for Semantic Segmentation
- Modality-Incremental Learning with Disjoint Relevance Mapping Networks for Image-based Semantic Segmentation
- Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation
- Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation
- CCASeg: Decoding Multi-Scale Context with Convolutional Cross-Attention for Semantic Segmentation
- Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation
- DASC-SPT: Towards Self-Supervised Panoramic Semantic Segmentation
- U-MixFormer: UNet-Like Transformer with Mix-Attention for Efficient Semantic Segmentation
- Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation
- Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
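- 3D semantic segmentation
- Domain-adaptive semantic segmentation
- Weakly supervised semantic segmentation
- Semi-supervised semantic segmentation
- Few-shot semantic segmentation
- Open-vocabulary semantic segmentation
- Matting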
- VSS
- VPS
- VIS
- VOS
- Camouflaged object segmentation
2.Face
- Continual Learning of Personalized Generative Face Models with Experience Replay
:star:code
- Remote Blood Pressure Estimation from Facial Videos using Transfer Learning: Leveraging PPG to rPPG Conversion
- ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces
- BeautyBank: Encoding Facial Makeup in Latent Space
- VerA: Versatile Anonymization Applicable to Clinical Facial Photographs
- LogicNet: A Logical Consistency Embedded Face Attribute Learning Network
- Emotion recognition
- 3D face
- Face restoration
- Face reenactment
- Face recognition
- PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition
:house:project
- A Rapid Test for Accuracy and Bias of Face Recognition Technology
- FALCON: Fair Face Recognition via Local Optimal Feature Normalization
- CLFace: A Scalable and Resource-Efficient Continual Learning Framework for Lifelong Face Recognition
- Beyond Spatial Explanations: Explainable Face Recognition in the Frequency Domain
- Effective Backdoor Learning on Open-Set Face Recognition Systems
- Face verification
- Face reconstruction
- Face swapping
- Face generation
- Face anonymization
- Face liveness detection
- Talking face generation
- Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters
- DisFlowEm : One-Shot Emotional Talking Head Generation using Disentangled Pose and Expression Flow-Guidance
- SyncDiff: Diffusion-Based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads
- Talking Head Anime 4: Distillation for Real-Time Performance
- Text-driven talking face generation
- Facial expression recognition
- Facial landmark detection
1.Other
- Dense Depth from Event Focal Stack
- MAGMA: Manifold Regularization for MAEs
:star:code
- Advancing Weight and Channel Sparsification with Enhanced Saliency
- Secrets of Edge-Informed Contrast Maximization for Event-Based Vision
- Metric Compatible Training for Online Backfilling in Large-Scale Retrieval
- PULSE: Physiological Understanding with Liquid Signal Extraction
- CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders
- Adaptive and Temporally Consistent Gaussian Surfels for Multi-View Dynamic Reconstruction
- Enriching Local Patterns with Multi-Token Attention for Broad-Sight Neural Networks
- PALO: A Polyglot Large Multimodal Model for 5B People
- Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance
- Cross-Domain and Cross-Dimension Learning for Image-to-Graph Transformers
- SpiralMLP: A Lightweight Vision MLP Architecture
- Mind the Map! Accounting for Existing Maps When Estimating Online HDMaps from Sensors
- Zero-Shot Class Unlearning in CLIP with Synthetic Samples
- Revisiting Deep Archetypal Analysis for Phenotype Discovery in High Content Imaging
- A 0-Shot Self-Attention Mechanism for Accelerated Diagonal Attention
- Long-Term Ad Memorability: Understanding & Generating Memorable Ads
- Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models
- A Two-Head Loss Function for Deep Average-K Classification
- Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration
- Learning to Count from Pseudo-Labeled Segmentation
- Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs
- ReC-TTT: Contrastive Feature Reconstruction for Test-Time Training
- Cross-Aligned Fusion for Multimodal Understanding
- GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling
- VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors
- MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction
- Learning the Power of "No": Foundation Models with Negations
- Disentangle Source and Target Knowledge for Continual Test-Time Adaptation
- LiLMaps: Learnable Implicit Language Maps
- SpaGBOL: Spatial-Graph-Based Orientated Localisation
- SegBuilder: A Semi-Automatic Annotation Tool for Segmentation (annotation tool)
- DTA: Dual Temporal-Channel-Wise Attention for Spiking Neural Networks
- LumiGauss: Relightable Gaussian Splatting in the Wild
- Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling
- Multi-Spectral Image Color Reproduction (image color reproduction)
- Improving Uncertainty Estimation with Confidence-Aware Training Data
- Decomposed Distribution Matching in Dataset Condensation
- Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference
- SAND: Enhancing Open-Set Neuron Descriptions through Spatial Awareness
- Harmonizing Attention: Training-Free Texture-Aware Geometry Transfer
- SmartKC++: Improving Performance of Smartphone-Based Corneal Topographers
- Partial Texture VAE: Color and Texture Encoder for Rock Particle Images
- Feature Space Perturbation: A Panacea to Enhanced Transferability Estimation
- Token Turing Machines are Efficient Vision Models
- MetaVIn: Meteorological and Visual Integration for Atmospheric Turbulence Strength Estimation (atmospheric turbulence strength estimation)
- On Neural BRDFs: A Thorough Comparison of State-of-the-Art Approaches
- Learning Keypoints for Multi-Agent Behavior Analysis using Self-Supervision
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing
- SUM: Saliency Unification through Mamba for Visual Attention Modeling
- Shift Equivariant Pose Network
- SV-data2vec: Guiding Video Representation Learning with Latent Skeleton Targets
- An Investigation on LLMs' Visual Understanding Ability using SVG for Image-Text Bridging
- RiemStega: Covariance-Based Loss for Print-Proof Transmission of Data in Images
- Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness Across Diverse Tasks
- VisualFusion: Enhancing Blog Content with Advanced Infographic Pipeline
- From Visual Explanations to Counterfactual Explanations with Latent Diffusion
- Latency Robust Cooperative Perception using Asynchronous Feature Fusion
- Social EgoMesh Estimation
- MENTOR: Human Perception-Guided Pretraining for Increased Generalization
- A Conic Transformation Approach for Solving the Perspective-Three-Point Problem
- OT-VP: Optimal Transport-Guided Visual Prompting for Test-Time Adaptation
- Per-Pixel Solution of Multispectral Photometric Stereo
- PositiveCoOp: Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations
- NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability
- VideoGameBunny: Towards Vision Assistants for Video Games
- Diffusion-Based Generative Regularization for Supervised Discriminative Learning
- CLIPArTT: Adaptation of CLIP to New Domains at Test Time
- BIV-Priv-Seg: Locating Private Content in Images Taken by People with Visual Impairments
- ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail Transfer
- LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
- Diffusion-Based Particle-DETR for BEV Perception
- DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception
- WARLearn: Weather-Adaptive Representation Learning
- Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning
- ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model
- Learning Deep Illumination-Robust Features from Multispectral Filter Array Images
- Label Augmented Dataset Distillation
- Towards Privacy-Preserving Split Learning for ControlNet
- Evaluating Sensitivity Consistency of Explanations
- Stable Autofocus with Focal Consistency Loss
- Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios
- RD-DPP: Rate-Distortion Theory Meets Determinantal Point Process to Diversify Learning Data Samples
- Towards Robust Training via Gradient-Diversified Backpropagation
- Benchmarking VLMs' Reasoning About Persuasive Atypical Images
- Non-Cross Diffusion for Semantic Consistency
- Optimizing Neural Network Effectiveness via Non-Monotonicity Refinement
- FlashMix: Fast Map-Free LiDAR Localization via Feature Mixing and Contrastive-Constrained Accelerated Training
- PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction
- Towards Utilising a Range of Neural Activations for Comprehending Representational Associations
- GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts
- Re-Evaluating Group Robustness via Adaptive Class-Specific Scaling
- Sun Off Lights On: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception (nighttime simulation)
- Improving Deep Detector Robustness via Detection-Related Discriminant Maximization and Reorganization
- Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
:star:code
- PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement
- Active Event Alignment for Monocular Distance Estimation
- Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets
:star:code
- Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels
:star:code
- Uncertainty-Aware Online Extrinsic Calibration: A Conformal Prediction Approach
- CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models
:star:code
- CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
- RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone
- Dataset Augmentation by Mixing Visual Concepts
- ACE: Anatomically Consistent Embeddings in Composition and Decomposition
- Federated Voxel Scene Graph for Intracranial Hemorrhage
- Hyperdimensional Representation for Adaptive Information Association and Memorization
- ARD-VAE: A Statistical Formulation to Find the Relevant Latent Dimensions of Variational Autoencoders
- Cap2Aug: Caption Guided Image Data Augmentation
- DeepMIM: Deep Supervision for Masked Image Modeling
- A Simple-but-Effective Baseline for Training-Free Class-Agnostic Counting
- CLASS: Conditional Latent Architecture for Search and Synthesis of Design Layouts
- Semiotic-Based Construction of a Large Emotional Image Dataset with Neutral Samples
- MatSpectNet: Material Segmentation Network with Domain-Aware and Physically-Constrained Hyperspectral Reconstruction
- EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data
:star:code
- SEMU-Net: A Segmentation-based Corrector for Fabrication Process Variations of Nanophotonics with Microscopic Images
- Situational Scene Graph for Structured Human-centric Situation Understanding
- TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes
:star:code
- Debiasify: Self-Distillation for Unsupervised Bias Mitigation
- TaxaBind: A Unified Embedding Space for Ecological Applications
:star:code
- Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications
:star:code
- Through the Curved Cover: Synthesizing Cover Aberrated Scenes with Refractive Field
- CLIP-Fusion: A Spatio-Temporal Quality Metric for Frame Interpolation
- Learning Instance-Specific Parameters of Black-Box Models using Differentiable Surrogates
- Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability
- Temporally Grounding Instructional Diagrams in Unconstrained Videos
- A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
- Detective Networks: Enhancing Disaster Recognition in Images Through Attention Shifting using Optimal Masking
- Separating Direct and Global Components from Novel Viewpoints
- Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency
- HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
- WAFFLE: Multimodal Floorplan Understanding in the Wild
:house:project
- Distillation of Diffusion Features for Semantic Correspondence
:star:code
- Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading
:star:code
- HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
:star:code
- STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing
- Design-o-meter: Towards Evaluating and Refining Graphic Designs
:star:code
- Ordinal Multiple-instance Learning for Ulcerative Colitis Severity Estimation with Selective Aggregated Transformer
:star:code
- TreeFormer: Single-view Plant Skeleton Estimation via Tree-constrained Graph Generation
:star:code
- I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames
:star:code
- Multi-view Image Diffusion via Coordinate Noise and Fourier Attention
- LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization
- SHIP: Structural Hierarchies for Instance-Dependent Partial Labels
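This paper introduces a modular component designed to integrate seamlessly into deep learning architectures, particularly when a label hierarchy is present. SHIP enhances instance-dependent partial label learning (PLL) and improves accuracy by 2.6% across a range of algorithms.
- Generating visual explanations from deep networks using implicit neural representations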
- Feature Augmentation Based Test-Time Adaptation
- Agtech Framework for Cranberry-Ripening Analysis using Vision Foundation Models
- Pre-Capture Privacy via Adaptive Single-Pixel Imaging
- Enhancing Predictive Imaging Biomarker Discovery through Treatment Effect Analysis
- Precise Integral in NeRFs: Overcoming the Approximation Errors of Numerical Quadrature
- Differential Privacy Mechanisms in Neural Tangent Kernel Regression
- Neural SDF for Shadow-Aware Unsupervised Structured Light
- Rubric-Constrained Figure Skating Scoring
- Polarization as Texture: Microscale 3D Shape from Polarized Light Focus
- An Encoder-Agnostic Weakly Supervised Method for Describing Textures
- Deciphering the Complaint Aspects: Towards an Aspect-Based Complaint Identification Model with Video Complaint Dataset in Finance
- DarSwin-Unet: Distortion Aware Architecture
- Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information
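Click here for the categorized 2020 paper collections
↘️CVPR-2020-Papers ↘️ECCV-2020-Papers
Click here for the categorized 2021 paper collections
↘️ICCV-2021-Papers ↘️CVPR-2021-Papers
Click here for the categorized 2022 paper collections
↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers
Click here for the categorized 2023 paper collections
↘️CVPR-2023-Papers ↘️WACV-2023-Papers ↘️ICCV-2023-Papers ↘️2023-CV-Surveys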
