| - | Improving Target Presence and Plurality Recognition for Generalized Referring Image Segmentation | AAAI 2026 | [webpage] |
| PixelRefer | PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity | arXiv 25.10 | [code] |
| CoPatch | CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP | arXiv 25.09 | [code] |
| SaFiRe | SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation | NeurIPS 2025 | |
| UniPixel | UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning | NeurIPS 2025 | [code] [webpage] |
| RaAM | Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding | ICCV 2025 | |
| Latent-VG | Latent Expression Generation for Referring Image Segmentation and Grounding | ICCV 2025 | |
| DeRIS | DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | ICCV 2025 | [code] |
| WeakMCN | WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | CVPR 2025 | [code] |
| HybridGL | Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation | CVPR 2025 | [code] |
| IteRPrimE | IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | AAAI 2025 | [code] |
| DETRIS | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | AAAI 2025 | [code] |
| VATEX | Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | WACV 2025 | [code] [webpage] |
| Shared-RIS | A Simple Baseline with Single-encoder for Referring Image Segmentation | arXiv 24.08 | [code] |
| ASDA | Adaptive Selection based Referring Image Segmentation | ACM MM 2024 | [code] |
| NeMo | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | ECCV 2024 | [code] [webpage] |
| ReMamber | ReMamber: Referring Image Segmentation with Mamba Twister | ECCV 2024 | [code] |
| GTMS | GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method | ECCV 2024 | [code] |
| SAM4MLLM | SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | ECCV 2024 | [code] |
| Pseudo-RIS | Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | ECCV 2024 | [code] |
| SafaRi | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | ECCV 2024 | [webpage] |
| CM-MaskSD | CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation | TMM 2024 | |
| Prompt-RIS | Prompt-Driven Referring Image Segmentation with Instance Contrasting | CVPR 2024 | |
| LQMFormer | LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | CVPR 2024 | |
| PPT | Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation | CVPR 2024 | |
| GSVA | GSVA: Generalized Segmentation via Multimodal Large Language Models | CVPR 2024 | [code] |
| RMSIN | Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation | CVPR 2024 | [code] |
| MRES | Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | CVPR 2024 | [code] [webpage] |
| MagNet | Mask Grounding for Referring Image Segmentation | CVPR 2024 | [webpage] |
| LISA | LISA: Reasoning Segmentation via Large Language Model | CVPR 2024 | [code] |
| RefSegformer | Towards Robust Referring Image Segmentation | TIP 2024 | [code] |
| JMCELN | Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network | EMNLP 2023 | [code] |
| CVMN | Unsupervised Domain Adaptation for Referring Semantic Segmentation | ACM MM 2023 | [code] |
| CARIS | CARIS: Context-Aware Referring Image Segmentation | ACM MM 2023 | [code] |
| TAS | Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | EMNLP 2023 | |
| BKINet | Bilateral Knowledge Interaction Network for Referring Image Segmentation | TMM 2023 | [code] |
| Group-RES | Advancing Referring Expression Segmentation Beyond Single Image | ICCV 2023 | [code] |
| - | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | ICCV 2023 | |
| - | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | ICCV 2023 | |
| TRIS | Referring Image Segmentation Using Text Supervision | ICCV 2023 | [code] |
| RIS-DMMI | Beyond One-to-One: Rethinking the Referring Image Segmentation | ICCV 2023 | [code] |
| ETRIS | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | ICCV 2023 | [code] |
| SEEM | Segment Everything Everywhere All at Once | arXiv 23.04 | [code] |
| SLViT | SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation | IJCAI 2023 | [code] |
| WiCo | WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | IJCAI 2023 | |
| M3Att | Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation | TIP 2023 | |
| X-Decoder | X-Decoder: Generalized Decoding for Pixel, Image and Language | CVPR 2023 | [code] [project] |
| Partial-RES | Learning to Segment Every Referring Object Point by Point | CVPR 2023 | [code] |
| MCRES | Meta Compositional Referring Expression Segmentation | CVPR 2023 | |
| Global-Local CLIP | Zero-shot Referring Image Segmentation with Global-Local Context Features | CVPR 2023 | [code] |
| PolyFormer | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | CVPR 2023 | [code] [project] |
| GRES | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [code] [dataset] [project] |
| CGFormer | Contrastive Grouping with Transformer for Referring Image Segmentation | CVPR 2023 | [code] |
| SADLR | Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation | AAAI 2023 | |
| R-RIS | Towards Robust Referring Image Segmentation | arXiv 22.09 | [code] [project] |
| - | Learning From Box Annotations for Referring Image Segmentation | TNNLS 2022 | [code] |
| - | Instance-Specific Feature Propagation for Referring Segmentation | TMM 2022 | |
| LAVT | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | CVPR 2022 | [code] |
| CRIS | CRIS: CLIP-Driven Referring Image Segmentation | CVPR 2022 | [code] |
| ReSTR | ReSTR: Convolution-free Referring Image Segmentation Using Transformers | CVPR 2022 | [project] |
| TV-Net | Two-stage Visual Cues Enhancement Network for Referring Image Segmentation | ACM MM 2021 | [code] |
| VLT | Vision-Language Transformer and Query Generation for Referring Segmentation | ICCV 2021 | [code] |
| MDETR | MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | ICCV 2021 | [code] [project] |
| CEFNet | Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation | CVPR 2021 | [code] |
| BUSNet | Bottom-Up Shift and Reasoning for Referring Image Segmentation | CVPR 2021 | [code] |
| LTS | Locate then Segment: A Strong Pipeline for Referring Image Segmentation | CVPR 2021 | |
| CGAN | Cascade Grouped Attention Network for Referring Expression Segmentation | ACM MM 2020 | |
| LSCM | Linguistic Structure Guided Context Modeling for Referring Image Segmentation | ECCV 2020 | [code] |
| CMPC-Refseg | Referring Image Segmentation via Cross-Modal Progressive Comprehension | CVPR 2020 | [code] |
| BRINet | Bi-directional Relationship Inferring Network for Referring Image Segmentation | CVPR 2020 | [code] |
| PhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [code] [project] |
| MCN | Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | CVPR 2020 | [code] |
| - | Dual Convolutional LSTM Network for Referring Image Segmentation | TMM 2020 | |
| STEP | See-Through-Text Grouping for Referring Image Segmentation | ICCV 2019 | |
| lang2seg | Referring Expression Object Segmentation with Caption-Aware Consistency | BMVC 2019 | [code] |
| CMSA | Cross-Modal Self-Attention Network for Referring Image Segmentation | CVPR 2019 | [code] |
| KWA | Key-Word-Aware Network for Referring Expression Image Segmentation | ECCV 2018 | [code] |
| DMN | Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries | ECCV 2018 | [code] |
| RRN | Referring Image Segmentation via Recurrent Refinement Networks | CVPR 2018 | [code] |
| MAttNet | MAttNet: Modular Attention Network for Referring Expression Comprehension | CVPR 2018 | [code] [demo] |
| RMI | Recurrent Multimodal Interaction for Referring Image Segmentation | ICCV 2017 | [code] |
| LSTM-CNN | Segmentation from natural language expressions | ECCV 2016 | [code] [project] |