ZIM: Zero-Shot Image Matting for Anything

August 28, 2025

Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu

NAVER Cloud, ImageVision

Paper | Page | 🤗 Demo | 🤗 Dataset | 🤗 Models | 🤗 Collection

[Figure: Teaser]

Introduction

The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks.

Model overview

Updates

  • 2025.07.24: ZIM has been accepted to ICCV 2025 as a Highlight Paper!
  • 2024.11.04: official ZIM code update

Installation

Install the required packages with the command below:

pip install zim_anything

or

git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .

To enable GPU acceleration, install the onnxruntime-gpu package that matches your environment (CUDA and cuDNN versions), following the instructions in the onnxruntime installation docs.
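After installing onnxruntime-gpu, a quick check like the one below can indicate whether the CUDA execution provider is visible to onnxruntime. This is a minimal sketch: the helper name `cuda_provider_available` is hypothetical and not part of ZIM, and a provider being listed does not guarantee that inference will actually run on the GPU if the CUDA or cuDNN libraries do not match.

```python
def cuda_provider_available() -> bool:
    """Return True if onnxruntime lists the CUDA execution provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        # Neither onnxruntime nor onnxruntime-gpu is installed here.
        return False
    # get_available_providers() lists the providers built into this package.
    return "CUDAExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("CUDA provider available:", cuda_provider_available())
```

If this prints False after installing onnxruntime-gpu, the installed package version probably does not match the local CUDA setup.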

Demo

We provide Gradio demo code in demo/gradio_demo.py. You can run the demo locally with:

python demo/gradio_demo.py

In addition, we provide Gradio demo code in demo/gradio_demo_comparison.py to qualitatively compare ZIM with SAM:

python demo/gradio_demo_comparison.py

Getting Started

After installation, you can use the model in just a few lines, as shown below. ZimPredictor is compatible with SamPredictor and exposes the same methods, such as set_image() and predict().

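import torch
from zim_anything import zim_model_registry, ZimPredictor

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"  # checkpoint path

# Build the model and move it to the GPU when one is available
model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

predictor = ZimPredictor(model)
predictor.set_image(<image>)  # <image>: placeholder for the input image
masks, _, _ = predictor.predict(<input_prompts>)  # <input_prompts>: placeholder for point/box prompts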

We also provide code for generating masks for an entire image and visualization:

import torch
from zim_anything import zim_model_registry, ZimAutomaticMaskGenerator
from zim_anything.utils import show_mat_anns

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"  # checkpoint path

# Build the model and move it to the GPU when one is available
model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

mask_generator = ZimAutomaticMaskGenerator(model)
masks = mask_generator.generate(<image>)  # Automatically generated masks
masks_vis = show_mat_anns(<image>, masks)  # Visualize masks

Additionally, masks can be generated for images from the command line:

bash script/run_amg.sh

We provide pretrained weights for ZIM.

MODEL ZOO    Link
zim_vit_b    download
zim_vit_l    download

Dataset Preparation

1) MicroMat-3K Dataset

We introduce a new test set, MicroMat-3K, for evaluating zero-shot interactive matting models. It consists of 3,000 high-resolution images paired with micro-level matte labels, providing a comprehensive benchmark for testing matting models at different levels of detail.

The MicroMat-3K dataset can be downloaded here or from Hugging Face.

1-1) Dataset structure

The dataset should be structured as follows:

└── /path/to/dataset/MicroMat3K
    ├── img
    │   ├── 0001.png
    ├── matte
    │   ├── coarse
    │   │   ├── 0001.png
    │   └── fine
    │       ├── 0001.png
    ├── prompt
    │   ├── coarse
    │   │   ├── 0001.png
    │   └── fine
    │       ├── 0001.png
    └── seg
        ├── coarse
        │   ├── 0001_01.json
        └── fine
            ├── 0001_01.json
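Before running evaluation, it can help to confirm that the directory layout matches the tree above. The following is a minimal sketch using only the Python standard library. The helper `missing_dirs` and the `EXPECTED_DIRS` list are hypothetical names introduced for illustration, and the check covers directories only, not individual files.

```python
from pathlib import Path

# Sub-directories shown in the tree above, relative to the MicroMat3K root.
EXPECTED_DIRS = [
    "img",
    "matte/coarse",
    "matte/fine",
    "prompt/coarse",
    "prompt/fine",
    "seg/coarse",
    "seg/fine",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Replace with the actual dataset location.
    print(missing_dirs("/path/to/dataset/MicroMat3K"))
```

An empty list means every expected directory is present.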

1-2) Prompt file configuration

Each prompt file should be configured as follows:

{
    "point": [[x1, y1, 1], [x2, y2, 0], ...],   # 1: Positive, 0: Negative prompt
    "bbox": [x1, y1, x2, y2]                    # [X, Y, X, Y] format
}
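The prompt format above can be split into separate point coordinates, point labels, and a box, which is the shape of input that SamPredictor.predict() takes through its point_coords, point_labels, and box arguments. The sketch below assumes that prompt files are plain JSON without the inline comments shown above and that either key may be absent. The helper `split_prompt` is a hypothetical name, not part of ZIM.

```python
def split_prompt(prompt):
    """Split a prompt dict into point coordinates, point labels, and a box."""
    points = prompt.get("point", [])
    # Each point is [x, y, label]; label 1 is positive, 0 is negative.
    point_coords = [[x, y] for x, y, _ in points]
    point_labels = [label for _, _, label in points]
    box = prompt.get("bbox")  # [x1, y1, x2, y2], or None if absent
    return point_coords, point_labels, box


# Illustrative values only; real prompts come from the dataset's prompt files.
example = {"point": [[120, 80, 1], [40, 200, 0]], "bbox": [30, 20, 300, 260]}
coords, labels, box = split_prompt(example)
print(coords, labels, box)
```

Since ZimPredictor is described as compatible with SamPredictor, these lists could presumably be wrapped in NumPy arrays and passed to predict(), though the exact argument handling should be checked against the ZIM source.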

Evaluation

We provide an evaluation script, which includes a comparison with SAM, in script/run_eval.sh. Make sure the dataset structure is prepared.

First, modify data_root in script/run_eval.sh:

...
data_root="/path/to/dataset/"
...

Then run the evaluation script:

bash script/run_eval.sh

The evaluation results on the MicroMat-3K dataset are as follows:

[Table: evaluation results on MicroMat-3K]

How To Cite

@article{kim2024zim,
  title={ZIM: Zero-Shot Image Matting for Anything},
  author={Kim, Beomyoung and Shin, Chanyong and Jeong, Joonhyun and Jung, Hyungsik and Lee, Se-Yun and Chun, Sewhan and Hwang, Dong-Hyun and Yu, Joonsang},
  journal={arXiv preprint arXiv:2411.00626},
  year={2024}
}

License

ZIM
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)