SAM4MLLM

March 20, 2025

This is the implementation of our ECCV'24 paper, "SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation".

Dataset Preparation

Download each dataset (ADE20K, PACO-LVIS, Part-ImageNet, RefCOCO, GRES) from its official website.

You are responsible for checking that each dataset's license fits your intended use.

Put all of them under the dataset directory, so that you get:

    SAM4MLLM/
    ├──dataset/
    |  ├──ADE20K/
    |  ├──PACO-LVIS/
    |  ├──Part-ImageNet/
    |  ├──RefCOCO/
    |  ├──GRES/
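Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_datasets` and the constant `EXPECTED_DATASETS` are hypothetical, and the check only looks for the five top-level folders listed in the tree, not their contents.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
EXPECTED_DATASETS = ["ADE20K", "PACO-LVIS", "Part-ImageNet", "RefCOCO", "GRES"]


def missing_datasets(root: Path, expected: list[str] = EXPECTED_DATASETS) -> list[str]:
    """Return the expected dataset folders that are absent under <root>/dataset."""
    dataset_dir = root / "dataset"
    return [name for name in expected if not (dataset_dir / name).is_dir()]
```

Calling `missing_datasets(Path("SAM4MLLM"))` from the parent directory returns an empty list when every folder is present, and otherwise the names that still need to be downloaded.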

Installation

Checkpoint

Download each checkpoint:

Put all of them under the checkpoint directory, so that you get:

    SAM4MLLM/
    ├──checkpoint/
    |  ├──llama3-llava-next-8b/
    |  ├──sam4mllm/
    |  ├──sam4mllm_plus/
    |  ├──xl1.pt
    |  ├──effvit_xl1_decoder_coco_ft.pt

Data pre-processing

  • Rearrange the data:

In data, run each Jupyter notebook to generate the dataset for training.

  • Convert the data into dialogue format:

    jupyter nbconvert --to notebook --execute to_chat_format.ipynb
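The pre-processing notebooks can also be executed from a script instead of being opened one by one. The sketch below assumes that Jupyter with nbconvert is installed and on the PATH; the helper names `nbconvert_command` and `run_notebooks` are hypothetical and not part of the repository. The caller supplies the notebooks in the order they should run, since this README does not list the rearrangement notebooks by name.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook: Path) -> list[str]:
    """Build the `jupyter nbconvert` command that executes one notebook."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]


def run_notebooks(notebooks: list[Path], dry_run: bool = False) -> list[list[str]]:
    """Execute the notebooks in the given order and return the commands used.

    With dry_run=True the commands are only built, not executed.
    """
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first notebook that fails.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_notebooks([Path("to_chat_format.ipynb")])` runs the conversion step; the rearrangement notebooks would go earlier in the list so that their outputs exist before the conversion starts.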

Training

    python sam4mllm_train.py

Inference

Run the simple_infer.ipynb notebook.

Licenses

Copyright © 2024, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.