PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

December 30, 2024

Zheng Zhang*, Yeyao Ma*, Enming Zhang*, Xiang Bai

* Equal Contribution

arXiv Paper

Features

A powerful extension of the Large Multi-modal Model for generic (panoptic, instance, semantic) segmentation, referring segmentation, and interactive segmentation.
Supports joint training across multiple segmentation tasks and visual-language tasks.
Demonstrates zero-shot capabilities on unseen tasks, such as open-vocabulary segmentation, generalized referring segmentation, and video object segmentation.

Updates

Release evaluation code
Release training code

Installation

See Installation instructions.

Getting Started

See Preparing Datasets for PSALM.

See Getting Started with PSALM.

Model Zoo

Download PSALM here.

Citation

If you find this work useful for your research, please cite it using the following BibTeX entry.

@inproceedings{zhang2025psalm,
  title={Psalm: Pixelwise segmentation with large multi-modal model},
  author={Zhang, Zheng and Ma, Yeyao and Zhang, Enming and Bai, Xiang},
  booktitle={European Conference on Computer Vision},
  pages={74--91},
  year={2025},
  organization={Springer}
}

Acknowledgement

Thanks to these awesome works: Mask2former, Mask2former-Simplify, and LLaVA. Our code is based on them.