README.md

April 5, 2026

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
ICCV 2025

Renshan Zhang¹, Rui Shao¹†, Gongwei Chen¹, Miao Zhang¹, Weili Guan¹, Kaiwen Zhou², Liqiang Nie¹†

¹Harbin Institute of Technology, Shenzhen
²Huawei Noah's Ark Lab
†Corresponding author

If you find this work useful for your research, please cite our paper and star our repo.

Updates

[01/2026] :fire: The extended version of this paper, FALCON++, is released on TechRxiv.
[12/2025] :fire: The checkpoint is released. Enjoy it!
[07/2025] :fire: The code and project page are released. Enjoy it!
[06/2025] :fire: The arXiv paper is updated.
[06/2025] FALCON is accepted to ICCV 2025!
[01/2025] The arXiv paper is released.

This is the GitHub repository for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose FALCON, a model that introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).
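To make the register idea more concrete, here is a toy, dependency-free Python sketch. It is a simplified illustration, not FALCON's actual architecture, and it assumes a plain cross-attention formulation: a few register vectors attend to each sub-image's tokens, so every crop is summarized by R tokens instead of N (the redundancy side), and the registers of all crops then attend to one another so information can flow across crop boundaries (the fragmentation side). A real model would use learned registers, learned query/key/value projections, and multi-head attention; all function names below are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections:
    the same vectors serve as keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def compress_with_registers(sub_images, registers):
    """Summarize each sub-image's N tokens with R register queries (R << N)."""
    return [attend(registers, tokens) for tokens in sub_images]


def exchange_across_sub_images(per_crop_registers):
    """Let the registers of all sub-images attend to each other, then split them back per crop."""
    r = len(per_crop_registers[0])
    flat = [vec for regs in per_crop_registers for vec in regs]
    mixed = attend(flat, flat)
    return [mixed[i:i + r] for i in range(0, len(mixed), r)]


random.seed(0)
d, n_tokens, n_regs, n_crops = 8, 64, 4, 4
# Random token features for each crop of a high-resolution image.
sub_images = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)] for _ in range(n_crops)]
# Fixed random registers; in a real model these would be learned parameters.
registers = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_regs)]

compressed = compress_with_registers(sub_images, registers)
fused = exchange_across_sub_images(compressed)
print(sum(len(t) for t in sub_images), "->", sum(len(f) for f in fused))  # 256 -> 16
```

In this toy, keeping only the register outputs is what shrinks the token count; the cross-crop step leaves the count unchanged and only mixes information between crops.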

Installation

Clone this repository and navigate to the folder

git clone git@github.com:iLearn-Lab/ICCV25-FALCON.git
cd ICCV25-FALCON

Install the package

conda create -n falcon python=3.10 -y
conda activate falcon
pip install --upgrade pip
pip install -e .

Install additional packages for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation
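After installation, a quick sanity check can confirm that the environment matches the steps above. The sketch below uses only the standard library; it assumes that the editable install exposes the jiutian package (the module path used in Quick Start) and that flash-attn is importable as flash_attn. The helper names are hypothetical and not part of the repository.

```python
import importlib.util
import sys

# Top-level import names to probe (assumptions, see the note above).
REQUIRED = ["jiutian"]          # expected from `pip install -e .`
TRAINING_ONLY = ["flash_attn"]  # expected from `pip install flash-attn`


def python_version_ok(expected=(3, 10)):
    """Return True if the interpreter's major.minor version equals `expected`."""
    return tuple(sys.version_info[:2]) == tuple(expected)


def probe_modules(names):
    """Map each top-level module name to whether it can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = "ok" if python_version_ok() else "mismatch"
    print(f"Python {sys.version.split()[0]} (3.10 expected): {status}")
    for group, names in (("required", REQUIRED), ("training", TRAINING_ONLY)):
        for name, found in probe_modules(names).items():
            print(f"[{group}] {name}: {'found' if found else 'MISSING'}")
```

importlib.util.find_spec locates a top-level module without executing it, so the check stays fast and does not run the packages' import-time code.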

Quick Start

We provide a well-encapsulated class, JiutianHDInfer, for model inference in jiutian/eval/model_infer.py.

The example below shows how to use JiutianHDInfer: calling its inference method returns the model's inference result for a given image and question.

from jiutian.eval.model_infer import JiutianHDInfer

model_infer = JiutianHDInfer(
    model_path='/path/to/ckpt',   # path to the FALCON checkpoint
    model_base=None,              # path to the base checkpoint, or None
    conv_mode='llama_3_1',        # conversation template
)

image_file = '/path/to/image'
question = 'question'
answer = model_infer.inference(image_file, question)
print(answer)
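For many image-question pairs, it is usually preferable to construct JiutianHDInfer once and call inference repeatedly, since construction presumably loads the checkpoint. The helper below is a hypothetical sketch (run_batch, SupportsInference, and EchoModel are not part of the repository); it assumes only that inference(image_file, question) returns the result for a single pair, as described above. EchoModel is a stand-in so the snippet runs without a checkpoint.

```python
from typing import Iterable, List, Protocol, Tuple


class SupportsInference(Protocol):
    """Any object exposing `inference(image_file, question)`, such as a JiutianHDInfer instance."""

    def inference(self, image_file: str, question: str) -> object: ...


def run_batch(
    model_infer: SupportsInference, queries: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, object]]:
    """Run one already-constructed model over many (image_file, question) pairs, in order."""
    results = []
    for image_file, question in queries:
        results.append((image_file, question, model_infer.inference(image_file, question)))
    return results


class EchoModel:
    """Stand-in used only so this snippet runs without a checkpoint."""

    def inference(self, image_file: str, question: str) -> str:
        return f"answer about {image_file}"


for image, question, answer in run_batch(EchoModel(), [("cat.jpg", "What animal is this?")]):
    print(image, "|", question, "|", answer)  # prints: cat.jpg | What animal is this? | answer about cat.jpg

# With the real model (paths are placeholders):
# from jiutian.eval.model_infer import JiutianHDInfer
# model_infer = JiutianHDInfer(model_path='/path/to/ckpt', model_base=None, conv_mode='llama_3_1')
# results = run_batch(model_infer, [('/path/to/image1', 'question 1'), ('/path/to/image2', 'question 2')])
```

Typing the parameter as a Protocol keeps the helper independent of the jiutian import, which is also what lets the stand-in model exercise it.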

Citation

@inproceedings{zhang2025falcon,
  title={Falcon: Resolving visual redundancy and fragmentation in high-resolution multimodal large language models via visual registers},
  author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={23530--23540},
  year={2025}
}
