README.md

April 1, 2026


FGVEdit

Official code and data release for
Visual-Oriented Fine-Grained Knowledge Editing for Multimodal Large Language Models

🛠️ About This Project

FGVEdit is a benchmark and codebase for fine-grained multimodal knowledge editing. The current release contains data processing and experiment entrypoints for four editing methods:

  • IKE
  • MEND
  • SERAC
  • MSCKE

The released code currently supports two multimodal backbones:

  • BLIP-2
  • MiniGPT-4

The default data split used by this repository is:

  • train_data.json: 8334 samples
  • test_data.json: 2778 samples

Each sample contains the edit target, rephrase query, textual locality query, fine-grained generality query, fine-grained locality query, and the associated image path.
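Because the exact field names are not listed here, a quick way to see them is to load a split and count the keys that appear. The sketch below assumes each split file is a JSON array of objects (one per sample); the function name `summarize_split` is hypothetical and not part of the released code.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_split(json_path):
    """Return (number of records, Counter of field names) for one split file."""
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Assumption: the file holds a JSON array with one object per sample.
    if not isinstance(records, list):
        raise ValueError(f"{json_path}: expected a JSON array of samples")
    # Count how many records carry each field, so optional fields stand out.
    field_counts = Counter(key for record in records for key in record)
    return len(records), field_counts
```

Calling `summarize_split("data/FGVEdit/train_data.json")` should report 8334 records if the default split is in place, along with the field names actually used in the release.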

🚀 Getting Started

Download Data

The released dataset is hosted on the Hugging Face Hub as ZhenZeng/FGVEdit.

You can download it with the Hugging Face CLI:

huggingface-cli download ZhenZeng/FGVEdit --repo-type dataset --local-dir ./tmp/FGVEdit_download

After download, arrange the files into the layout expected by the current code:

data/
├── FGVEdit/
│   ├── train_data.json
│   └── test_data.json
└── images/
    ├── train2014/
    └── val2014/

The dataset loader reads:

  • data/FGVEdit/train_data.json
  • data/FGVEdit/test_data.json
  • image paths under data/images/

The image field inside each JSON record is a relative path such as train2014/xxx.jpg or val2014/xxx.jpg, so the image root must be data/images.
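A small check can confirm that every record's image resolves under the image root before an experiment is launched. This is a sketch under two assumptions: the split file is a JSON array of objects, and the path is stored under a key literally named `image` (the key name is inferred from the description above and is exposed as a parameter). `find_missing_images` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def find_missing_images(json_path, image_root, image_key="image"):
    """List relative image paths referenced by a split but absent on disk."""
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    root = Path(image_root)
    missing = []
    for record in records:
        rel_path = record.get(image_key)
        # Records without the key are reported as None rather than skipped.
        if rel_path is None or not (root / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

For example, `find_missing_images("data/FGVEdit/test_data.json", "data/images")` returns an empty list when the layout is complete.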

Environment Setup

This repository currently provides a single Python dependency file: requirements.txt.

We recommend using conda:

conda create -n fgvedit python=3.10 -y
conda activate fgvedit
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Download Pre-trained Models

The current hparams/ files reference pre-trained model caches and checkpoints with the following layout:

hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── clip-vit-large-patch14/
├── distilbert-base-cased/
├── opt-2.7b/
├── opt-125m/
├── Vicuna/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth

Download links for each component:

all-MiniLM-L6-v2 bert-base-uncased distilbert-base-cased
opt-2.7b opt-125m vicuna-7b
blip2_pretrained_flant5xxl.pth blip2_pretrained_opt2.7b.pth pretrained_minigpt4_7b.pth
eva_vit_g.pth clip-vit-large-patch14
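Once the downloads are finished, the cache layout can be verified with a short script. The entry names below are copied from the hugging_cache/ tree above; the helper name `missing_cache_entries` and the default root are assumptions made for illustration, and the check only looks at names, not at file contents.

```python
from pathlib import Path

# Directory and file names as shown in the hugging_cache/ layout.
EXPECTED_DIRS = [
    "all-MiniLM-L6-v2",
    "bert-base-uncased",
    "clip-vit-large-patch14",
    "distilbert-base-cased",
    "opt-2.7b",
    "opt-125m",
    "Vicuna",
]
EXPECTED_FILES = [
    "blip2_pretrained_flant5xxl.pth",
    "blip2_pretrained_opt2.7b.pth",
    "eva_vit_g.pth",
    "pretrained_minigpt4_7b.pth",
]


def missing_cache_entries(cache_root="hugging_cache"):
    """Return the expected directories and files that are absent under cache_root."""
    root = Path(cache_root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty result means every entry from the layout is present. If your hparams/ files point somewhere else, pass that location as `cache_root`.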

🧪 Usage

All experiment entrypoints are at the repository root:

  • edit_IKE.py
  • edit_MEND.py
  • edit_SERAC.py
  • edit_MSCKE.py

All hyper-parameters are stored in hparams/.

IKE

Configs:

  • hparams/IKE/blip2.yaml
  • hparams/IKE/minigpt4.yaml

Run embedding generation only:

python edit_IKE.py --model blip2 --mode embed
python edit_IKE.py --model minigpt4 --mode embed

Run evaluation:

python edit_IKE.py --model blip2 --mode eval
python edit_IKE.py --model minigpt4 --mode eval

In eval mode, the script builds train-set embeddings and then evaluates on data/FGVEdit/test_data.json.

MEND

Training configs:

  • hparams/TRAINING/MEND/blip2.yaml
  • hparams/TRAINING/MEND/minigpt4.yaml

Evaluation configs:

  • hparams/MEND/blip2.yaml
  • hparams/MEND/minigpt4.yaml

Run training:

python edit_MEND.py --model blip2 --mode train
python edit_MEND.py --model minigpt4 --mode train

Run evaluation:

python edit_MEND.py --model blip2 --mode eval
python edit_MEND.py --model minigpt4 --mode eval

Important:

  • --mode eval requires a trained checkpoint.
  • Before evaluation, update the archive field in the chosen evaluation YAML so that it points to a concrete .pt checkpoint file produced during training.
  • Training outputs are saved under results_dir/models/<ALG>/.
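To find a checkpoint to put in the archive field, one option is to pick the most recently written .pt file in the training output directory. The sketch below assumes checkpoints sit directly inside results_dir/models/<ALG>/ and that the newest file is the one you want; both are assumptions, and `latest_checkpoint` is a hypothetical helper rather than something the repository provides.

```python
from pathlib import Path


def latest_checkpoint(results_dir, alg):
    """Return the newest .pt file under results_dir/models/<alg>/, or None."""
    model_dir = Path(results_dir) / "models" / alg
    # Sort by modification time so the last element is the most recent file.
    checkpoints = sorted(model_dir.glob("*.pt"), key=lambda p: p.stat().st_mtime)
    return checkpoints[-1] if checkpoints else None
```

The returned path can then be copied into the evaluation YAML. If training keeps several checkpoints (for example, best and last), choose by name instead of relying on modification time.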

SERAC

Training configs:

  • hparams/TRAINING/SERAC/blip2.yaml
  • hparams/TRAINING/SERAC/minigpt4.yaml

Evaluation configs:

  • hparams/SERAC/blip2.yaml
  • hparams/SERAC/minigpt4.yaml

Run training:

python edit_SERAC.py --model blip2 --mode train
python edit_SERAC.py --model minigpt4 --mode train

Run evaluation:

python edit_SERAC.py --model blip2 --mode eval
python edit_SERAC.py --model minigpt4 --mode eval

Important:

  • --mode eval requires a trained checkpoint.
  • Before evaluation, update the archive field in the chosen evaluation YAML to the actual checkpoint file path.

MSCKE

Training configs:

  • hparams/TRAINING/MSCKE/blip2.yaml
  • hparams/TRAINING/MSCKE/minigpt4.yaml

Evaluation configs:

  • hparams/MSCKE/blip2.yaml
  • hparams/MSCKE/minigpt4.yaml

Run training:

python edit_MSCKE.py --model blip2 --mode train
python edit_MSCKE.py --model minigpt4 --mode train

Run evaluation:

python edit_MSCKE.py --model blip2 --mode eval
python edit_MSCKE.py --model minigpt4 --mode eval

Important:

  • --mode eval requires a trained checkpoint.
  • Before evaluation, update the archive field in the chosen evaluation YAML to the actual checkpoint file path produced during training.
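The commands and config paths listed for the four methods follow one pattern, which can be captured in a few lines when scripting a full run. This is a sketch: the method, model, and mode names are taken from the sections above, while `build_command` and `config_path` are hypothetical helpers, and the mapping from mode to config file is inferred from the "Training configs" and "Evaluation configs" lists rather than read from the scripts.

```python
# Modes accepted by each entrypoint, as documented above.
VALID_MODES = {
    "IKE": ("embed", "eval"),
    "MEND": ("train", "eval"),
    "SERAC": ("train", "eval"),
    "MSCKE": ("train", "eval"),
}
MODELS = ("blip2", "minigpt4")


def build_command(method, model, mode):
    """Return the argv list for one documented experiment invocation."""
    if method not in VALID_MODES:
        raise ValueError(f"unknown method: {method}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if mode not in VALID_MODES[method]:
        raise ValueError(f"{method} does not document mode {mode!r}")
    return ["python", f"edit_{method}.py", "--model", model, "--mode", mode]


def config_path(method, model, mode):
    """Return the YAML path listed for this method, model, and mode."""
    build_command(method, model, mode)  # reuse the validation above
    # Assumption: train mode reads hparams/TRAINING/..., other modes read hparams/<method>/...
    subdir = f"TRAINING/{method}" if mode == "train" else method
    return f"hparams/{subdir}/{model}.yaml"
```

The list returned by `build_command` can be passed to `subprocess.run(..., check=True)` from the repository root, for instance to train and then evaluate one method in sequence after the archive field has been updated.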

Practical Notes

  • The current scripts read device from the selected YAML file, so change that field before running on your machine.
  • If you move data or checkpoints, update the corresponding paths in the YAML file instead of relying on implicit defaults.
  • MEND, SERAC, and MSCKE evaluation configs currently contain placeholder archive paths. They must be replaced with real checkpoint filenames.
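Editing device or archive by hand works, but the change can also be scripted. The sketch below rewrites a single field as plain text so that comments and key order in the file are preserved. It assumes the field is a top-level `key: value` scalar on one line with no `#` inside the value; that has not been checked against the released YAML files, and `set_top_level_field` is a hypothetical helper.

```python
import re


def set_top_level_field(yaml_text, key, value):
    """Replace the scalar value of a top-level `key:` line, keeping any trailing comment."""
    pattern = re.compile(
        rf"^{re.escape(key)}[ \t]*:[ \t]*[^#\n]*?([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash escapes in `value` being interpreted.
    new_text, count = pattern.subn(
        lambda m: f"{key}: {value}{m.group(1)}", yaml_text, count=1
    )
    if count == 0:
        raise KeyError(f"top-level field not found: {key}")
    return new_text
```

A typical use is to read the chosen evaluation YAML, call `set_top_level_field(text, "archive", "<path to your .pt file>")`, and write the result back. For nested fields or multi-line values, a YAML library is the safer choice, at the cost of possibly losing comments on round-trip.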

🎉 Acknowledgments

This repository builds on the multimodal editing ecosystem around EasyEdit, and uses pretrained components or model implementations from LAVIS / BLIP-2, MiniGPT-4, Transformers, Sentence-Transformers, and CLIP.

We thank the authors and maintainers of these projects for making their code and models publicly available.
