November 15, 2025

Multigranular Evaluation for Brain Visual Decoding

Weihao Xia, Cengiz Öztireli
University of Cambridge
AAAI 2026

📰 News

2025/11/15 🌟 Code is released.
2025/11/08 🚀 BASIC is accepted to AAAI 2026 in Singapore.

🎯 Introduction

Why Existing Evaluation Protocols Fall Short

Current evaluation protocols are limited in:

Discriminative power: metrics often saturate and fail to reveal meaningful differences.
Neuroscientific grounding: they poorly capture perceptual plausibility or human-like alignment.
Visual cognition coverage: they overlook the multi-level nature of visual understanding.

We introduce BASIC (Brain-Aligned Structural, Inferential, and Contextual similarity), a discriminative, interpretable, and comprehensive framework designed to fill key gaps in brain visual decoding evaluation. BASIC combines semantic reasoning and structural matching to measure how closely decoded outputs align with reference stimuli across multiple levels.

BASIC evaluates decoding from three complementary perspectives:

Structural: measures spatial organization and category boundaries using granularity-aware correspondence.
Inferential: assesses semantic accuracy via object categories, attributes, and relations from MLLM captions.
Contextual: evaluates global plausibility via MLLM-based scene reasoning, quantifying narrative coherence.

⚙️ Usage

The visual decoding results should be structured as follows:

res_decoding
├── cc2017
│   ├── decofuse
│   ├── mind-video
│   └── neuroclips
├── eeg-neuro3d
│   └── neuro3d
├── eeg-things
│   ├── atm
│   ├── cognitioncapturer
│   └── images
├── fmri-shape
│   ├── images
│   ├── mind3d
│   └── mind3d++
├── nsd
│   ├── braindiffuser
│   ├── brainguard
│   ├── dream
│   ├── images
│   ├── mindbridge
│   ├── mindeye
│   ├── mindeye2
│   ├── mindtuner
│   ├── neuropictor
│   ├── sdrecon
│   ├── sepbrain
│   ├── sttm
│   ├── test_images
│   ├── umbrae
│   ├── unibrain
│   └── neurovla
└── seed-dv
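
Before running the pipeline, it can help to confirm that your results folder follows this layout. The snippet below is a minimal sketch and not part of the repository: `list_method_dirs` is a hypothetical helper that only assumes the two-level `dataset/method` structure shown above.

```python
from pathlib import Path


def list_method_dirs(root):
    """Return sorted 'dataset/method' strings for each method folder under root.

    Hypothetical helper for illustration only. Returns an empty list when
    root does not exist, so it is safe to call before any data is in place.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    pairs = []
    for dataset in sorted(p for p in root.iterdir() if p.is_dir()):
        for method in sorted(p for p in dataset.iterdir() if p.is_dir()):
            pairs.append(f"{dataset.name}/{method.name}")
    return pairs
```

Calling `list_method_dirs("res_decoding")` should then list entries such as `nsd/mindeye` or `eeg-things/atm` if the folders are in place.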

Download LLaVA checkpoints into detailcap/llava/model_weights.

# git lfs install
from huggingface_hub import snapshot_download

ckpt = 'llava-v1.5-7b'  # or 'llava-v1.5-13b', 'llava-next-vicuna-7b', 'llava-next-vicuna-13b'
# Save under the folder that the --model_path arguments below expect
snapshot_download(repo_id=f"liuhaotian/{ckpt}", local_dir=f"detailcap/llava/model_weights/{ckpt}")

BASIC-H: Semantic Reasoning

1. Generate Detailed Captions

# Step 1: Generate detailed captions for GT images and decoded images
models=("llava-v1.5-7b")  # other options: "llava-v1.5-13b" "llava-next-vicuna-7b" "llava-next-vicuna-13b" "llava-next-34b"

methods=(
    "nsd/images"
    "nsd/sdrecon/sub01"
    "nsd/braindiffuser/sub01"
    "nsd/mindeye/sub01"
    "nsd/umbrae/sub01"
    "nsd/dream/sub01"
    "nsd/mindbridge/sub01"
    "nsd/mindeye2/sub01"
    "nsd/neurovla/sub01"
    "nsd/sepbrain/sub01"
    "nsd/unibrain/sub01"
    "nsd/brainguard/sub01"
    "nsd/neuropictor/sub01"
)

for model in "${models[@]}"; do
    for method in "${methods[@]}"; do
        echo "Processing model: $model, data: $method"
        python detailcap/generate_detailed_image_caption.py \
            --model_path "detailcap/llava/model_weights/${model}" \
            --prompt detailcap/prompts/basic.txt \
            --data_path "res_decoding/${method}" \
            --save_path "res_decoding/captions/${model}/${method}"
    done
done

2. Evaluate Detailed Captions

# Step 2: Evaluate the detailed captions
for model in "${models[@]}"; do
    refs="res_decoding/captions/${model}/nsd/images/fmricap.json"
    for method in "${methods[@]}"; do
        if [ "$method" == "nsd/images" ]; then
            echo "Skipping evaluation for GT data: $method"
            continue
        fi
        echo "Evaluating model: $model, data: $method"
        preds="res_decoding/captions/${model}/${method}/fmricap.json"
        python capture/eval_detailcap.py "${refs}" "${preds}" --is_strict True
    done
done

Note

Set --is_strict False when performing VINDEX evaluation.
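
For intuition about what caption-level evaluation measures, the following is a simplified sketch of set-based precision, recall, and F1 over elements extracted from a reference caption and a predicted caption (for example, object names). It is an illustration under assumptions, not the logic of capture/eval_detailcap.py: `set_f1` is a hypothetical helper that uses exact string matching only, and the example object lists are made up.

```python
def set_f1(reference, predicted):
    """Precision, recall, and F1 between two collections of strings.

    Hypothetical helper: uses exact string matching only, which is an
    assumption made for illustration.
    """
    ref, pred = set(reference), set(predicted)
    if not ref or not pred:
        return 0.0, 0.0, 0.0
    overlap = len(ref & pred)
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    if overlap == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example sets, not real model outputs
ref_objects = ["dog", "frisbee", "grass"]
pred_objects = ["dog", "grass", "tree", "ball"]
p, r, f1 = set_f1(ref_objects, pred_objects)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The same function could be applied separately to objects, attributes, and relations, giving one score per element type.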

BASIC-L: Structural Matching

0. Environment & Code

Download the code and pretrained models from IDEA-Research/Grounded-SAM-2, and follow the official instructions for environment setup.

cd basic/grounded-sam2
git clone https://github.com/IDEA-Research/Grounded-SAM-2

1. (Optional) Generate Hierarchical Class Captions

We provide processed hierarchical categories in grounded-sam2/hierarchical_classes, so this step can be skipped.

# Step 1: Generate hierarchical class captions for images
model="llava-next-vicuna-13b"
for data in "nsd/images"; do
    python detailcap/generate_detailed_image_caption.py \
        --model_path "detailcap/llava/model_weights/${model}" \
        --data_path "res_decoding/${data}" \
        --save_path "res_decoding/captions/${model}/${data}/hierarchical_classes" \
        --prompt detailcap/prompts/hierarchical_class.txt
done

2. Generate Multigranular Segmentation Masks

# Step 2: Generate multigranular segmentation masks based on the hierarchical classes
prompt_path='basic/grounded-sam2/hierarchical_classes/nsd/class_dict.json'

methods=(
    "nsd/images"
    "nsd/sdrecon/sub01"
    "nsd/braindiffuser/sub01"
    "nsd/mindeye/sub01"
    "nsd/umbrae/sub01"
    "nsd/dream/sub01"
    "nsd/mindbridge/sub01"
    "nsd/mindeye2/sub01"
    "nsd/neurovla/sub01"
    "nsd/sepbrain/sub01"
    "nsd/unibrain/sub01"
    "nsd/brainguard/sub01"
    "nsd/neuropictor/sub01"
)

tth=0.3   # text threshold
bth=0.25  # box threshold

for data in "${methods[@]}"; do
    echo "Processing $data with box threshold $bth and text threshold $tth"
    python batch_process_grounded_sam2_hierclass.py --text-prompt-path "$prompt_path" \
        --img-path "res_decoding/${data}" --output-dir "res_decoding/${data}_b${bth}_t${tth}_hierarchical_llava" \
        --box-threshold "$bth" --text-threshold "$tth" --img-size 512
done
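
The output folder name encodes both thresholds, which matters when locating masks later. As a small sketch, the hypothetical helper `mask_output_dir` below mirrors the `--output-dir` pattern from the shell loop above. Thresholds are passed as strings so the formatting matches the shell variables exactly.

```python
def mask_output_dir(data, box_threshold="0.25", text_threshold="0.3", root="res_decoding"):
    """Build the mask folder path used by the segmentation loop.

    Hypothetical helper for illustration; thresholds are strings so that
    values such as "0.30" are kept exactly as written in the shell script.
    """
    return f"{root}/{data}_b{box_threshold}_t{text_threshold}_hierarchical_llava"


print(mask_output_dir("nsd/mindeye/sub01"))
# res_decoding/nsd/mindeye/sub01_b0.25_t0.3_hierarchical_llava
```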

3. Evaluate Segmentation Masks

# Step 3: Evaluate the segmentation masks (reuses bth and tth from Step 2)
methods=("sdrecon" "braindiffuser" "mindeye" "umbrae" "dream" "mindbridge" "mindeye2" "neurovla" "sepbrain" "unibrain" "brainguard" "neuropictor")

for method in "${methods[@]}"; do
    echo "Calculating matching ratio for method: $method (bth=$bth, tth=$tth)"
    python cal_hierarchical_matching_ratio_ap.py --dataset nsd --method "$method" --thbox "$bth" --thtxt "$tth"
done
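
To give a rough idea of what a mask matching ratio can look like, here is a minimal sketch based on intersection-over-union (IoU) between per-class binary masks. This is an assumption-laden illustration, not the implementation in cal_hierarchical_matching_ratio_ap.py: the function names, the dict-of-masks layout, and the 0.5 IoU threshold are all hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


def matching_ratio(ref_masks, pred_masks, iou_threshold=0.5):
    """Fraction of reference classes whose predicted mask reaches the IoU threshold.

    ref_masks / pred_masks: dicts mapping class name -> binary mask (assumed layout).
    A class missing from pred_masks counts as unmatched.
    """
    if not ref_masks:
        return 0.0
    matched = sum(
        1
        for name, ref in ref_masks.items()
        if name in pred_masks and mask_iou(ref, pred_masks[name]) >= iou_threshold
    )
    return matched / len(ref_masks)


# Toy 2x2 masks, made up for illustration
ref = {"dog": [[1, 1], [0, 0]], "sky": [[0, 0], [1, 1]]}
pred = {"dog": [[1, 1], [1, 0]], "sky": [[1, 0], [0, 0]]}
print(matching_ratio(ref, pred))
```

In this toy case the "dog" masks overlap with IoU 2/3 and count as matched, while the "sky" masks do not overlap at all, so one of two reference classes is matched.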

📚 Acknowledgements

The codebase is built upon the following excellent projects: LLaVA, CAPTURE and Grounded SAM.

We also highlight our related work on multimodal brain decoding and benchmarking:

DREAM: mirrors pathways in the human visual system for stimulus reconstruction.
UMBRAE: interprets brain activations into multimodal explanations with task-specific MLLM prompting.
VINDEX: explores the impact of different image feature spaces on fine-grained multimodal decoding.
MEVOX: introduces Multi-Expert Vision systems for Omni-contextual eXplanations.
BASIC: provides interpretable and multigranular benchmarking for brain visual decoding.

📖 Citation

@inproceedings{xia2026basic,
  title     = {Multigranular Evaluation for Brain Visual Decoding},
  author    = {Xia, Weihao and Öztireli, Cengiz},
  booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2026},
}