November 15, 2025
Multigranular Evaluation for Brain Visual Decoding
Weihao Xia
Cengiz Oztireli
University of Cambridge
AAAI 2026
News
- 2025/11/15: Code is released.
- 2025/11/08: BASIC is accepted to AAAI 2026 in Singapore.
Introduction
Why Existing Evaluation Protocols Fall Short
Current evaluation protocols are limited in:
- Discriminative power: metrics often saturate and fail to reveal meaningful differences.
- Neuroscientific grounding: they poorly capture perceptual plausibility or human-like alignment.
- Visual cognition coverage: they overlook the multi-level nature of visual understanding.
Our Method: BASIC
We introduce BASIC (Brain-Aligned Structural, Inferential, and Contextual similarity), a discriminative, interpretable, and comprehensive framework designed to fill key gaps in brain visual decoding evaluation. BASIC combines semantic reasoning and structural matching to measure how closely decoded outputs align with reference stimuli across multiple levels.
BASIC evaluates decoding from three complementary perspectives:
- Structural: measures spatial organization and category boundaries using granularity-aware correspondence.
- Inferential: assesses semantic accuracy via object categories, attributes, and relations from MLLM captions.
- Contextual: evaluates global plausibility via MLLM-based scene reasoning, quantifying narrative coherence.
Usage
The visual decoding results should be structured as follows:
res_decoding
├── cc2017
│   ├── decofuse
│   ├── mind-video
│   └── neuroclips
├── eeg-neuro3d
│   └── neuro3d
├── eeg-things
│   ├── atm
│   ├── cognitioncapturer
│   └── images
├── fmri-shape
│   ├── images
│   ├── mind3d
│   └── mind3d++
├── nsd
│   ├── braindiffuser
│   ├── brainguard
│   ├── dream
│   ├── images
│   ├── mindbridge
│   ├── mindeye
│   ├── mindeye2
│   ├── mindtuner
│   ├── neuropictor
│   ├── sdrecon
│   ├── sepbrain
│   ├── sttm
│   ├── test_images
│   ├── umbrae
│   ├── unibrain
│   └── neurovla
└── seed-dv
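Before running the pipeline, you can check that your results folder matches the layout above. The helper below is a minimal sketch and is not part of the released code. It assumes the dataset and method names from the tree and only checks that the folders exist. It does not check the per-subject sub-folders (such as sub01) that the scripts below read from.

```python
from pathlib import Path

# Expected dataset -> method folders, copied from the tree above.
# Trim this dict if you only evaluate a subset of datasets or methods.
EXPECTED_LAYOUT = {
    "cc2017": ["decofuse", "mind-video", "neuroclips"],
    "eeg-neuro3d": ["neuro3d"],
    "eeg-things": ["atm", "cognitioncapturer", "images"],
    "fmri-shape": ["images", "mind3d", "mind3d++"],
    "nsd": ["braindiffuser", "brainguard", "dream", "images", "mindbridge",
            "mindeye", "mindeye2", "mindtuner", "neuropictor", "sdrecon",
            "sepbrain", "sttm", "test_images", "umbrae", "unibrain", "neurovla"],
    "seed-dv": [],
}


def find_missing_dirs(root, layout=EXPECTED_LAYOUT):
    """Return expected folders missing under root, as 'dataset' or 'dataset/method'."""
    root = Path(root)
    missing = []
    for dataset, methods in layout.items():
        if not (root / dataset).is_dir():
            # Report the whole dataset once instead of every method inside it.
            missing.append(dataset)
            continue
        for method in methods:
            if not (root / dataset / method).is_dir():
                missing.append(f"{dataset}/{method}")
    return missing


if __name__ == "__main__":
    for path in find_missing_dirs("res_decoding"):
        print("missing:", path)
```

Run it from the repository root. An empty output means every listed folder is present.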
Download LLaVA checkpoints into detailcap/llava/model_weights.
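# Requires huggingface_hub (pip install huggingface_hub).
# Alternatively, clone the checkpoint with git after running: git lfs install
from huggingface_hub import snapshot_download

ckpt = "llava-v1.5-7b"  # 'llava-v1.5-13b', 'llava-next-vicuna-7b', 'llava-next-vicuna-13b'
snapshot_download(repo_id=f"liuhaotian/{ckpt}", local_dir=f"detailcap/llava/model_weights/{ckpt}")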
BASIC-H: Semantic Reasoning
1. Generate Detailed Captions
# Step 1: Generate detailed captions for GT images and decoded images
models=("llava-v1.5-7b")  # other options: "llava-v1.5-13b" "llava-next-vicuna-7b" "llava-next-vicuna-13b" "llava-next-34b"
methods=(
"nsd/images"
"nsd/sdrecon/sub01"
"nsd/braindiffuser/sub01"
"nsd/mindeye/sub01"
"nsd/umbrae/sub01"
"nsd/dream/sub01"
"nsd/mindbridge/sub01"
"nsd/mindeye2/sub01"
"nsd/neurovla/sub01"
"nsd/sepbrain/sub01"
"nsd/unibrain/sub01"
"nsd/brainguard/sub01"
"nsd/neuropictor/sub01"
)
for model in "${models[@]}"; do
for method in "${methods[@]}"; do
echo "Processing model: $model, data: $method"
python detailcap/generate_detailed_image_caption.py --model_path "detailcap/llava/model_weights/${model}" --prompt detailcap/prompts/basic.txt \
--data_path "res_decoding/${method}" --save_path "res_decoding/captions/${model}/${method}"
done
done
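Step 2 reads res_decoding/captions/<model>/<method>/fmricap.json for every model and method. A quick check for missing caption files can save a failed evaluation run. The sketch below is hypothetical. It assumes Step 1 writes fmricap.json into its --save_path, which is the path Step 2 reads from.

```python
from pathlib import Path


def missing_caption_files(models, methods, root="res_decoding/captions", filename="fmricap.json"):
    """List caption files that Step 2 will read but that do not exist yet.

    Assumes Step 1 saves `filename` under <root>/<model>/<method>/.
    """
    missing = []
    for model in models:
        for method in methods:
            path = Path(root) / model / method / filename
            if not path.is_file():
                missing.append(path.as_posix())
    return missing


if __name__ == "__main__":
    models = ["llava-v1.5-7b"]
    methods = ["nsd/images", "nsd/mindeye/sub01", "nsd/mindeye2/sub01"]
    for path in missing_caption_files(models, methods):
        print("missing:", path)
```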
2. Evaluate Detailed Captions
# Step 2: Evaluate the detailed captions (reuses the models and methods arrays defined in Step 1)
for model in "${models[@]}"; do
refs="res_decoding/captions/${model}/nsd/images/fmricap.json"
for method in "${methods[@]}"; do
if [ "$method" == "nsd/images" ]; then
echo "Skipping evaluation for GT data: $method"
continue
fi
echo "Evaluating model: $model, data: $method"
preds="res_decoding/captions/${model}/${method}/fmricap.json"
python capture/eval_detailcap.py "${refs}" "${preds}" --is_strict True
done
done
Note
Pass --is_strict False instead of True when performing VINDEX evaluation.
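The inferential score compares the objects, attributes, and relations found in the reference and decoded captions. The sketch below shows only an exact-match core of such a comparison: a set-based F1 per element type and an unweighted mean. It is not the implementation in capture/eval_detailcap.py. CAPTURE also uses synonym and soft matching. This sketch assumes the elements have already been extracted, and its input format is hypothetical.

```python
def f1(reference, prediction):
    """Exact-match F1 between two collections of caption elements."""
    ref, pred = set(reference), set(prediction)
    if not ref and not pred:
        return 1.0  # nothing to find and nothing predicted
    tp = len(ref & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def caption_element_scores(ref, pred):
    """ref/pred: dicts with 'objects', 'attributes', 'relations' lists (hypothetical format)."""
    scores = {k: f1(ref.get(k, []), pred.get(k, [])) for k in ("objects", "attributes", "relations")}
    scores["mean"] = sum(scores.values()) / 3  # unweighted mean: a simplification
    return scores


ref = {"objects": ["dog", "frisbee", "grass"],
       "attributes": [("dog", "brown")],
       "relations": [("dog", "catches", "frisbee")]}
pred = {"objects": ["dog", "grass", "ball"],
        "attributes": [("dog", "black")],
        "relations": []}
print(caption_element_scores(ref, pred))
```

In this example, two of three objects match, so the object F1 is 2/3. The attribute and relation F1 are both 0. Exact matching penalizes near-misses such as "black" vs. "brown" as harshly as unrelated words, which is why lenient matching variants exist.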
BASIC-L: Structural Matching
0. Environment & Code
Download the code and pretrained models from IDEA-Research/Grounded-SAM-2, and follow the official instructions for environment setup.
cd basic/grounded-sam2
git clone https://github.com/IDEA-Research/Grounded-SAM-2
1. (Optional) Generate Hierarchical Class Captions
We provide preprocessed hierarchical categories in basic/grounded-sam2/hierarchical_classes, so this step can be skipped.
# Step 1: Generate hierarchical class captions for images
model="llava-next-vicuna-13b"
for data in "nsd/images"
do
python detailcap/generate_detailed_image_caption.py --model_path "detailcap/llava/model_weights/${model}" \
--data_path "res_decoding/${data}" --save_path "res_decoding/captions/${model}/${data}/hierarchical_classes" \
--prompt detailcap/prompts/hierarchical_class.txt
done
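Grounding DINO, the detector in Grounded SAM 2, takes a text prompt of lowercase class names, each ending with a dot (for example "car. tire."). The sketch below turns hierarchical categories into one such prompt per granularity level. The {level: [class names]} input is a hypothetical schema, not the actual format of class_dict.json.

```python
def to_grounding_prompt(class_names):
    """Join class names into a Grounding DINO-style prompt: lowercase, each ending with a dot."""
    seen = []
    for name in class_names:
        name = name.strip().lower()
        if name and name not in seen:  # drop empties and duplicates, keep order
            seen.append(name)
    return " ".join(f"{name}." for name in seen)


def prompts_per_level(hierarchy):
    """hierarchy: {level_name: [class names]} (hypothetical schema)."""
    return {level: to_grounding_prompt(names) for level, names in hierarchy.items()}


print(prompts_per_level({"coarse": ["Animal", "Plant"], "fine": ["dog", "Dog ", "grass"]}))
# {'coarse': 'animal. plant.', 'fine': 'dog. grass.'}
```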
2. Generate Multigranular Segmentation Masks
# Step 2: Generate multigranular segmentation masks based on the hierarchical classes
prompt_path='basic/grounded-sam2/hierarchical_classes/nsd/class_dict.json'
methods=(
"nsd/images"
"nsd/sdrecon/sub01"
"nsd/braindiffuser/sub01"
"nsd/mindeye/sub01"
"nsd/umbrae/sub01"
"nsd/dream/sub01"
"nsd/mindbridge/sub01"
"nsd/mindeye2/sub01"
"nsd/neurovla/sub01"
"nsd/sepbrain/sub01"
"nsd/unibrain/sub01"
"nsd/brainguard/sub01"
"nsd/neuropictor/sub01"
)
tth=0.3
bth=0.25
for data in "${methods[@]}"; do
echo "Processing $data with box threshold $bth and text threshold $tth"
python batch_process_grounded_sam2_hierclass.py --text-prompt-path "$prompt_path" \
--img-path "res_decoding/${data}" --output-dir "res_decoding/${data}_b${bth}_t${tth}_hierarchical_llava" \
--box-threshold "$bth" --text-threshold "$tth" --img-size 512
done
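The bth and tth values control which detections survive. In the reference predict() helper of the GroundingDINO repository, a predicted box is kept when its highest per-token score exceeds the box threshold. Its label is then built from the prompt tokens whose score exceeds the text threshold. The sketch below mimics that rule with plain lists. Tokens are simplified to whole words, and we assume batch_process_grounded_sam2_hierclass.py applies the thresholds the same way.

```python
def filter_detections(token_scores, tokens, box_threshold=0.25, text_threshold=0.3):
    """Apply box/text thresholds in the style of Grounding DINO (simplified sketch).

    token_scores: one list per predicted box with a sigmoid score per prompt token.
    tokens: prompt tokens, simplified here to whole words.
    Returns (box_index, label, confidence) for every kept box.
    """
    kept = []
    for i, scores in enumerate(token_scores):
        confidence = max(scores)
        if confidence <= box_threshold:
            continue  # box rejected: no token scores above the box threshold
        label = " ".join(tok for tok, s in zip(tokens, scores) if s > text_threshold)
        kept.append((i, label, confidence))
    return kept


scores = [[0.6, 0.1], [0.2, 0.22], [0.28, 0.27]]
print(filter_detections(scores, ["dog", "grass"]))
# [(0, 'dog', 0.6), (2, '', 0.28)]
```

With tth (0.3) above bth (0.25), a box can pass the box threshold while no token passes the text threshold. Box 2 in the example shows this: it is kept but gets an empty label.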
3. Evaluate Segmentation Masks
# Step 3: Evaluate the segmentation masks (uses bth and tth from Step 2)
methods=("sdrecon" "braindiffuser" "mindeye" "umbrae" "dream" "mindbridge" "mindeye2" "neurovla" "sepbrain" "unibrain" "brainguard" "neuropictor")
for method in "${methods[@]}"; do
echo "Calculating matching ratio for method: $method (bth=$bth, tth=$tth)"
python cal_hierarchical_matching_ratio_ap.py --dataset nsd --method "$method" --thbox "$bth" --thtxt "$tth"
done
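To give a rough intuition for structural matching, the sketch below compares per-class masks from the GT image and the decoded image at each granularity level. A GT class counts as matched when the decoded image has a mask for the same class with IoU at or above a threshold, and the per-level ratios are averaged. This is a hypothetical illustration. The mask format, the 0.5 IoU threshold, and the averaging are assumptions, and the definition in cal_hierarchical_matching_ratio_ap.py may differ.

```python
def mask_iou(a, b):
    """IoU of two equally sized binary masks given as 2D lists of 0/1 or bool."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            inter += bool(x) and bool(y)
            union += bool(x) or bool(y)
    return inter / union if union else 0.0


def matching_ratio(gt_masks, pred_masks, iou_threshold=0.5):
    """Fraction of GT classes matched in the decoded image ({class_name: mask} dicts)."""
    if not gt_masks:
        return 0.0  # no GT classes at this level: treated as 0 (a choice)
    hits = sum(
        1 for cls, gt in gt_masks.items()
        if cls in pred_masks and mask_iou(gt, pred_masks[cls]) >= iou_threshold
    )
    return hits / len(gt_masks)


def hierarchical_matching_ratio(gt_levels, pred_levels, iou_threshold=0.5):
    """Average the matching ratio over the granularity levels present in the GT."""
    ratios = [matching_ratio(gt_levels[lvl], pred_levels.get(lvl, {}), iou_threshold)
              for lvl in gt_levels]
    return sum(ratios) / len(ratios)


gt = {"coarse": {"animal": [[1, 1], [0, 0]]},
      "fine": {"dog": [[1, 1], [0, 0]], "grass": [[0, 0], [1, 1]]}}
pred = {"coarse": {"animal": [[1, 0], [0, 0]]},
        "fine": {"dog": [[1, 1], [0, 0]]}}
print(hierarchical_matching_ratio(gt, pred))  # 0.75
```

In this toy case, the coarse level matches fully (IoU 0.5). The fine level matches only "dog", so the average is 0.75.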
Acknowledgements
The codebase is built upon the following excellent projects: LLaVA, CAPTURE and Grounded SAM.
We also highlight a series of our works on multimodal brain decoding and benchmarking:
- DREAM: mirrors pathways in the human visual system for stimulus reconstruction.
- UMBRAE: interprets brain activations into multimodal explanations with task-specific MLLM prompting.
- VINDEX: explores the impact of different image feature spaces on fine-grained multimodal decoding.
- MEVOX: introduces Multi-Expert Vision systems for Omni-contextual eXplanations.
- BASIC: provides interpretable and multigranular benchmarking for brain visual decoding.
Citation
@inproceedings{xia2026basic,
title = {Multigranular Evaluation for Brain Visual Decoding},
author = {Xia, Weihao and {\"O}ztireli, Cengiz},
booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
year = {2026},
}