MAGB[](https://doi.org/10.5281/zenodo.15551657)

May 30, 2025 ยท View on GitHub

MAGB: A Comprehensive Benchmark for Multimodal Attributed Graphs

In many real-world scenarios, graph nodes are associated with multimodal attributes, such as texts and images, resulting in Multimodal Attributed Graphs (MAGs).

MAGB first provide 5 dataset from E-Commerce and Social Networks. And we evaluate two major paradigms: GNN-as Predictor and VLM-as-Predictor . The datasets are publicly available:

๐Ÿค— Hugging Faceย ย  | ย ย ๐Ÿ“‘ Paperย ย 

๐Ÿ“– Table of Contents


๐Ÿ“– Introduction

Multimodal attributed graphs (MAGs) incorporate multiple data types (e.g., text, images, numerical features) into graph structures, enabling more powerful learning and inference capabilities.
This benchmark provides:
โœ… Standardized datasets with multimodal attributes.
โœ… Feature extraction pipelines for different modalities.
โœ… Evaluation metrics to compare different models.
โœ… Baselines and benchmarks to accelerate research.


๐Ÿ’ป Installation

Ensure you have the required dependencies installed before running the benchmark.

# Clone the repository
git clone https://github.com/sktsherlock/MAGB.git
cd MAGB

# Install dependencies
pip install -r requirements.txt

๐Ÿš€ Usage

1. Download the datasets from MAGB. ๐Ÿ‘

cd Data/
sudo apt-get update && sudo apt-get install git-lfs && git clone https://huggingface.co/datasets/Sherirto/MAGB .
ls

Now, you can see the Movies, Toys, Grocery, Reddit-S and Reddit-M under the ''Data'' folder.

Each dataset consists of several parts shown in the image below, including:

  • Graph Data (*.pt): Stores the graph structure, including adjacency information and node labels. It can be loaded using DGL.
  • Node Textual Metadata (*.csv): Contains node textual descriptions, neighborhood relationships, and category labels.
  • Text, Image, and Multimodal Features (TextFeature/, ImageFeature/, MMFeature/): Pre-extracted embeddings from the MAGB paper for different modalities.
  • Raw Images (*.tar.gz): A compressed folder containing images named by node IDs. It needs to be extracted before use.

Because of the Reddit-M dataset is too large, you may need to follow the below scripts to unzip the dataset.

cd MAGB/Data/
cat RedditMImages_parta RedditMImages_partb RedditMImages_partc > RedditMImages.tar.gz
tar -xvzf RedditMImages.tar.gz

2. Experiments

In this section, we demonstrate the execution code for both GNN-as-Predictor and VLM-as-Predictor.

GNN-as-Predictor

๐Ÿงฉ Node Classification

In the GNN/Library directory, we provide the code for models evaluated in the paper, including GCN, GraphSAGE, GAT, RevGAT,and MLP. Additionally, we have added graph learning models such as APPNP, SGC, Node2Vec, and DeepWalk for your use. Below, we show the code for node classification using GCN on the Movies dataset in two scenarios: 3-shot learning and supervised learning.

python GNN/Library/GCN.py --graph_path Data/Movies/MoviesGraph.pt --feature Data/Movies/TextFeature/ Movies_roberta_base_512_mean.npy --fewshots 3
python GNN/Library/GCN.py --graph_path Data/Movies/MoviesGraph.pt --feature Data/Movies/TextFeature/ Movies_roberta_base_512_mean.npy --train_ratio 0.6 --val_ratio 0.2

Note: The file Movies_roberta_base_512_mean.npy contains the textual features of the Movies dataset extracted using the RoBERTa-Base model. 512 indicates the maximum text length used, and mean indicates that mean pooling was applied to extract the features. You can use the features we provide or extract your own.

Similarly, you can replace GCN.py with the corresponding code for other models, such as GraphSAGE.py, GAT.py, etc. For all node classification training code, it is necessary to pass the graph data path and the corresponding feature file. Other basic parameters can be found in the GNN/Utils/model_config.py file.

Below are the key parameters related to model training, along with their default values and descriptions:

ParameterTypeDefault ValueDescription
--n-runsint3Number of runs for averaging results.
--lrfloat0.005Learning rate for model optimization.
--n-epochsint1000Total number of training epochs.
--n-layersint3Number of layers in the model.
--n-hiddenint256Number of hidden units per layer.
--dropoutfloat0.5Dropout rate to prevent overfitting.
--label-smoothingfloat0.1Smoothing factor for label smoothing to reduce overfitting.
--train_ratiofloat0.6Proportion of the dataset used for training.
--val_ratiofloat0.2Proportion of the dataset used for validation.
--fewshotsintNoneNumber of samples for few-shot learning.
--metricstr'accuracy'Evaluation metric (e.g., accuracy, precision, recall, f1).
--averagestr'macro'Averaging method (e.g., weighted, micro, macro).
--graph_pathstrNonePath to the graph dataset file (e.g., .pt file).
--featurestrNoneSpecifies the unimodal feature embedding to use as input.
--undirectedboolTrueWhether to treat the graph as undirected.
--selfloopboolTrueWhether to add self-loops to the graph.

Note: Some models may have their own unique parameters, such as 'edge-drop' for RevGAT and GAT. For these parameters, please refer to the respective code for details.

In the GNN/LinkPrediction directory, we provide the code for link prediction experiments using three backbone models: GCN, GraphSAGE, and MLP. Below, we demonstrate the code for running link prediction using GCN on the Movies dataset. The parameters for GraphSAGE and MLP are similar, and you can replace GCN.py with SAGE.py or MLP.py to run experiments with those models.

python GNN/LinkPrediction/GCN.py \
  --n-hidden 256 \
  --n-layers 3 \
  --n-runs 5 \
  --lr 0.001 \
  --neg_len 5000 \
  --dropout 0.2 \
  --batch_size 2048 \
  --graph_path Data/Movies/MoviesGraph.pt \
  --feature Data/Movies/TextFeature/Movies_Llama_3.2_1B_Instruct_512_mean.npy \
  --link_path Data/LinkPrediction/Movies/

Below are the unique parameters specifically used for link prediction tasks:

ParameterTypeDefault ValueDescription
--neg_lenint5000Number of negative samples used for training.
--batch_sizeint2048Batch size for training.
--link_pathstrNonePath to the directory containing link prediction data (e.g., positive and negative edges).

These parameters are critical for handling the unique requirements of link prediction tasks, such as generating and managing negative samples, processing large datasets efficiently, and specifying the location of link prediction data.

VLM-as-Predictor

The MLLM/Zero-shot.py script is designed for zero-shot node classification tasks using multimodal large language models (MLLMs). Below are the key command-line arguments for this script:

ParameterTypeDefault ValueDescription
--model_namestr'meta-llama/Llama-3.2-11B-Vision-Instruct'HuggingFace model name or path.
--dataset_namestr'Movies'Name of the dataset (corresponds to a subdirectory in the Data folder).
--base_dirstrProject root directoryPath to the root directory of the project.
--max_new_tokensint15Maximum number of tokens to generate.
--neighbor_modestr'both'Mode for using neighbor information (text, image, or both).
--use_center_textstr'True'Whether to use the center node's text.
--use_center_imagestr'True'Whether to use the center node's image.
--add_CoTstr'False'Whether to add Chain of Thought (CoT) reasoning.
--num_samplesint5Number of test samples to evaluate.
--num_neighboursint0Number of neighbors to consider for each node.

Below, we present the code for performing zero-shot node classification on the Movies dataset using the LLaMA-3.2-11B Vision Instruct model with different strategies. This is provided to help researchers reproduce the experimental results presented in our paper.

  1. Center-only\text{Center-only}
python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_samples 300 --max_new_tokens 30 --dataset_name Moives
  1. GRE-Tk=1\text{GRE-T}_{k=1}
python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode text --num_samples 300 --max_new_tokens 30 --dataset_name Moives
  1. GRE-Vk=1\text{GRE-V}_{k=1}
python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode image --num_samples 300 --max_new_tokens 30 --dataset_name Moives
  1. GRE-Mk=1\text{GRE-M}_{k=1}
python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode both --num_samples 300 --max_new_tokens 30 --dataset_name Moives

Please note that both the VLMs and GNNs used the same original test set for the node classification task. However, for efficiency during VLM testing, we randomly selected 300 samples from this original test set. We observed that the experimental results obtained on this subset did not deviate significantly from those obtained on the complete test set.

๐Ÿ”ง Customizing load_model_and_processor for Unsupported VLMs

The load_model_and_processor function in MLLM/Library.py is designed to load specific models and their corresponding processors from the Hugging Face library. If you want to use a model that is not currently supported, you can modify this function to include your custom model. Below is an example to guide you through the process.

Example: Adding Support for a Custom Model

Suppose you want to add support for a new model, custom-org/custom-model-7B, which uses the AutoModelForCausalLM class and AutoProcessor. Here's how you can modify the load_model_and_processor function:

  1. Open the MLLM/Library.py file.
  2. Locate the model_mapping dictionary inside the load_model_and_processor function.
  3. Add a new entry for your custom model.

Here is the modified code:

def load_model_and_processor(model_name: str):
    """
    Load the model and processor based on the Hugging Face model name.
    """
    model_mapping = {
        "meta-llama/Llama-3.2-11B-Vision-Instruct": {
            "model_cls": MllamaForConditionalGeneration,
            "processor_cls": AutoProcessor,
        },
        "custom-org/custom-model-7B": {  # Add your custom model here
            "model_cls": AutoModelForCausalLM,  # Replace with the correct model class
            "processor_cls": AutoProcessor,    # Replace with the correct processor class
        },
        # Other existing models...
    }

    # Other existing codes...

    return model, processor

๐Ÿค Contributing

We welcome contributions to MAGB. To contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Submit a pull request with a detailed description of your changes.

For major changes, please open an issue first to discuss what you would like to change.

๐Ÿ“š Citation

If you use MAGB in your research, please cite our paper:

@misc{yan2025graphmeetsmultimodalbenchmarking,
      title={When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning},
      author={Hao Yan and Chaozhuo Li and Jun Yin and Zhigang Yu and Weihao Han and Mingzheng Li and Zhengxin Zeng and Hao Sun and Senzhang Wang},
      year={2025},
      eprint={2410.09132},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2410.09132},
}