G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
February 20, 2025

This repository contains the code and data for the paper titled "G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model".
Paper, Dataset, Models (G-LLaVA-7B, G-LLaVA-13B)

Install Packages
cd G-LLaVA
conda create -n gllava python=3.10 -y
conda activate gllava
pip install -e .
Enable DeepSpeed
pip install deepspeed
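Before launching training, you may want to confirm that DeepSpeed can be imported from the active gllava environment. Here is a minimal sketch that uses only the standard library. `is_installed` is a hypothetical helper and is not part of this repository.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    # find_spec locates the module without importing it, so a heavy package
    # such as deepspeed is not initialised by this check.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    status = "found" if is_installed("deepspeed") else "missing (run: pip install deepspeed)"
    print(f"deepspeed: {status}")
```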
Data Preparation
Download our dataset.
Place the data under playground/data. Here is the data structure:
playground/data/
├── images/
│ ├── geo3k/
│ ├── geoqa_plus/
│ ├── test/
├── alignment.json
├── qa_tuning.json
├── test_question.jsonl
├── test_answers.jsonl
"test_question.jsonl" and "test_answers.jsonl" contain the questions and the ground-truth answers of the GeoQA test set, respectively.
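A quick way to check that the data is in place is to compare playground/data against the tree above. The sketch below only tests whether the listed paths exist. It does not validate file contents. The entry names are copied from the tree, so adjust the lists if your copy of the dataset uses different file names. `missing_entries` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Entries copied from the directory tree in this README; paths are relative to the data root.
EXPECTED_DIRS = ["images/geo3k", "images/geoqa_plus", "images/test"]
EXPECTED_FILES = [
    "alignment.json",
    "qa_tuning.json",
    "test_question.jsonl",
    "test_answers.jsonl",
]


def missing_entries(root: str = "playground/data") -> List[str]:
    """Return the expected paths that are absent under `root` (an empty list means OK)."""
    base = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries()
    print("data layout OK" if not problems else f"missing: {problems}")
```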
First Stage Alignment
This stage enables the model to better interpret the content of geometric figures.
bash scripts/run_alignment.sh
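You can catch format problems before a long training run by checking the records in alignment.json. The sketch below assumes that G-LLaVA keeps LLaVA's conversation format: a JSON list of objects, each with an "image" field and a "conversations" list of alternating "human"/"gpt" turns, where the first turn carries an <image> placeholder. This README does not specify the format, so treat it as an assumption. The helpers `validate_record` and `validate_file` are hypothetical. If qa_tuning.json follows the same assumed format, the same check applies to it.

```python
import json
from typing import Any, Dict, List

# ASSUMPTION: LLaVA-style records; adjust the keys if the G-LLaVA files differ.


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    if "image" not in record:
        problems.append("no 'image' field")
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return problems + ["no 'conversations' list"]
    # Assumed convention: turns alternate human/gpt, starting with the human turn.
    for i, turn in enumerate(turns):
        expected = "human" if i % 2 == 0 else "gpt"
        if turn.get("from") != expected:
            problems.append(f"turn {i}: expected from={expected!r}, got {turn.get('from')!r}")
    if "<image>" not in turns[0].get("value", ""):
        problems.append("first turn lacks the <image> placeholder")
    return problems


def validate_file(path: str) -> Dict[int, List[str]]:
    """Validate every record in a JSON list file; map record index to its problems."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    report = {}
    for idx, rec in enumerate(records):
        issues = validate_record(rec)
        if issues:
            report[idx] = issues
    return report


if __name__ == "__main__":
    bad = validate_file("playground/data/alignment.json")
    print(f"{len(bad)} records with problems")
```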
Second Stage Instruction Tuning
This stage equips the model with a stronger ability to solve geometry problems.
bash scripts/run_qa.sh
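The stage names suggest that instruction tuning follows alignment. If you want to launch both stages in that order and stop as soon as one of them fails, a small driver like the sketch below would do. This README does not say whether run_qa.sh picks up the first-stage checkpoint automatically; that depends on the paths configured inside the scripts, so check them before running. `stage_commands` and `run_stages` are hypothetical helpers.

```python
import subprocess
from typing import List, Sequence

# ASSUMPTION: both scripts take no arguments, as in the commands in this README.
STAGE_SCRIPTS: Sequence[str] = ("scripts/run_alignment.sh", "scripts/run_qa.sh")


def stage_commands(scripts: Sequence[str] = STAGE_SCRIPTS) -> List[List[str]]:
    """Build one `bash <script>` command per training stage, in order."""
    return [["bash", script] for script in scripts]


def run_stages(dry_run: bool = False) -> List[List[str]]:
    """Run the stages sequentially; check=True raises if a stage exits non-zero."""
    commands = stage_commands()
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_stages()
```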
Evaluation
Generate responses from the model:
bash scripts/eval_multi.sh \
path-to-model \
playground/data/test_questions.jsonl \
path-to-output \
path-to-image-folder \
num_gpus \
temperature
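eval_multi.sh takes six positional arguments, so it is easy to swap two of them by accident. One safeguard is to build the call from keyword arguments in Python. `eval_command` is a hypothetical helper, and its argument order simply mirrors the command above.

```python
import shlex
import subprocess
from typing import List


def eval_command(model_path: str, question_file: str, output_path: str,
                 image_folder: str, num_gpus: int, temperature: float) -> List[str]:
    """Build the eval_multi.sh call; positional order follows the README command."""
    return [
        "bash", "scripts/eval_multi.sh",
        model_path, question_file, output_path, image_folder,
        str(num_gpus), str(temperature),
    ]


if __name__ == "__main__":
    cmd = eval_command(
        model_path="/path/to/checkpoint/",
        question_file="playground/data/test_questions.jsonl",
        output_path="results_try/Gllava-test",
        image_folder="playground/data/images/",
        num_gpus=8,
        temperature=0,
    )
    print(shlex.join(cmd))  # shell-quoted form, handy for logs
    subprocess.run(cmd, check=True)
```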
Run the automatic evaluation to compute accuracy on GeoQA:
python scripts/geo_acc_calculate.py \
--ground_truth_file playground/data/test_answers.jsonl \
--predictions_file path-to-output-file
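scripts/geo_acc_calculate.py performs the actual scoring. The sketch below only shows the general kind of computation: it pairs predictions with ground-truth answers by question id, pulls an option letter out of each free-form response, and reports the fraction that match. The field names (`question_id`, `answer`, `text`) and the rule of taking the last standalone letter A-D are hypothetical. They are not read from the repository's script, so check them against the real JSONL files.

```python
import json
import re
from typing import Dict, Optional

# HYPOTHETICAL field names and extraction rule; not taken from geo_acc_calculate.py.
ID_KEY, GT_KEY, PRED_KEY = "question_id", "answer", "text"


def load_jsonl(path: str) -> Dict[str, dict]:
    """Read a JSONL file and index its rows by question id."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return {str(row[ID_KEY]): row for row in rows}


def extract_choice(response: str) -> Optional[str]:
    """Heuristic: return the last standalone option letter A-D in a response."""
    letters = re.findall(r"\b([A-D])\b", response)
    return letters[-1] if letters else None


def accuracy(ground_truth_file: str, predictions_file: str) -> float:
    """Fraction of ground-truth questions whose extracted choice matches the answer."""
    gt = load_jsonl(ground_truth_file)
    preds = load_jsonl(predictions_file)
    # Questions without a prediction count as wrong.
    correct = sum(
        1 for qid, row in gt.items()
        if qid in preds and extract_choice(preds[qid][PRED_KEY]) == str(row[GT_KEY]).strip()
    )
    return correct / len(gt) if gt else 0.0
```

Taking the last letter tolerates responses that reason first and state the option at the end, but an article such as "A" near the end of a sentence can still fool it.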
For example, to generate responses on 8 GPUs at temperature 0 and then score them:
bash scripts/eval_multi.sh /path/to/checkpoint/ playground/data/test_questions.jsonl results_try/Gllava-test playground/data/images/ 8 0
python scripts/geo_acc_calculate.py --ground_truth_file playground/data/test_answers.jsonl --predictions_file results_try/Gllava-test_merged.jsonl
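In this example, the output prefix results_try/Gllava-test pairs with the predictions file results_try/Gllava-test_merged.jsonl. That pairing suggests eval_multi.sh writes one output per GPU and then merges them. You may need to rebuild the merged file yourself, for instance after re-running a single shard. This README does not document the per-shard file names, so the sketch below takes an explicit list of shard paths. `merged_path` and `merge_jsonl` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Iterable


def merged_path(output_path: str) -> str:
    """Merged-file name following the example above: <output>_merged.jsonl."""
    return f"{output_path}_merged.jsonl"


def merge_jsonl(shard_paths: Iterable[str], merged_file: str) -> int:
    """Concatenate JSONL shards in the given order; return the number of rows written."""
    count = 0
    with open(merged_file, "w", encoding="utf-8") as out:
        for shard in shard_paths:
            for line in Path(shard).read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue  # skip blank lines
                json.loads(line)  # raises on a truncated or corrupt row
                out.write(line + "\n")
                count += 1
    return count
```

Call it as `merge_jsonl(shards, merged_path("results_try/Gllava-test"))`, where `shards` lists the per-GPU files. Make sure that list does not include the merged file itself.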
Acknowledgement
The project is built on top of the amazing LLaVA repository. Thanks for their great work!
If you find our code and dataset helpful to your research, please consider citing us with this BibTeX:
@misc{gao2023gllava,
title={G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model},
author={Jiahui Gao and Renjie Pi and Jipeng Zhang and Jiacheng Ye and Wanjun Zhong and Yufei Wang and Lanqing Hong and Jianhua Han and Hang Xu and Zhenguo Li and Lingpeng Kong},
year={2023},
eprint={2312.11370},
archivePrefix={arXiv},
primaryClass={cs.CL}
}