G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

February 20, 2025

This repository contains the code and data for the paper titled "G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model".

Paper, Dataset, Models (G-LLaVA-7B, G-LLaVA-13B)


Install Packages

cd G-LLaVA
conda create -n gllava python=3.10 -y
conda activate gllava
pip install -e .

Enable Deepspeed

pip install deepspeed

Data Preparation

Download our dataset.

Place the data under playground/data. Here is the data structure:

playground/data/
├── images/
│   ├── geo3k/
│   ├── geoqa_plus/
│   ├── test/
├── alignment.json
├── qa_tuning.json
├── test_questions.jsonl
├── test_answers.jsonl

"test_questions.jsonl" and "test_answers.jsonl" contain the questions and ground-truth answers of the GeoQA test set.
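Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing_entries` is hypothetical, and the question file is assumed to be named `test_questions.jsonl`, as in the evaluation commands.

```python
from pathlib import Path

# Entries taken from the directory tree above; a trailing slash marks a directory.
# "test_questions.jsonl" follows the name used by the evaluation commands (assumption).
EXPECTED_ENTRIES = [
    "images/geo3k/",
    "images/geoqa_plus/",
    "images/test/",
    "alignment.json",
    "qa_tuning.json",
    "test_questions.jsonl",
    "test_answers.jsonl",
]


def find_missing_entries(data_root="playground/data"):
    """Return the expected entries that are absent under data_root."""
    root = Path(data_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry
        # Directories and files are checked with the matching predicate.
        exists = path.is_dir() if entry.endswith("/") else path.is_file()
        if not exists:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    missing = find_missing_entries()
    if missing:
        print("Missing under playground/data:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root so that the relative path `playground/data` resolves correctly.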

First Stage Alignment

This stage enables the model to better interpret the content of geometric figures.

bash scripts/run_alignment.sh

Second Stage Instruction Tuning

This stage strengthens the model's ability to solve geometry problems.

bash scripts/run_qa.sh

Evaluation

Generate responses from the model.

bash scripts/eval_multi.sh \
                path-to-model \
                playground/data/test_questions.jsonl \
                path-to-output \
                path-to-image-folder \
                num_gpus \
                temperature
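The script takes six positional arguments, so their order matters. As a rough sketch (the wrapper `build_eval_command` is hypothetical and not shipped with the repository), the call can be assembled from Python with named parameters to avoid mixing them up:

```python
import subprocess


def build_eval_command(model_path, question_file, output_prefix,
                       image_folder, num_gpus, temperature):
    """Assemble the positional arguments in the order the README lists them."""
    return [
        "bash", "scripts/eval_multi.sh",
        str(model_path),
        str(question_file),
        str(output_prefix),
        str(image_folder),
        str(num_gpus),
        str(temperature),
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with real paths before running.
    cmd = build_eval_command(
        "path-to-model",
        "playground/data/test_questions.jsonl",
        "path-to-output",
        "path-to-image-folder",
        8,
        0,
    )
    print(" ".join(cmd))
    # Uncomment to launch the evaluation from the repository root:
    # subprocess.run(cmd, check=True)
```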

Run automatic evaluation to calculate the accuracy (GeoQA).

python scripts/geo_acc_calculate.py \
             --ground_truth_file playground/data/test_answers.jsonl \
             --predictions_file path-to-output-file
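To illustrate the idea behind this step, here is a simplified sketch of an accuracy computation over two JSONL files. It is not the logic of `scripts/geo_acc_calculate.py`: the record keys (`question_id`, `answer`) are hypothetical defaults, and the comparison is a plain exact match, whereas the real script may first extract the chosen option from the generated text.

```python
import json


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def accuracy(ground_truth, predictions,
             id_key="question_id", truth_key="answer", pred_key="answer"):
    """Fraction of ground-truth records whose prediction matches exactly.

    The key names are hypothetical defaults, not the repository's schema.
    A question without a prediction counts as incorrect.
    """
    if not ground_truth:
        return 0.0
    predicted = {rec[id_key]: str(rec[pred_key]).strip() for rec in predictions}
    correct = sum(
        1 for rec in ground_truth
        if predicted.get(rec[id_key]) == str(rec[truth_key]).strip()
    )
    return correct / len(ground_truth)
```

With files in that assumed shape, `accuracy(read_jsonl(truth_path), read_jsonl(pred_path))` returns a value between 0 and 1.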

Here are some example commands:

bash scripts/eval_multi.sh /path/to/checkpoint/ playground/data/test_questions.jsonl results_try/Gllava-test playground/data/images/ 8 0

python scripts/geo_acc_calculate.py --ground_truth_file playground/data/test_answers.jsonl --predictions_file results_try/Gllava-test_merged.jsonl

Acknowledgement

The project is built on top of the amazing LLaVA repository. Thanks for their great work!

If you find our code and dataset helpful to your research, please consider citing us with this BibTeX:

@misc{gao2023gllava,
      title={G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model}, 
      author={Jiahui Gao and Renjie Pi and Jipeng Zhang and Jiacheng Ye and Wanjun Zhong and Yufei Wang and Lanqing Hong and Jianhua Han and Hang Xu and Zhenguo Li and Lingpeng Kong},
      year={2023},
      eprint={2312.11370},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}