README.md

November 4, 2024

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Please give us a star ⭐ for the latest update.

https://github.com/user-attachments/assets/1c3c578b-c689-4638-9dfe-e0b67ec2650d

News

  • 2024.10.18 🎉🎉🎉 We open-source the GeoMM dataset.
  • 2024.10.19 🎉🎉🎉 We open-source the model weights for R-CoT-8B, R-CoT-7B, and R-CoT-2B, as well as the evaluation code.
  • 2024.10.21 🎉🎉🎉 We open-source the training code.
  • 2024.10.23 🎉🎉🎉 We release the R-CoT paper.

Dataset

You can download the training and testing data used by R-CoT from R-CoT_Data.

Examples of GeoMM:


🐳 Model Zoo

| Model Name | Vision Part | Language Model | Transformers (HF) | MathVista (Geo) | GeoQA |
|---|---|---|---|---|---|
| R-CoT-8B | InternViT-300M-448px | internlm2_5-7b-chat | 🤗 R-CoT-8B | 75.0 | 75.1 |
| R-CoT-7B | EVA-CLIP | InternLM-Chat-7B | 🤗 R-CoT-7B | 62.5 | 68.2 |
| R-CoT-2B | InternViT-300M-448px | internlm2-chat-1_8b | 🤗 R-CoT-2B | 57.7 | 62.6 |
| R-CoT-Qwen | Vit-BigG | Qwen-7B | 🤗 R-CoT-Qwen | 50.5 | 57.0 |

Environment

GPU

conda create -n rcot python=3.9 -y
conda activate rcot
pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation

NPU

pip install --upgrade deepspeed
pip install torchvision==0.16.0
pip install torch==2.1.0
pip install transformers==4.32.0
pip install torch_npu==2.1.0

Modify code to adapt to NPU

Add the following imports to the training script (e.g. finetune.py):

import torch_npu
from torch_npu.contrib import transfer_to_npu

Replace --bf16 with --fp16 in the sh scripts and weight config files.
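The flag replacement can also be scripted. The snippet below is a minimal sketch, assuming the precision flag appears in the shell scripts spelled as `--bf16`; the helper name `swap_precision_flag` and the example command line are hypothetical and not part of this repository.

```python
import re


def swap_precision_flag(script_text: str) -> str:
    """Return script_text with every standalone --bf16 flag replaced by --fp16.

    Assumption: the scripts spell the flag as --bf16. Adjust the pattern
    if your scripts use a different spelling.
    """
    # The lookarounds keep the match from touching longer flags
    # such as --bf16_full_eval.
    return re.sub(r"(?<![\w-])--bf16(?![\w-])", "--fp16", script_text)


if __name__ == "__main__":
    # Illustrative command line only; not copied from the repository scripts.
    example = "python finetune.py --bf16 True"
    print(swap_precision_flag(example))
```

To apply it to a file, read the script text, pass it through `swap_precision_flag`, and write the result back. Review the diff before training, since weight config files may express precision in a different form than a command-line flag.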

Evaluation

MathVista (geometry problem solving)

Download the test images MathVista_test.zip, unzip the archive, rename the extracted folder to "images", and place it under MathVista_eval/data.
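The unzip-and-rename step can be done by hand or with a short script. The following is a sketch, assuming the archive has already been downloaded to the working directory; the helper name `prepare_images` and the scratch folder `_extract_tmp` are hypothetical, and the internal layout of the archive is not assumed.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_images(zip_path: str, data_dir: str, target_name: str = "images") -> Path:
    """Extract zip_path and leave its contents at data_dir/target_name."""
    data_root = Path(data_dir)
    target = data_root / target_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    # Extract into a scratch folder so the archive's internal layout does not matter.
    scratch = data_root / "_extract_tmp"
    scratch.mkdir(parents=True, exist_ok=False)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(scratch)

    # If the archive holds a single top-level folder, rename that folder;
    # otherwise rename the scratch folder itself.
    entries = list(scratch.iterdir())
    if len(entries) == 1 and entries[0].is_dir():
        shutil.move(str(entries[0]), str(target))
        scratch.rmdir()
    else:
        scratch.rename(target)
    return target
```

For the layout described above, the call would be `prepare_images("MathVista_test.zip", "MathVista_eval/data")`. The same helper works for other test sets by passing a different `target_name`.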

We provide response generation scripts for the different models; their file names start with "generate_response_geo". The example below uses R-CoT-7B:

cd MathVista_eval/evaluation
python generate_response_geo_rcot7b.py --output_dir ../results --output_file output_bard.json --checkpoint weight_path

Extract the short answer text for score calculation:

python extract_answer.py --output_dir ../results --output_file output_bard.json 

Calculate the final score:

python calculate_score.py --output_dir ../results --output_file output_bard.json --score_file scores.json
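The three MathVista steps can be chained so that a failure in one step stops the run. This is a sketch only: the script names and flags are the ones shown in the commands above (with `--output_dir` spelled with two dashes throughout), while the helper names `build_mathvista_commands` and `run_mathvista_eval` are hypothetical.

```python
import subprocess
import sys


def build_mathvista_commands(checkpoint: str,
                             output_dir: str = "../results",
                             output_file: str = "output_bard.json",
                             score_file: str = "scores.json") -> list:
    """Return the generate, extract, and score commands as argument lists."""
    common = ["--output_dir", output_dir, "--output_file", output_file]
    return [
        [sys.executable, "generate_response_geo_rcot7b.py", *common,
         "--checkpoint", checkpoint],
        [sys.executable, "extract_answer.py", *common],
        [sys.executable, "calculate_score.py", *common,
         "--score_file", score_file],
    ]


def run_mathvista_eval(checkpoint: str,
                       workdir: str = "MathVista_eval/evaluation") -> None:
    """Run the three steps in order from inside workdir."""
    for command in build_mathvista_commands(checkpoint):
        # check=True raises CalledProcessError as soon as one step fails.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_mathvista_eval("weight_path")` from the repository root is equivalent to running the three commands by hand. To evaluate a different model, swap in the matching "generate_response_geo" script name.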

GeoQA

Download the test images GeoQA_test.zip, unzip the archive, rename the extracted folder to "test", and place it at GeoQA_test/images/test. Then generate responses from the model:

cd GeoQA_test
python model_vqa.py --checkpoint weight_path

Run automatic evaluation to calculate the accuracy:

python geo_acc_calculate.py --predictions_file path-to-output-file

Train

The JSON file used for R-CoT training can be downloaded at Link. Update the image paths in the JSON file to point to your local image directory, and place the images in that directory.
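Editing the image paths by hand is tedious for a large file, so a small script may help. The sketch below rests on an assumption about the file format that this README does not confirm: that the JSON is a list of records, each storing its image path as a string under an "image" key. The helper names `rebase_image_paths` and `rewrite_json` are hypothetical. Only the file name of each original path is kept, so adapt the code if your images live in subfolders.

```python
import json
from pathlib import Path, PurePosixPath


def rebase_image_paths(records: list, new_root: str, image_key: str = "image") -> list:
    """Return a copy of records whose image paths point into new_root.

    Assumption: each record is a dict and, when present, record[image_key]
    is a string path. Records without that key are copied unchanged.
    """
    rebased = []
    for record in records:
        updated = dict(record)
        if isinstance(updated.get(image_key), str):
            # Keep only the file name and attach it to the new root.
            filename = PurePosixPath(updated[image_key]).name
            updated[image_key] = str(PurePosixPath(new_root) / filename)
        rebased.append(updated)
    return rebased


def rewrite_json(json_path: str, new_root: str) -> None:
    """Rewrite json_path in place with rebased image paths."""
    path = Path(json_path)
    records = json.loads(path.read_text(encoding="utf-8"))
    rebased = rebase_image_paths(records, new_root)
    path.write_text(json.dumps(rebased, ensure_ascii=False, indent=2),
                    encoding="utf-8")
```

Inspect one record of the downloaded file first to confirm the key name and path layout, and keep a backup copy, since `rewrite_json` overwrites the file.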

For R-CoT-8B: You need to place the downloaded 'rcot8b_rcot2b_training_json' under the path set in 'shell/data/rcot_finetune.json'

cd R-CoT8B-main
sh shell/R-CoT-8B/rcot8b_finetune_full.sh

For R-CoT-7B: You need to place the downloaded 'GeoMM.json' and 'geo170k.json' under the path set in 'data.txt'

cd R-CoT7B-main
sh finetune.sh

For R-CoT-2B: You need to place the downloaded 'rcot8b_rcot2b_training_json' under the path set in 'shell/data/rcot_finetune.json'

cd R-CoT2B-main
sh shell/R-CoT-2B/rcot2b_finetune_full.sh

Citing R-CoT

If you wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{deng2024r,
  title={R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models},
  author={Deng, Linger and Liu, Yuliang and Li, Bohan and Luo, Dongliang and Wu, Liang and Zhang, Chengquan and Lyu, Pengyuan and Zhang, Ziyang and Zhang, Gang and Ding, Errui and others},
  journal={arXiv preprint arXiv:2410.17885},
  year={2024}
}

Acknowledgement

R-CoT focuses on generating high-quality mathematical reasoning data to improve the reasoning performance of models. R-CoT is based on Qwen-VL, InternVL2, and InternLM-XC2. Thanks to Qwen-VL, InternVL, InternLM-XC2, and LLaVA.

The R-CoT project is intended for non-commercial use only. For commercial inquiries or to explore more advanced versions of the R-CoT series LMMs, please contact us at ylliu@hust.edu.cn.