README.md

December 13, 2025

📊 START: Spatial and Textual Learning for Chart Understanding

Zhuoming Liu¹, Xiaofeng Gao², Feiyang Niu², Qiaozi Gao², Liu Liu³, Robinson Piramuthu²

¹University of Wisconsin-Madison, ²Amazon AGI, ³MIT

Introduction

This repository hosts the open-source models and resources for a reproduced version of START, following the method proposed in the paper "START: Spatial and Textual learning for chART understanding".

Key Contributions

  • Spatial-Textual Learning:
    • (i) Chart-element grounding, to strengthen an MLLM's understanding of chart visual layout
    • (ii) Chart-to-code generation, to enhance understanding of data details
  • START-Dataset: A novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle.
  • CS-Bench (Chart Spatial understanding Benchmark): A new benchmark to evaluate a model's ability to understand chart spatial structures, filling a critical gap in comprehensive chart understanding evaluation.
  • State-of-the-art Performance: Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin.

Contents

1. Install

Please set up the environment below to reproduce the model's performance on CS-Bench.

conda create -n qwenvl_env python=3.10
conda activate qwenvl_env

pip install vllm==0.11.0
pip install transformers==4.57.0
# Install Qwen-VL utility library (recommended for offline inference)
pip install qwen-vl-utils==0.0.14
pip install omegaconf
pip install matplotlib ipdb
pip install packaging
pip install ninja
pip install flash-attn==2.8.2 --no-build-isolation
pip install wandb
pip install anls
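After installation, a quick check can confirm that the pinned versions were actually picked up. The sketch below is not part of the repository; it uses only the standard library's `importlib.metadata`, and the helper name `check_pins` is hypothetical. It assumes the distribution names match the `pip install` names above and ignores any local build suffix (for example `+cu12...`) when comparing.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "vllm": "0.11.0",
    "transformers": "4.57.0",
    "qwen-vl-utils": "0.0.14",
    "flash-attn": "2.8.2",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build suffix such as "+cu12" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.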

2. Reproduced START Dataset

2.1 The Raw Code data

You can download the Python chart code here, and all the evolved Python chart code here.

Running these Python scripts produces the rendered images and, for the evolved code, JSON files with the locations of the chart elements:

# run all the python chart code: generate rendered image
python data_preprocessing/run_simulated_code.py \
    --code_dir /<YOUR_DATA_DIR>/generated_code \
    --image_dir /<YOUR_DATA_DIR>/simulated_images/ \
    --info_file /<YOUR_DATA_DIR>/simulated_images/all_info.jsonl \
    --save_dir /<YOUR_DATA_DIR>/temp_code/

# run all the evolved python chart code: generate rendered image and json files
python data_preprocessing/run_evolved_code.py \
    --code_dir /<YOUR_DATA_DIR>/descriptive_evolved_code/ \
    --image_dir /<YOUR_DATA_DIR>/evolved_simulated_images/ \
    --info_file /<YOUR_DATA_DIR>/evolved_simulated_images/all_info.jsonl \
    --save_dir /<YOUR_DATA_DIR>/temp_evolved_code/
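For readers who want to see what batch-running generated chart scripts can involve, here is a minimal sketch. It is not the repository's `run_simulated_code.py` or `run_evolved_code.py`, whose internals are not described here. The function name `run_all`, the `timeout` default, and the log record fields are all hypothetical, and the log layout is not the format of `all_info.jsonl`. Each script runs in its own process so that one failing or hanging chart does not stop the rest.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_all(code_dir, log_path, timeout=60):
    """Run each *.py file in code_dir in its own process and log the outcome."""
    records = []
    for script in sorted(Path(code_dir).glob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script.resolve())],
                cwd=script.parent,  # relative output paths land next to the script
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            ok, error = proc.returncode == 0, proc.stderr[-500:]
        except subprocess.TimeoutExpired:
            ok, error = False, f"timed out after {timeout}s"
        records.append({"script": script.name, "ok": ok, "error": "" if ok else error})

    # One JSON object per line (JSONL); this layout is illustrative only.
    with open(log_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records
```

Isolating every script in a subprocess costs some startup time per chart, but it keeps matplotlib state, crashes, and infinite loops from leaking between charts.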

2.2 The START Dataset - Annotation and images

You can download the prepared annotations and images here.

Dataset Statistics

| Dataset | # Annotations | # Unique Images |
|---|---|---|
| SFT Annotation | 399,897 | 70,652 |
| RL Annotation | 39,211 | 3,979 |

Question Type Distribution (RL Annotation)

In this open-source version, we also include the point and line training data.

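| Question Type | Count | Percentage |
|---|---|---|
| bbox | 6,694 | 17.07% |
| Number-in-Chart | 6,059 | 15.45% |
| Text-in-Chart | 5,842 | 14.90% |
| Number-in-General | 4,969 | 12.67% |
| Text-in-General | 2,955 | 7.54% |
| dot | 1,764 | 4.50% |
| line | 1,498 | 3.82% |
| chart_to_code | 881 | 2.25% |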

Question Type Distribution (SFT Annotation)

| Question Type | Count | Percentage |
|---|---|---|
| Number-in-General | 145,676 | 36.43% |
| Chart to code (With QA) | 111,032 | 27.77% |
| Text-in-General | 68,917 | 17.23% |
| Text-in-Chart | 27,262 | 6.82% |
| Number-in-Chart | 23,284 | 5.82% |
| Multiple Round Agentic | 19,789 | 4.95% |
| Other | 3,937 | 0.99% |
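The percentages in these tables are each count divided by the number of annotations in the split. The short sketch below recomputes them from the SFT counts; the function name `percentages` and its optional `total` argument are illustrative choices, not part of the repository.

```python
# Counts copied from the SFT question-type table.
SFT_COUNTS = {
    "Number-in-General": 145_676,
    "Chart to code (With QA)": 111_032,
    "Text-in-General": 68_917,
    "Text-in-Chart": 27_262,
    "Number-in-Chart": 23_284,
    "Multiple Round Agentic": 19_789,
    "Other": 3_937,
}


def percentages(counts, total=None):
    """Share of each category in percent, rounded to two decimals.

    If total is omitted, the sum of the given counts is the denominator.
    """
    if total is None:
        total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


sft_total = sum(SFT_COUNTS.values())
sft_shares = percentages(SFT_COUNTS)

print(f"total annotations: {sft_total}")
for name, pct in sft_shares.items():
    print(f"{name}: {pct:.2f}%")
```

The seven SFT counts sum to 399,897, which matches the SFT annotation total. Recomputed this way, the Other row comes out at 0.98% rather than the listed 0.99%, a rounding-level difference. For the RL table, pass `total=39_211` explicitly: the listed rows sum to 30,662, and the published percentages are relative to the full 39,211 annotations.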

In this open-source version, we also include agentic multi-round conversations in the SFT dataset. A sample:

{'messages': [{'content': '<image>### Role\nYou are a meticulous chart-analyzing agent.  \nYou may call **Atomic Operations** to gather extra visual evidence, *or* you may answer directly if confident.\n\n### Chart Image\n(see attached image)\n\n### User Question\nHow many subplots show both lines increasing from All Rp to Rp/Re < 10?\n\n### Available Atomic Operations\n1. **CropRegion** – args: `[x_min, y_min, x_max, y_max]`    \n2. **DrawBBox**    – args: `[x_min, y_min, x_max, y_max]`    \n3. **DrawLine**    – args: `[x1, y1, x2, y2]`\n\n### Instructions\nProduce **one** of the two response styles below:\n\n**(A) Plan-and-Act** – if you need operations first\n````\n```json\n[\n  {"op": "<operation_name>", "args": [...], "purpose": "<why>"},\n  …\n]\n```\n````\n\n**(B) Direct Answer** – if no operations are required  \n````\n<think><step-by-step explanation></think>\nThe final answer is: \\boxed{<answer>}\n````\n', 'role': 'user'}, {'content': '```json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], "purpose": "Highlight the AGN subplot to examine the trend of both lines"}, {"op": "DrawBBox", "args": [62, 37, 468, 368], "purpose": "Highlight the Quiescent subplot for comparison - both lines decrease"}, {"op": "DrawBBox", "args": [554, 37, 961, 368], "purpose": "Highlight the Star Forming subplot for comparison - only control line increases"}, {"op": "DrawBBox", "args": [554, 429, 961, 761], "purpose": "Highlight the Composite subplot for comparison - both lines decrease"}]\n```', 'role': 'assistant'}, {'content': '<image>\nHere are the images base on your operation.\n', 'role': 'user'}, {'content': '<think>By examining each subplot: Quiescent shows both lines decreasing, Star Forming shows only the control line increasing while all pairs decreases, AGN shows both lines clearly increasing from left to right, and Composite shows both lines decreasing. Therefore, only 1 subplot (AGN) shows both lines increasing from All Rp to Rp/Re < 10.</think>\nThe final answer is: \\boxed{1}\n', 'role': 'assistant'}], 'images': ['0905.3386_1_origin.png', '0905.3386_1_6_draw.png']}
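The two assistant turns in this sample follow the response styles described in the prompt: a JSON plan of atomic operations inside a `json` fence, and a final answer wrapped in `\boxed{}`. The sketch below shows one way to read both out of the message text. The helper names `parse_plan` and `parse_answer` are hypothetical and not part of the repository, and the example strings are abridged from the sample above.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable

# Matches the body of a fenced json block in an assistant message.
PLAN_RE = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)
# Matches the content of \boxed{...} (no nested braces).
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def parse_plan(text):
    """Return the list of operations in a Plan-and-Act reply, or [] if none."""
    match = PLAN_RE.search(text)
    return json.loads(match.group(1)) if match else []


def parse_answer(text):
    """Return the string inside \\boxed{...}, or None if there is no final answer."""
    match = BOXED_RE.search(text)
    return match.group(1) if match else None


# Abridged versions of the two assistant turns in the sample.
plan_turn = (
    FENCE
    + 'json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], '
    + '"purpose": "Highlight the AGN subplot"}]\n'
    + FENCE
)
final_turn = "<think>...</think>\nThe final answer is: \\boxed{1}\n"

ops = parse_plan(plan_turn)
answer = parse_answer(final_turn)
print(ops[0]["op"], ops[0]["args"], answer)
```

Each parsed operation carries an `op` name, its `args` as `[x_min, y_min, x_max, y_max]` (or line endpoints for `DrawLine`), and a free-text `purpose`.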

3. Reproduced CS-Bench

Please download CS-Bench here.

CS-Bench Statistics

| Metric | Value |
|---|---|
| Number of Annotations | 692 |
| Number of Unique Images | 600 |

Question Type Distribution

| Question Type | Count | Percentage |
|---|---|---|
| Grounding Question | 350 | 50.58% |
| QA Grounding Question | 342 | 49.42% |

4. Reproduced Checkpoint

The following model is initialized from Qwen2.5-VL-7B and trained on the reproduced START dataset.

| Models | CharXiv (desc) | CharXiv (rea) | ChartQAPro | ChartMimic | CS-Bench (Recall@0.3) | CS-Bench (acc) |
|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B | 73.0 | 43.2 | 41.9 | 47.1 | 23.1 | 50.8 |
| START-RL-7B | 78.5 | 47.6 | 47.2 | 69.9 | 48.6 | 65.5 |
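This README does not define the CS-Bench (Recall@0.3) column. A common reading of such a metric for grounding tasks is the fraction of ground-truth boxes whose predicted box reaches an intersection-over-union (IoU) of at least 0.3. The sketch below illustrates that reading only; it is an assumption, not the benchmark's official scoring, and the CS-Bench evaluation code remains the authority. The function names and the one-prediction-per-ground-truth pairing are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(predictions, ground_truths, threshold=0.3):
    """Fraction of ground-truth boxes matched by their paired prediction.

    Assumes predictions[i] is the model's box for ground_truths[i].
    """
    if not ground_truths:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)


# Toy boxes: an exact match, a half-overlap (IoU = 1/3), and a disjoint pair.
gts = [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 10, 10]]
preds = [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]]
print(recall_at(preds, gts))
```

With a 0.3 threshold the first two toy pairs count as hits and the third does not, so two of three ground-truth boxes are recalled.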

5. Reproduce Checkpoint Results on Benchmarks

This section provides evaluation code for CS-Bench, ChartQAPro, CharXiv, and ChartMimic.

5.1 CS-Bench Evaluation

Please refer to CS-Bench Eval.

5.2 ChartQAPro Evaluation

Please refer to ChartQAPro Eval.

5.3 CharXiv Evaluation

Please refer to CharXiv Eval.

5.4 ChartMimic Evaluation

Please refer to ChartMimic Eval.

6. Training Code

Training code is coming soon.

Contact

Zhuoming Liu (zliu2346@wisc.edu)

📖 Citation

Please kindly cite our paper if you find this project helpful.

@misc{liu2025startspatialtextuallearning,
      title={START: Spatial and Textual Learning for Chart Understanding},
      author={Zhuoming Liu and Xiaofeng Gao and Feiyang Niu and Qiaozi Gao and Liu Liu and Robinson Piramuthu},
      year={2025},
      eprint={2512.07186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.07186},
}

💡 Acknowledgements

This project builds on the following repositories, with many thanks to their authors.

Qwen2.5-VL, vLLM