VisualMRC

March 31, 2025

VisualMRC is a visual machine reading comprehension dataset. It defines the following task: given a question and a document image, a model must produce an abstractive answer.

Figure 1 from paper

You can find more details, analyses, and baseline results in our paper. You can cite it as follows:

@inproceedings{VisualMRC2021,
  author    = {Ryota Tanaka and
               Kyosuke Nishida and
               Sen Yoshida},
  title     = {VisualMRC: Machine Reading Comprehension on Document Images},
  booktitle = {AAAI},
  year      = {2021}
}

📢 News

  • [2025.03.27] Our VisualMRC dataset is available on 🤗 HuggingFace.

Download

Statistics

  • 10,197 images
  • 30,562 QA pairs
  • 10.53 average question tokens (tokenized with the NLTK tokenizer)
  • 9.53 average answer tokens (tokenized with the NLTK tokenizer)
  • 151.46 average OCR tokens (tokenized with the NLTK tokenizer)
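
The average token counts above could be recomputed along the following lines. This is a minimal sketch, not the authors' script: `simple_tokenize` and `average_token_count` are hypothetical helpers, the example questions are made up, and the regex tokenizer is only a stand-in so the snippet runs without extra dependencies. The reported figures were computed with an NLTK tokenizer, so numbers from the stand-in will differ.

```python
import re
from typing import Callable, Iterable, List


def simple_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): word runs or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def average_token_count(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> float:
    """Return the mean number of tokens per string in `texts`."""
    counts = [len(tokenize(text)) for text in texts]
    if not counts:
        raise ValueError("texts must contain at least one string")
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Made-up questions for illustration; not taken from the dataset.
    questions = [
        "What is the title of the page?",
        "Who wrote this article?",
    ]
    print(f"{average_token_count(questions):.2f}")
```

To get closer to the reported statistics, pass an NLTK tokenizer as the `tokenize` argument in place of the stand-in.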