VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

July 20, 2023

VNHSGE is a dataset for large language models, collected from the Vietnamese National High School Graduation Examination and similar exams.

  • Evaluates large language models on multiple tasks, such as question answering, text generation, reading comprehension, and visual question answering.
  • Covers nine subjects: 300 essays on literature and 19,000 multiple-choice questions on the other subjects (mathematics, physics, chemistry, biology, English, history, geography, and civic education).
  • Contains both text and images.
  • Supports the Vietnamese and English languages.

Figure: VNHSGE dataset and other datasets. The performance of ChatGPT and BingChat on the VNHSGE dataset is compared with results on other datasets from the GPT-4 Report.

Latest News

  • Full dataset will be uploaded soon!
  • [7/20/2023] Full VNHSGE geography dataset was uploaded (200, 200, and 1600 questions for the eval, test, and train sets).
  • [7/19/2023] Full VNHSGE civic education dataset was uploaded (200, 200, and 1600 questions for the eval, test, and train sets).
  • [7/19/2023] Full VNHSGE history dataset was uploaded (200, 200, and 1600 questions for the eval, test, and train sets).
  • [5/31/2023] Eval set was uploaded: 30 essays on literature and 1700 multiple-choice questions on other subjects.

Dataset structure

The VNHSGE dataset covers nine subjects, including 300 essays on Literature and 19,000 multiple-choice questions on other subjects.

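| Subject | Type | Number of questions per exam | Number of exams | Question total |
|---|---|---|---|---|
| Mathematics | Multiple choice | 50 | 50 | 2500 |
| Literature | Essay | 6 | 50 | 300 |
| English | Multiple choice | 50 | 50 | 2500 |
| Physics | Multiple choice | 40 | 50 | 2000 |
| Chemistry | Multiple choice | 40 | 50 | 2000 |
| Biology | Multiple choice | 40 | 50 | 2000 |
| History | Multiple choice | 40 | 50 | 2000 |
| Geography | Multiple choice | 40 | 50 | 2000 |
| Civic Education | Multiple choice | 40 | 50 | 2000 |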

Dataset folder

VNHSGE
├── VNHSGE-V                                 # Vietnamese versions
│    ├── JSON format                         # JSON folder                            
│    │   ├── eval                            # eval set
│    │   │     ├── Mathematics               # VNHSGE mathematics dataset
│    │   │     │   ├── MET_Math_IE_2023.json # JSON file     
│    │   │     │   ├── MET_Math_IE_2023      # Image folder
│    │   │     ├── ..........                # 
│    │   │     ├── Civic Education           # VNHSGE civic education dataset
│    │   ├── test                            # test set
│    │   ├── train                           # train set
│    └── Word format                         # Word folder                            
│    │   ├── eval                   
│    │   │     ├── Mathematics               # VNHSGE mathematics dataset
│    │   │     │   ├── MET_Math_IE_2023.docx # Word file 
│    │   │     ├── ..........                # 
│    │   │     ├── Civic Education           # VNHSGE civic education dataset
├── VNHSGE-E                                 # English versions
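
As a rough illustration of how the layout above might be navigated, the sketch below builds the path to one exam's JSON file. The helper name `exam_json_path` and its parameters are hypothetical, and it assumes that the English version (`VNHSGE-E`) mirrors the folder structure shown for `VNHSGE-V`, which the tree does not confirm.

```python
from pathlib import Path


def exam_json_path(root, language="V", split="eval",
                   subject="Mathematics", exam="MET_Math_IE_2023"):
    """Build the expected path of one exam's JSON file.

    Hypothetical helper: it only joins the folder names shown in the tree
    (VNHSGE-V / JSON format / eval / Mathematics / <exam>.json) and does
    not check that the file exists.
    """
    return (Path(root) / f"VNHSGE-{language}" / "JSON format"
            / split / subject / f"{exam}.json")


path = exam_json_path("VNHSGE")
print(path.as_posix())
```

The image folder for an exam sits next to its JSON file and shares its name without the `.json` extension, so `path.with_suffix("")` would point at it under the same assumptions.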

Word format

| ID | IQ | Q | C | IA | E |
|---|---|---|---|---|---|
| 1 | | The volume of a cube with edge 2a is: A. 8a^3 B. 2a^3 C. a^3 D. 6a^3 | A | | The volume of a cube with edge 2a is: V=(2a)^3=8a^3 |

ID refers to the ID of the question; IQ refers to the images of the question; Q refers to the question content; C refers to the correct choice; IA refers to the images of the explanation; and E refers to the explanation content.

JSON format

{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3."
}
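
A minimal sketch of reading one record in this format and separating the question stem from its options. The helper `split_question` is hypothetical. It assumes that every option sits on its own line and starts with a letter from A to D followed by a period, as in the sample; questions laid out differently would need another rule.

```python
import json
import re

# The sample record from this section (raw string, so \n and \t are
# decoded by the JSON parser rather than by Python).
SAMPLE = r'''{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3."
}'''

OPTION_RE = re.compile(r"^([A-D])\.\s*(.*)$")


def split_question(q):
    """Split the "Q" field into (stem, {letter: option text})."""
    stem_lines, options = [], {}
    for line in q.splitlines():
        line = line.strip()  # drops the trailing tabs seen in the sample
        match = OPTION_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
        elif line:
            stem_lines.append(line)
    return " ".join(stem_lines), options


record = json.loads(SAMPLE)
stem, options = split_question(record["Q"])
print(stem)
print(options[record["C"]])  # text of the correct option
```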

ChatGPT and Bing AI Chat Performances

  • Response

JSON format

{
    "ID": "1",
    "IQ": " ",
    "Q": "1) The volume of a cube with edge 2a is:\nA. 8a^3.\t\nB. 2a^3.\t\nC. a^3.\t\nD. 6a^3.",
    "C": "A",
    "IA": " ",
    "E": "The volume of a cube with edge 2a is: V=(2a)^3=8a^3.",
    "CC": "A",
    "CE": "The formula for the volume of a cube is V = s^3, where s is the length of one of its sides. Therefore, the volume of the cube with a side length of 2a is: V = (2a)^3 = 8a^3"
}
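
One way to score multiple-choice responses stored in this format is sketched below. It assumes, based only on the sample above, that `CC` holds the option chosen by the model and `C` holds the answer key. The function name `accuracy` and the second record are illustrative, not part of the dataset, and this is not necessarily the scoring procedure the authors used.

```python
def accuracy(records):
    """Return the percentage of records where "CC" matches "C".

    Records without an answer key are skipped; a missing "CC" counts
    as a wrong answer.
    """
    graded = [r for r in records if r.get("C", "").strip()]
    if not graded:
        return 0.0
    correct = sum(
        r.get("CC", "").strip().upper() == r["C"].strip().upper()
        for r in graded
    )
    return 100.0 * correct / len(graded)


responses = [
    {"ID": "1", "C": "A", "CC": "A"},  # matches the sample record
    {"ID": "2", "C": "B", "CC": "D"},  # made-up record for illustration
]
print(accuracy(responses))
```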
  • Performance

We evaluated the performance of ChatGPT and BingChat on the VNHSGE dataset.

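| Year | Math ChatGPT | Math BingChat | Lit ChatGPT | Lit BingChat | Eng ChatGPT | Eng BingChat | Phy ChatGPT | Phy BingChat | Che ChatGPT | Che BingChat | Bio ChatGPT | Bio BingChat | His ChatGPT | His BingChat | Geo ChatGPT | Geo BingChat | Civ ChatGPT | Civ BingChat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | 52 | 56 | 75 | 52.75 | 76 | 92 | 60 | 55 | 40 | 55 | 60 | 67.5 | 42.5 | 82.5 | 50 | 75 | 60 | 75 |
| 2020 | 66 | 56 | 68.9 | 51.25 | 86 | 96 | 62.5 | 67.5 | 42.5 | 57.5 | 60 | 72.5 | 47.5 | 85 | 52.5 | 70 | 70 | 87.5 |
| 2021 | 60 | 66 | 75 | 60.25 | 76 | 86 | 60 | 67.5 | 62.5 | 50 | 52.5 | 67.5 | 55 | 90 | 75 | 82.5 | 62.5 | 92.5 |
| 2022 | 62 | 60 | 56.3 | 70 | 80 | 94 | 65 | 67.5 | 47.5 | 47.5 | 57.5 | 72.5 | 60 | 92.5 | 62.5 | 85 | 82.5 | 90 |
| 2023 | 54 | 62 | 64.8 | 49.75 | 78 | 94 | 57.5 | 72.5 | 47.5 | 52.5 | 60 | 65 | 77.5 | 92.5 | 67.5 | 85 | 77.5 | 82.5 |
| AVG | 58.8 | 60 | 68 | 56.8 | 79.2 | 92.4 | 61 | 66 | 48 | 52.8 | 58 | 69 | 56.5 | 88.5 | 61.5 | 79.5 | 70.5 | 85.5 |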

On subjects that demand complex calculation and reasoning (mathematics, physics, chemistry, and biology), the average scores of ChatGPT and BingChat range from 48% to 69%. By contrast, on subjects that rely more on language skills (literature, English, history, geography, and civic education), their average scores range from 56.5% to 92.4%.
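
The two ranges quoted above can be re-derived from the AVG row of the table. The sketch below copies those averages by hand as (ChatGPT, BingChat) pairs. The dictionary, the two subject groupings, and the helper `score_range` are names chosen for this illustration.

```python
# AVG row of the performance table: subject -> (ChatGPT, BingChat), in percent.
AVG = {
    "Mathematics": (58.8, 60.0),
    "Literature": (68.0, 56.8),
    "English": (79.2, 92.4),
    "Physics": (61.0, 66.0),
    "Chemistry": (48.0, 52.8),
    "Biology": (58.0, 69.0),
    "History": (56.5, 88.5),
    "Geography": (61.5, 79.5),
    "Civic Education": (70.5, 85.5),
}

REASONING = ["Mathematics", "Physics", "Chemistry", "Biology"]
LANGUAGE = ["Literature", "English", "History", "Geography", "Civic Education"]


def score_range(subjects):
    """Return (lowest, highest) average score over both models."""
    values = [score for subject in subjects for score in AVG[subject]]
    return min(values), max(values)


print(score_range(REASONING))  # (48.0, 69.0)
print(score_range(LANGUAGE))   # (56.5, 92.4)
```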

  • Comparison of ChatGPT, BingChat, and Vietnamese students across the score spectrum: our objective is to assess whether LLMs perform at a level comparable to humans, despite the challenges posed by the different settings. This comparison helps evaluate the potential of LLMs as effective learning support tools for students across various subject areas. (Figure: Mathematics score spectrum of Vietnamese students in 2021.)

Citation

If you find this work useful for your research, please feel free to use it, and remember to cite our papers:

@article{xuan2023vnhsge,
  title={VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models},
  author={Xuan-Quy, Dao and Ngoc-Bich, Le and The-Duy, Vo and Xuan-Dung, Phan and Bac-Bien, Ngo and Van-Tien, Nguyen and Thi-My-Thanh, Nguyen and Hong-Phuoc, Nguyen},
  journal={arXiv preprint arXiv:2305.12199},
  year={2023}
}

@article{dao2023can,
  title={Can ChatGPT pass the Vietnamese National High School Graduation Examination?},
  author={Xuan-Quy, Dao and Ngoc-Bich, Le and Xuan-Dung, Phan and Bac-Bien, Ngo},
  journal={arXiv preprint arXiv:2306.09170},
  year={2023}
}

@article{dao2023chatgpt,
  title={ChatGPT is Good but Bing Chat is Better for Vietnamese Students},
  author={Xuan-Quy, Dao and Ngoc-Bich, Le and Xuan-Dung, Phan and Bac-Bien, Ngo},
  journal={arXiv preprint arXiv:2307.08272},
  year={2023}
}