Large Language Models are Better Reasoners with Self-Verification
October 18, 2023
This is the official implementation of Large Language Models are Better Reasoners with Self-Verification.
(EMNLP 2023 Findings)
Demo
Installation
Make sure you have Python>=3.8 installed on your machine.
pip install torch==1.8.2+cu111 torchtext==0.9.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install tqdm transformers scikit-learn pandas numpy sentencepiece openai
Set your OpenAI API key
# https://beta.openai.com/account/api-keys
export OPENAI_API_KEY=(YOUR OPENAI API KEY)
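A missing key tends to surface only as an authentication error on the first request, so it can help to check for it up front. The following is a minimal sketch, not code from this repository: `require_api_key` is a hypothetical helper, and the only thing taken from the step above is the variable name `OPENAI_API_KEY`.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced by a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return key
```

Calling `require_api_key()` at the start of a script raises a `RuntimeError` immediately when the variable has not been exported.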
Set arguments.
model=CODEX # {"gpt3", "gpt3-medium", "gpt3-large", "gpt3-xl", "CODEX", "CODEX-001"}. "codex" is the smallest model.
dataset=multiarith # We can use other datasets. See help for the details.
api_time_interval=4.0 # Caution: the API allows at most 20 requests per minute; exceeding this limit causes errors.
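The interval follows from the rate limit: 20 requests per minute means at least 60 / 20 = 3 seconds between calls, so 4.0 leaves some headroom. The sketch below shows one way such spacing could be enforced. `min_interval` and `Throttle` are hypothetical names for illustration and are not taken from this repository's code.

```python
import time


def min_interval(max_requests_per_minute):
    """Smallest spacing (in seconds) that stays within a per-minute limit."""
    return 60.0 / max_requests_per_minute


class Throttle:
    """Hypothetical helper: block until `interval` seconds have passed
    since the previous call to wait()."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create `Throttle(4.0)` once and invoke `wait()` before each API request.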
Quick Start
Demo
python demo.py
Self-Verification (our proposal)
python main.py --method=verifier_cot --model=${model} --dataset=${dataset}
CoT
# MultiArith and GSM8K are currently available.
python main.py --method=few_shot_cot --model=${model} --dataset=${dataset}
Method

- Forward Reasoning: the LLM generates candidate chains of thought and conclusions for a given problem text.
- Backward Verification: the LLM checks whether the problem's conditions are consistent with each candidate conclusion, and the candidate conclusions are ranked by the resulting verification score.
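The two steps above can be sketched as a small ranking loop. This is an illustrative outline under stated assumptions, not the implementation in `main.py`: `rank_candidates`, `generate`, and `verify` are hypothetical names, the two callables stand in for LLM calls, and the score used here (the number of verification checks a candidate passes) is one simple choice of verification score.

```python
def rank_candidates(problem, generate, verify, num_candidates=5, num_checks=3):
    """Rank candidate conclusions by a simple verification score.

    generate(problem) -> str          stands in for forward reasoning
    verify(problem, answer) -> bool   stands in for one backward check
    Both callables are assumptions made for this sketch.
    """
    # Forward reasoning: sample candidates, keeping first-seen order
    # and dropping duplicates.
    candidates = []
    for _ in range(num_candidates):
        answer = generate(problem)
        if answer not in candidates:
            candidates.append(answer)

    # Backward verification: score each candidate by the number of
    # checks it passes.
    scored = []
    for answer in candidates:
        score = sum(bool(verify(problem, answer)) for _ in range(num_checks))
        scored.append((answer, score))

    # Highest score first; sorted() is stable, so ties keep sampling order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list is the conclusion that would be selected as the final answer.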
Cite
@misc{weng2023large,
  title={Large Language Models are Better Reasoners with Self-Verification},
  author={Yixuan Weng and Minjun Zhu and Fei Xia and Bin Li and Shizhu He and Kang Liu and Jun Zhao},
  year={2023},
  eprint={2212.09561},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}