Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
July 11, 2023
This repo contains the experimental code and resources used in our ACL 2023 paper: Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge.
Install Requirements
export PJ_HOME=${YOUR_WORKING_DIR}/uncommongen/
export OPENAI_API_KEY=${YOUR_API_KEY}
pip3 install -r requirements.txt
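Before launching any of the scripts, it can help to confirm that the two environment variables above are actually set. The following is a minimal sketch, not part of the repo; the helper name `missing_env_vars` is hypothetical.

```python
import os


def missing_env_vars(required=("PJ_HOME", "OPENAI_API_KEY"), env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a plain
    dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```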
Download Datasets
The CSK-PN dataset is available on [Google drive] in JSON Lines format.
Since running OpenAI models is costly, we also release the results generated by these LLMs along with the dataset (so each JSON line is fairly large). You will find them nested under cg_pred, qa_pred, etc.
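A minimal sketch of reading the released file and pulling out one stored prediction. It assumes that each line is a JSON object and that fields such as cg_pred and qa_pred are dicts keyed by model key; that layout is a guess based on the description above, so check a few records before relying on it. The helper names are hypothetical.

```python
import json


def parse_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def get_prediction(record, task_field, model_key):
    """Return record[task_field][model_key], or None if it is absent.

    Assumption: predictions are nested as a dict keyed by model key
    under fields such as "cg_pred" or "qa_pred".
    """
    preds = record.get(task_field)
    if isinstance(preds, dict):
        return preds.get(model_key)
    return None
```

Usage would look like `with open(path, encoding="utf-8") as f:` followed by `for record in parse_jsonl(f): ...`, where `path` points to the downloaded dataset file.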
How to Run
Note: OpenAI has deprecated code-davinci-002.
Running constrained generation (CG)
For detailed parameters, please refer to constrained_generation/llm_constrained_generation.py.
An example:
python3 constrained_generation/llm_constrained_generation.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none
Running boolean question answering (QA)
For detailed parameters, please refer to boolqa/llm_answer_prediction.py.
An example:
python3 boolqa/llm_answer_prediction.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none
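Both the CG and QA scripts take the same flags in the examples above, so the command line can be assembled programmatically when running several settings. This is only a sketch: the helper `build_command` and the file names are hypothetical, and the meaning of `-b` (shown as 16 above) is assumed to be a batch size. Refer to the scripts themselves for the authoritative parameter list.

```python
import shlex


def build_command(script, input_file, output_file, model_name, posk, negk,
                  batch=16, cot="none"):
    """Assemble the argument list for one run, using only the flags
    that appear in the README examples."""
    return [
        "python3", script,
        "-i", input_file,
        "-o", output_file,
        "-m", model_name,
        "--posk", str(posk),
        "--negk", str(negk),
        "-b", str(batch),  # assumed to be a batch size
        "--cot", cot,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_command("boolqa/llm_answer_prediction.py",
                        "data.jsonl", "out.jsonl",
                        "text-davinci-002", posk=3, negk=3)
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute the run.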
Evaluation
Evaluate constrained generation (CG)
python3 evaluation/eval_constrained_generation.py -i ${INPUT_FILE} -m ${MODEL_KEY}
Note that ${MODEL_KEY} is the id of a generation in the input json file, typically in the form of ${MODEL_NAME}_ex-${POSK}p${NEGK}n, such as text-davinci-002_ex-3p3n. Different parameters could result in different model keys. Please check the code carefully.
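The typical key format described above can be reproduced with a one-line helper. This sketch covers only the `${MODEL_NAME}_ex-${POSK}p${NEGK}n` pattern; other parameters may change the key, so the function name `make_model_key` and its scope are illustrative rather than the repo's own logic.

```python
def make_model_key(model_name, posk, negk):
    """Build a key of the form ${MODEL_NAME}_ex-${POSK}p${NEGK}n."""
    return f"{model_name}_ex-{posk}p{negk}n"


if __name__ == "__main__":
    print(make_model_key("text-davinci-002", 3, 3))  # text-davinci-002_ex-3p3n
```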
Evaluate boolean question answering (QA)
python3 evaluation/eval_boolqa.py -i ${INPUT_FILE} -m ${MODEL_KEY}
The note on ${MODEL_KEY} from the CG task applies here as well.
Citation
If you find our paper or resources useful, please kindly cite our paper. If you have any questions, please contact us!
@inproceedings{chen-etal-2023-say,
title = "Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge",
author = "Chen, Jiangjie and
Shi, Wei and
Fu, Ziquan and
Cheng, Sijie and
Li, Lei and
Xiao, Yanghua",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.550",
pages = "9890--9908",
abstract = "Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as {``}lions don{'}t live in the ocean{''}, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge? This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.",
}