Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge

July 11, 2023

This repo contains the experimental code and resources used in our ACL 2023 paper: Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge.

Install Requirements

export PJ_HOME=${YOUR_WORKING_DIR}/uncommongen/
export OPENAI_API_KEY=${YOUR_API_KEY}

pip3 install -r requirements.txt

Download Datasets

The CSK-PN dataset is available on [Google Drive] in JSON Lines format.

Since running OpenAI models is costly, we also release the results generated by these LLMs along with the dataset, so each JSON line is fairly large. You will find them nested under cg_pred, qa_pred, etc.
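As a rough illustration of how such a file could be inspected, the sketch below parses JSON Lines input one record at a time and reports which prediction fields a record carries. The helper names iter_records and prediction_fields are hypothetical and not part of this repo. The sketch assumes only that each line holds one JSON object and that the prediction fields end in _pred (as cg_pred and qa_pred do); the layout inside those fields should be checked against the downloaded data.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def prediction_fields(record: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level keys that look like released predictions.

    Assumption: prediction fields end with "_pred" (e.g. cg_pred, qa_pred).
    """
    return sorted(key for key in record if key.endswith("_pred"))
```

To use it, open the downloaded file with open(path, encoding="utf-8") and pass the file object to iter_records; because records are yielded lazily, the large per-line JSON objects are never all held in memory at once.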

How to Run

Note: OpenAI has deprecated code-davinci-002.

Running constrained generation (CG)

For detailed parameters, please refer to constrained_generation/llm_constrained_generation.py.

An example:

python3 constrained_generation/llm_constrained_generation.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none

Running boolean question answering (QA)

For detailed parameters, please refer to boolqa/llm_answer_prediction.py.

An example:

python3 boolqa/llm_answer_prediction.py -i ${INPUT_FILE} -o ${OUTPUT_FILE} -m ${MODEL_NAME} --posk ${POSK} --negk ${NEGK} -b 16 --cot none

Evaluation

Evaluate constrained generation (CG)

python3 evaluation/eval_constrained_generation.py -i ${INPUT_FILE} -m ${MODEL_KEY}

Note that ${MODEL_KEY} identifies one set of generations in the input JSON file. It typically takes the form ${MODEL_NAME}_ex-${POSK}p${NEGK}n, for example text-davinci-002_ex-3p3n. Other parameters can change the model key, so please check the code carefully.
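The typical key pattern can be sketched in a few lines of Python. The function build_model_key is a hypothetical helper, not something this repo provides, and it covers only the ${MODEL_NAME}_ex-${POSK}p${NEGK}n form described above; keys produced under other settings (for instance a different --cot value) may differ, so compare against the keys actually present in your file.

```python
def build_model_key(model_name: str, posk: int, negk: int) -> str:
    """Compose a key in the ${MODEL_NAME}_ex-${POSK}p${NEGK}n pattern.

    Assumption: only the typical form is handled; other script
    parameters may produce a different key.
    """
    return f"{model_name}_ex-{posk}p{negk}n"


print(build_model_key("text-davinci-002", 3, 3))  # text-davinci-002_ex-3p3n
```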

Evaluate boolean question answering (QA)

python3 evaluation/eval_boolqa.py -i ${INPUT_FILE} -m ${MODEL_KEY}

The note on ${MODEL_KEY} for the CG task applies here as well.

Citation

If you find our paper or resources useful, please kindly cite our paper. If you have any questions, please contact us!

@inproceedings{chen-etal-2023-say,
    title = "Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge",
    author = "Chen, Jiangjie  and
      Shi, Wei  and
      Fu, Ziquan  and
      Cheng, Sijie  and
      Li, Lei  and
      Xiao, Yanghua",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.550",
    pages = "9890--9908",
    abstract = "Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as {``}lions don{'}t live in the ocean{''}, is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge? This work examines the ability of LLMs on negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.",
}