Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism (EMNLP 2023)

November 5, 2025

This repository contains the official code for the EMNLP 2023 paper: "Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism".

Experimental Overview

The experimental pipeline consists of two main stages:

Data Generation: Generate vocabulary and question datasets tailored for different experimental settings.
Experiment Execution: Run experiments using various models and experimental configurations.

Setup

Clone the repository:

```
git clone https://github.com/muyo8692/stepbystep-reasoning-vs-negation.git
cd stepbystep-reasoning-vs-negation
```

Install dependencies:
```
pip install -r requirements.txt
```

Running Experiments

Step 1: Generate Data

Execute the following script to generate the necessary datasets for all experiments.

```
bash src/scripts/generate_data.sh
```

This will populate the data/ directory with the required vocabulary and question files.
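To confirm that data generation produced output before starting a run, the contents of data/ can be listed. The snippet below is a minimal sketch and not part of the repository: `list_generated_files` is a hypothetical helper, and it makes no assumption about file names or formats beyond the data/ directory mentioned above.

```python
from pathlib import Path
from typing import List


def list_generated_files(data_dir: str = "data") -> List[Path]:
    """Return every file under data_dir, sorted by path (hypothetical helper)."""
    root = Path(data_dir)
    if not root.is_dir():
        # The directory is missing, so generate_data.sh has probably not been run.
        raise FileNotFoundError(f"{root} not found; run the data generation step first")
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_generated_files():
        print(path)
```

An empty listing suggests the generation step did not complete.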

Step 2: Run an Experiment

Experiments are launched with src/scripts/run_experiment.sh, which is configured through command-line arguments.

Experimental Settings

The --exp_level argument controls the reasoning setting, as described in the paper:

| Setting | Description | `exemplars_type` | `questions_type` | `vocab_type` |
| --- | --- | --- | --- | --- |
| `BASE` | Standard syllogistic reasoning with real-world knowledge. | `base` | `base` | `real` |
| `FIC` | Reasoning with fictional knowledge to test logical deduction. | `base` | `base` | `fiction` |
| `FICNEG` | In-domain negation: both exemplars and questions contain negation. | `neg` | `neg` | `fiction` |
| `FICNEG-O` | Out-of-domain negation: only questions contain negation. | `base` | `neg` | `fiction` |
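The settings table can also be written as a small lookup, which is convenient when scripting several runs. This is a minimal sketch: the `Setting` class, the `SETTINGS` dictionary, and `negation_is_out_of_domain` are illustrative names and not repository code. Only the setting names and the three type values are taken from the table.

```python
from typing import NamedTuple


class Setting(NamedTuple):
    """One row of the settings table (hypothetical helper, not repository code)."""
    exemplars_type: str
    questions_type: str
    vocab_type: str


# Values copied from the table; the dictionary itself is only an illustration.
SETTINGS = {
    "BASE": Setting("base", "base", "real"),
    "FIC": Setting("base", "base", "fiction"),
    "FICNEG": Setting("neg", "neg", "fiction"),
    "FICNEG-O": Setting("base", "neg", "fiction"),
}


def negation_is_out_of_domain(name: str) -> bool:
    """True when the questions contain negation but the exemplars do not."""
    setting = SETTINGS[name]
    return setting.questions_type == "neg" and setting.exemplars_type != "neg"


if __name__ == "__main__":
    for name, setting in SETTINGS.items():
        print(name, setting, negation_is_out_of_domain(name))
```

Written this way, the difference between `FICNEG` and `FICNEG-O` reduces to a single field: `exemplars_type`.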

Examples

Run the default experiment (BASE setting with GPT-3.5):
```
bash src/scripts/run_experiment.sh
```

Run the FICNEG-O experiment on the occupation domain with opt-175b on a specific GPU:

```
bash src/scripts/run_experiment.sh \
    --model_name opt-175b \
    --exp_level FICNEG-O \
    --task_domain occupation \
    --gpu_num 1 \
    --certain_gpus_list 0
```

Run an exemplar_reorder experiment in the FIC setting:

```
bash src/scripts/run_experiment.sh \
    --model_name openai-gpt-4 \
    --exp_level FIC \
    --task_domain sports \
    --task_type exemplar_reorder \
    --exemplar_order_list 'exemplar_a' 'exemplar_b' 'exemplar_c' \
    --exemplar_label_str 'yes_no_no'
```
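One plausible reading of the two exemplar arguments is that `--exemplar_label_str` holds one underscore-separated label per entry in `--exemplar_order_list`, in the same order. That reading is an assumption inferred from the example values, and the script may interpret them differently. The sketch below, with the hypothetical helper `pair_exemplars_with_labels`, only illustrates that assumed correspondence.

```python
from typing import List, Tuple


def pair_exemplars_with_labels(order: List[str], label_str: str) -> List[Tuple[str, str]]:
    """Pair each exemplar with its label, ASSUMING one '_'-separated label per exemplar."""
    labels = label_str.split("_")
    if len(labels) != len(order):
        raise ValueError(
            f"{len(order)} exemplars but {len(labels)} labels in {label_str!r}"
        )
    return list(zip(order, labels))


if __name__ == "__main__":
    pairs = pair_exemplars_with_labels(
        ["exemplar_a", "exemplar_b", "exemplar_c"], "yes_no_no"
    )
    for exemplar, label in pairs:
        print(exemplar, label)
```

Under this assumption, a mismatch between the number of exemplars and the number of labels is caught before a run is launched.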

Results will be saved to the output/ directory, organized by date and model name.
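To run the same model and domain across all four settings, the command line can be assembled programmatically. The following is a sketch rather than a repository utility: `build_command` is a hypothetical helper, it uses only the `--model_name`, `--exp_level`, and `--task_domain` flags shown in the examples, and it assumes that every setting accepts the chosen model and domain.

```python
import shlex
from typing import List

SCRIPT = "src/scripts/run_experiment.sh"
EXP_LEVELS = ["BASE", "FIC", "FICNEG", "FICNEG-O"]


def build_command(model_name: str, exp_level: str, task_domain: str) -> List[str]:
    """Build the argument list for one run (hypothetical helper)."""
    if exp_level not in EXP_LEVELS:
        raise ValueError(f"unknown exp_level: {exp_level!r}")
    return [
        "bash", SCRIPT,
        "--model_name", model_name,
        "--exp_level", exp_level,
        "--task_domain", task_domain,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sweep can be reviewed first.
    for level in EXP_LEVELS:
        print(shlex.join(build_command("openai-gpt-4", level, "sports")))
```

To execute a command rather than print it, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.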

Cite our work

```
@inproceedings{ye-etal-2023-assessing,
    title = "Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism",
    author = "Ye, Mengyu and Kuribayashi, Tatsuki and Suzuki, Jun and Kobayashi, Goro and Funayama, Hiroaki",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    year = "2023",
    url = "https://aclanthology.org/2023.emnlp-main.912/",
}
```