
December 6, 2023

📖 Introduction

This is the official repository for the paper "Know What I don't Know: Handling Ambiguous and Unknown Questions for Text-to-SQL".

In this paper, we propose a simple yet effective counterfactual example generation approach that automatically produces ambiguous and unanswerable text-to-SQL examples. Furthermore, we propose a weakly supervised DTE (Detecting-Then-Explaining) model for error detection, localization, and explanation.

DTE aims to detect unanswerable and ambiguous spans in user questions and to give explanations to end users by probing grounding knowledge from pretrained language models.

⚡️ Environment

Configure your local environment:

conda create -n dte python=3.7
conda activate dte
conda install pytorch==1.7.1 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt

Set your own wandb API key, which you can get from https://wandb.ai/home:

export WANDB_API_KEY=your_own_key_abcd
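Since the key is read from the environment, it can help to confirm it is set before starting a long run. The following is a minimal sketch, not part of the repository: `check_wandb_key` is a hypothetical helper name, and it only checks that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical helper (not part of this repository):
# fail early if WANDB_API_KEY is unset or empty.
check_wandb_key() {
    if [ -z "${WANDB_API_KEY:-}" ]; then
        echo "WANDB_API_KEY is not set" >&2
        return 1
    fi
    echo "WANDB_API_KEY is set"
}
```

For example, `check_wandb_key && ./run.sh exp_name` would start training only when the key is present.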

🔧 Data Preparation

The data lives in the data folder. You can download the data from here. Unzip data.zip and put its subfolders in the data folder.
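The unzip step could be scripted roughly as follows. This is a sketch under assumptions: `extract_data` is a hypothetical helper, data.zip is assumed to sit in the current directory, and its subfolders are assumed to be at the top level of the archive (if the archive already contains a top-level data folder, extract into the current directory instead).

```shell
# Hypothetical helper (not part of this repository):
# extract an archive (default: data.zip) into a target folder (default: data).
extract_data() {
    archive="${1:-data.zip}"
    target="${2:-data}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$target"
    # -q: quiet, -o: overwrite existing files, -d: destination directory
    unzip -q -o "$archive" -d "$target"
}
```

Calling `extract_data` with no arguments uses the defaults; `extract_data path/to/data.zip data` makes both paths explicit.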

🔨 Model Training

The training script is run.sh. Before training, make sure you understand every command and argument in it, and change the user-related, path-related, and GPU-device-related arguments to match your own environment. Training takes about 3 hours on 4 x 16G V100 GPUs.

Usage:

chmod +x run.sh
./run.sh exp_name

💬 Citation

If you find our work helpful, please cite it as:

@inproceedings{wang-etal-2023-know,
    title = "Know What {I} don{'}t Know: Handling Ambiguous and Unknown Questions for Text-to-{SQL}",
    author = "Wang, Bing and Gao, Yan and Li, Zhoujun and Lou, Jian-Guang",
    booktitle = "Findings of ACL 2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.352",
    doi = "10.18653/v1/2023.findings-acl.352",
    pages = "5701--5714"
}

👍 Contributing

We welcome contributions and suggestions!