June 24, 2024

X-Gear: Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction

Code for our ACL-2022 paper Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction.

Setup

  • Python=3.7.10
  • Create the conda environment from environment.yml:

    conda env create -f environment.yml

Data and Preprocessing

  • Go into the folder ./preprocessing/.
  • Follow the instructions in the README.md there. The processed data will be placed in the folder ./processed_data/.

Training

  • Run ./scripts/generate_data_ace05.sh and ./scripts/generate_data_ere.sh to generate training examples in each language for X-Gear. The generated training data will be saved in ./finetuned_data/.

  • Run ./scripts/train_ace05.sh or ./scripts/train_ere.sh to train X-Gear. Alternatively, run the following command directly.

    python ./xgear/train.py -c ./config/config_ace05_mT5copy-base_en.json
    

    This trains X-Gear with mT5-base + copy mechanism on ACE-05 English. The model will be saved in ./output/. You can modify the arguments in the config file or replace the config file with another one from ./config/.
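One way to modify arguments without editing a shipped config in place is to derive a new file from it. The helper below is only a sketch: `derive_config` is a hypothetical name, not part of this repository, and it assumes the config is a flat JSON object whose top-level keys are the arguments you want to change.

```python
import json
from pathlib import Path


def derive_config(base_path, out_path, **overrides):
    """Copy a JSON config, replacing only the given top-level keys."""
    config = json.loads(Path(base_path).read_text(encoding="utf-8"))
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        # Refuse keys the base config does not have; a misspelled key is most likely a mistake.
        raise KeyError(f"keys not present in {base_path}: {unknown}")
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Hypothetical usage: "max_epoch" and "my_config.json" are assumed names,
# so check the real config file for the actual keys first.
# derive_config("./config/config_ace05_mT5copy-base_en.json",
#               "./config/my_config.json", max_epoch=10)
```

The derived file can then be passed to train.py with the same -c option shown above.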

Evaluating

  • Run the following script to evaluate performance on ACE-05 English, Arabic, and Chinese.

    ./scripts/eval_ace05.sh [model_path] [prediction_dir]
    

    To evaluate X-Gear with mT5-large, remember to change the config file set in ./scripts/eval_ace05.sh.

  • Run the following script to evaluate performance on ERE English and Spanish.

    ./scripts/eval_ere.sh [model_path] [prediction_dir]
    

    To evaluate X-Gear with mT5-large, remember to change the config file set in ./scripts/eval_ere.sh.
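If you evaluate several checkpoints, it may help to drive the two scripts from Python. The following is a minimal sketch, assuming it is run from the repository root; `eval_command` and `run_eval` are hypothetical helpers, and only the script paths and argument order come from the commands above.

```python
import subprocess

# Script paths as given in the evaluation commands above.
EVAL_SCRIPTS = {
    "ace05": "./scripts/eval_ace05.sh",
    "ere": "./scripts/eval_ere.sh",
}


def eval_command(dataset, model_path, prediction_dir):
    """Build the argument list for one evaluation run."""
    if dataset not in EVAL_SCRIPTS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(EVAL_SCRIPTS)}")
    return [EVAL_SCRIPTS[dataset], str(model_path), str(prediction_dir)]


def run_eval(dataset, model_path, prediction_dir):
    """Run the evaluation script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(eval_command(dataset, model_path, prediction_dir), check=True)
```

Passing the command as a list avoids shell quoting problems when a model path contains spaces.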

We provide our pre-trained models and report their performance below.

ACE-05

| Model | en Arg-I | en Arg-C | ar Arg-I | ar Arg-C | zh Arg-I | zh Arg-C |
|---|---|---|---|---|---|---|
| X-Gear-ace05-mT5-base+copy-en | 73.39 | 69.28 | 47.64 | 42.09 | 57.81 | 54.46 |
| X-Gear-ace05-mT5-base+copy-ar | 33.87 | 27.17 | 72.97 | 66.92 | 31.14 | 28.84 |
| X-Gear-ace05-mT5-base+copy-zh | 59.85 | 55.15 | 38.04 | 34.88 | 72.93 | 68.99 |
| X-Gear-ace05-mT5-large+copy-en | 75.16 | 71.85 | 54.18 | 50.00 | 63.14 | 58.40 |
| X-Gear-ace05-mT5-large+copy-ar | 38.81 | 34.57 | 73.49 | 67.75 | 39.26 | 36.13 |
| X-Gear-ace05-mT5-large+copy-zh | 61.44 | 55.40 | 38.71 | 36.14 | 70.45 | 66.99 |

ERE

| Model | en Arg-I | en Arg-C | es Arg-I | es Arg-C |
|---|---|---|---|---|
| X-Gear-ere-mT5-base+copy-en | 78.26 | 71.55 | 64.31 | 58.70 |
| X-Gear-ere-mT5-base+copy-es | 69.21 | 59.79 | 70.67 | 66.37 |
| X-Gear-ere-mT5-large+copy-en | 78.10 | 73.04 | 64.82 | 60.35 |
| X-Gear-ere-mT5-large+copy-es | 69.03 | 63.73 | 71.47 | 68.49 |
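The tables mix same-language and cross-lingual results. As a rough illustration, the sketch below separates the two for the ACE-05 mT5-base models, using the Arg-C values copied from the table above. It assumes that the model-name suffix is the training language and the column prefix is the test language; the averages are a convenience for reading the table, not figures reported here.

```python
# Arg-C scores from the ACE-05 table: training language -> {test language: score}.
ACE05_BASE_ARG_C = {
    "en": {"en": 69.28, "ar": 42.09, "zh": 54.46},
    "ar": {"en": 27.17, "ar": 66.92, "zh": 28.84},
    "zh": {"en": 55.15, "ar": 34.88, "zh": 68.99},
}


def split_scores(scores):
    """Return (same-language average, cross-lingual average) over a train x test grid."""
    mono = [s for src, row in scores.items() for tgt, s in row.items() if src == tgt]
    cross = [s for src, row in scores.items() for tgt, s in row.items() if src != tgt]
    return sum(mono) / len(mono), sum(cross) / len(cross)


mono_avg, cross_avg = split_scores(ACE05_BASE_ARG_C)
print(f"same-language {mono_avg:.2f}, cross-lingual {cross_avg:.2f}")
# prints: same-language 68.40, cross-lingual 40.43
```

The same function works for the ERE table or the mT5-large rows if you fill in the corresponding grid.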

Citation

If you find that the code is useful in your research, please consider citing our paper.

@inproceedings{acl2022xgear,
    author    = {Kuan-Hao Huang and I-Hung Hsu and Premkumar Natarajan and Kai-Wei Chang and Nanyun Peng},
    title     = {Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction},
    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year      = {2022},
}