RAP for Text2Event
October 29, 2022 · View on GitHub
Requirements
General
- Python (verified on 3.8)
- CUDA (verified on 11.1)
Python Packages
- see requirements.txt
conda create -n text2event python=3.8
conda activate text2event
pip install -r requirements.txt
Dataset
Casie
The casie dataset is in the folder data/text2tree.
data/text2tree/
├── casie
├── casie_001
└── casie_010
ACE05
We followed the same split ratio in DEGREE, so when implement ace in Text2Event, please conduct the dataset preprocessing in BaseModel/DEGREE folder.
After preprocessing in DEGREE, run python ./preprocess/convert_ace.py to convert the data in Text2Event format. The processed data folders are listed below.
data/text2tree/
├── ace
├── ace_001
├── ace_003
├── ace_005
├── ace_010
├── ace_020
└── ace_030
Data Format
Data folder contains four files:
data/text2tree/casie
├── event.schema
├── test.json
├── train.json
└── val.json
train/val/test.json are data files, and each line is a JSON instance.
Each JSON instance contains text and event fields, in which text is plain text, and event is event linearized form.
If you want to use other key names, it is easy to change the input format in run_seq2seq.py.
Retrieval
Generate the retrieved reference as command following:
cd ../../retrieval
python retrieve.py --base_model t2e
For each split setting, a new folder will be generated like ace_001_retrieved/.
Model Training
Training scripts as follows:
run_seq2seq.py: Python code entry, modified from the transformers/examples/seq2seq/run_seq2seq.pyrun_seq2seq.bash: Model training script logging to the log file.
Trained models are saved in the models/ folder.
bash run_seq2seq.bash -d 1 -f tree -m t5-base -l 3e-4 --data casie_010_retrieved --lr_scheduler linear --warmup_steps 300 -b 16 --epoch 50 --eval_steps 100 --gradient_accumulation_steps 1
bash run_seq2seq.bash -d 1 -f tree -m t5-large -l 3e-4 --data ace_005_retrieved --lr_scheduler linear --warmup_steps 100 -b 4 --epoch 60 --eval_steps 50 --gradient_accumulation_steps 8
-drefers to the GPU device id.-mrefers to themodel-folder-path, we selectt5-baseas backbone forCASIEandt5-largeforACE05.-lrefers to the learning rate.--datarefers to the dataset_name, we takecasie_010_retrievedandace_005_retrievedas example here.-brefers to the batch size.
Model Evaluation
bash run_eval.bash -d 0 -m <model-folder-path> -i <data-folder-path> -c -b 8
The evaluation is also conducted after training.