README.md

July 13, 2022 ยท View on GitHub

This example shows the reproducing procedure for our TACL paper: End-to-end Argument Mining with Cross-corpora Multi-task Learning.

Environment

  • Required: Python 3.8
  • Recommended: Pyenv and Virtualenv

We tested the code on Linux and OSX.

To ensure reproducibility, please use following previous version:

git checkout example/mtam

Install packages

./examples/multitask_am/install.sh

Download and pre-process corpora

Before running the pre-processing code, you must see license information:

# Download and pre-process corpora
./examples/multitask_am/setup_all.sh
# [Optional] Show corpora statistics
./examples/multitask_am/show_statistics.sh

Experiments

Single-task (i.e., single corpus, ST) training

Training and evaluation:

# AAEC (UKP corpus in Essay-level)
./examples/multitask_am/st/aaec_essay.sh
# AAEC (UKP corpus in Paragraph-level)
./examples/multitask_am/st/aaec_para.sh
# AASD (SciDTB with argument annotation)
./examples/multitask_am/st/aasd.sh
# AbstRCT (Abstracts from RCT domain)
./examples/multitask_am/st/abstrct.sh
# CDCP (User comments for Consumer Debt Collection Practices)
./examples/multitask_am/st/cdcp.sh
# MTC (Microtext corpus)
./examples/multitask_am/st/mtc.sh

Example of showing test data scores:

# Span identification
cat log/multitask_am/st/aaec_para/*/finetune/evaluation/test.jsonl | jq .scores.aaec_para.anchors.f
# Component classification
cat log/multitask_am/st/aaec_para/*/finetune/evaluation/test.jsonl | jq .scores.aaec_para.labels.total.f
# Relation classification
cat log/multitask_am/st/aaec_para/*/finetune/evaluation/test.jsonl | jq .scores.aaec_para.edges.total.f
# Link identification
cat log/multitask_am/st/aaec_para/*/finetune/evaluation/test.jsonl | jq .scores.aaec_para.edges.link.f


# [w/ oracle span] Component classification
cat log/multitask_am/st/aaec_para/*/finetune/os/evaluation/test.jsonl | jq .labels.total.f
# [w/ oracle span] Relation classification
cat log/multitask_am/st/aaec_para/*/finetune/os/evaluation/test.jsonl | jq .edges.total.f
# [w/ oracle span] Link identification
cat log/multitask_am/st/aaec_para/*/finetune/os/evaluation/test.jsonl | jq .edges.link.f

Example of showing predicted test data:

# Only shows the first sample
cat log/multitask_am/st/aaec_para/0/finetune/prediction/test.mrp | head -1 | jq .
# [w/ oracle span]
cat log/multitask_am/st/aaec_para/0/finetune/os/prediction/test.mrp | head -1 | jq .

Multi-task cross-corpora (i.e., multi-task using all the corpora, MT-All) training

Training and evaluation:

./examples/multitask_am/mt_all/aaec_essay.sh
./examples/multitask_am/mt_all/aasd.sh
./examples/multitask_am/mt_all/abstrct.sh
./examples/multitask_am/mt_all/cdcp.sh
./examples/multitask_am/mt_all/mtc.sh

The evaluation and prediction results can be found in the similar manner to the ST example.

QAs

  • Q: I can not reproduce exactly the same values as the paper

    • It is possible that you can not reproduce the exact values due to GPU environmental differences and so on (even using the same seed). Also, Longformer seems to behave in a non-deterministic manner: see issues/12482 in the Huggingface transformers library.
  • Q: Some corpus statistics are different from original papers

    • CDCP: Our pre-processing seems to a bit differ from other previous studies in how we handle link-transitivity. However, we have the same statistics for the test data, so we believe there is no problem in comparing performance.
    • AbstRCT: The original statistics contained an erroneously double-annotated relationship. Our statistics corrected this.
    • AASD: The published corpus appears to contain minor modifications.
    • MTC: The pre-processing method for Add relations follows previous work. See comments in the pre-processing code.

Citation

@article{10.1162/tacl_a_00481,
    author = {Morio, Gaku and Ozaki, Hiroaki and Morishita, Terufumi and Yanai, Kohsuke},
    title = "{End-to-end Argument Mining with Cross-corpora Multi-task Learning}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {10},
    pages = {639-658},
    year = {2022},
    month = {05},
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00481},
    url = {https://doi.org/10.1162/tacl\_a\_00481},
    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00481/2022965/tacl\_a\_00481.pdf},
}