October 5, 2021

2020-SemEval Task 6: Definition Extraction from Free Text with the DEFT Corpus

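  • Task overview

  • Timeline: August 2019 to March 2020

  • Subtasks and data example

    1. Sentence classification: given a sentence, decide whether it contains a definition.
    2. Sequence labeling: tag each token with BIO labels according to the given tag schema. The first four columns are known; the fifth column, Tag, is predicted.
    3. Relation extraction: given the relation schema and the sequence labeling results, label the relations between tags. The first six columns are known; Root_ID and Relation are predicted.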
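The BIO scheme used in subtask 2 can be illustrated with a small decoder that collapses per-token tags into labeled spans. This is a minimal sketch, not the official tooling: the function name `bio_to_spans` is hypothetical, and the labels `Term` and `Definition` are used here only as illustrative values, since the tag schema itself is not reproduced in this document.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse BIO tags into (label, start, end) spans; end is exclusive.

    Hypothetical helper for illustration. An I- tag that does not continue
    an open span of the same label is treated like "O" and dropped.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith("I-") and label == tag[2:]:
            # The open span continues.
            continue
        else:
            # "O" or an inconsistent I- tag: close the open span, if any.
            if label is not None:
                spans.append((label, start, i))
            label = None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Illustrative tags only (assumed label names, not copied from the corpus).
example_tags = ["B-Term", "I-Term", "O", "B-Definition", "I-Definition", "I-Definition"]
example_spans = bio_to_spans(example_tags)
```

Using exclusive end indices keeps the spans directly usable as Python slices over the token list.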

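    Columns in the data example:

    • Token: a word in the sentence
    • Source: identifies the article the sentence comes from
    • Start/End: the token's start and end positions in the article
    • Tag: a label from the tag schema, in BIO format
    • Tag_ID: the unique identifier of the tag; -1 for O tags
    • Root_ID: the Tag_ID that the current Tag_ID is linked to
    • Relation: a relation from the relation schema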
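The eight columns described above can be read into a small record type. The following is a sketch under stated assumptions: it assumes one token per line with tab-separated fields in the column order given above, and the names `TokenRow`, `parse_row`, and `SUBTASK_COLUMNS` are hypothetical. The sample row is made up for illustration and is not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class TokenRow:
    token: str
    source: str
    start: int
    end: int
    tag: str
    # IDs are kept as strings: the text above only says O tags use -1.
    tag_id: str
    root_id: str
    relation: str


# Which columns are given and which are predicted, per the subtask descriptions.
SUBTASK_COLUMNS = {
    2: {"given": ["token", "source", "start", "end"], "predict": ["tag"]},
    3: {
        "given": ["token", "source", "start", "end", "tag", "tag_id"],
        "predict": ["root_id", "relation"],
    },
}


def parse_row(line: str, sep: str = "\t") -> TokenRow:
    """Parse one token line; the tab separator is an assumption."""
    fields = [f.strip() for f in line.rstrip("\n").split(sep)]
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    token, source, start, end, tag, tag_id, root_id, relation = fields
    return TokenRow(token, source, int(start), int(end), tag, tag_id, root_id, relation)


# Made-up sample row; all values are illustrative placeholders.
sample = "enzyme\tdoc_001\t10\t16\tB-Term\tT1\tT2\tDirect-Defines"
row = parse_row(sample)
```

For subtask 2, a model would see only the `given` fields of `SUBTASK_COLUMNS[2]` and predict `tag`; for subtask 3, it would additionally see `tag` and `tag_id`.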
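  • Data description

    The data comprises 215 files containing 26,552 sentences in total.

    | train | dev | test | Download |
    |-------|-----|------|----------|
    | 80    | 68  | 67   |          |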
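  • Competition solutions

    | Task 1 solution / rank | F1 | Description | Code |
    |------|--------|-------------|------|
    | 5 | 0.8444 | Multi-task BERT | × |
    | 6 | 0.8304 | RoBERTa + Stochastic Weight Averaging | × |
    | 12 | 0.8077 | Joint classification and sequence labeling pre-trained model with MLP and CRF layer | × |
    | 16 | 0.8007 | BERT with two-step fine tuning | × |
    | 18 | 0.7971 | BERT with BiLSTM + attention | × |
    | 26 | 0.7885 | XLNet | × |
    | 32 | 0.7772 | RoBERTa with finetuning | |
    | 40 | 0.7593 | BERT with fine-tuned language model | |
    | 41 | 0.7555 | * | |
    | 46 | 0.7109 | FastText and ELMo embeddings with RNN ensemble | × |
    | 47 | 0.6851 | Concatenated GloVe and on-the-fly POS embeddings with BiLSTM and 1D-Conv + MaxPool layers | × |

    | Task 2 solution / rank | Macro-F1 | Description | Code |
    |------|--------|-------------|------|
    | 23 | 0.5233 | BERT | × |
    | 27 | 0.4968 | CRF tagger | |
    | 34 | 0.4589 | XLNet - large | × |
    | 37 | 0.4398 | RoBERTa + CRF with finetuning | |
    | 46 | 0.2577 | * | |

    | Task 3 solution / rank | Macro-F1 | Description | Code |
    |------|--------|-------------|------|
    | 1 (Zhihu, paper) | 1.0 | BERT + hand crafted rules | × |
    | 4 | 0.9943 | Random Forest | × |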
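The leaderboards above report F1 for task 1 and macro-F1 for tasks 2 and 3. As a rough illustration of how the two scores differ, the sketch below computes per-label F1 and its unweighted average from scratch. The function names are hypothetical, and the official scorer may differ in details such as which labels are included in the average, so treat this as an assumption-laden sketch rather than the competition's evaluation code.

```python
from typing import Hashable, Optional, Sequence


def f1_for_label(gold: Sequence[Hashable], pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 of a single label, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(
    gold: Sequence[Hashable],
    pred: Sequence[Hashable],
    labels: Optional[Sequence[Hashable]] = None,
) -> float:
    """Unweighted mean of per-label F1 scores.

    If `labels` is omitted, every label seen in gold or pred is averaged
    (an assumption; a real scorer may use a fixed label set).
    """
    if labels is None:
        labels = sorted(set(gold) | set(pred), key=str)
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Toy predictions for a binary "contains a definition" decision (made-up data).
gold = [1, 0, 1, 1]
pred = [1, 0, 0, 1]
positive_f1 = f1_for_label(gold, pred, 1)
macro = macro_f1(gold, pred)
```

Because macro-F1 weights every label equally, rare labels count as much as frequent ones, which is one plausible reason the task 2 scores sit well below the task 1 scores.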
  • Recommended reading

    Official summary