Chinese Entity Linking

July 29, 2020 · View on GitHub

Background

Entity linking identifies pieces of text and links them with entries in a standard database, knowledge base, gazetteer, Wikipedia page, etc. In addition to proper names (“Bob”), mentions may also include nominals (“the player”).

Example input/output.

Input:

美国国防部长马蒂斯说,与首尔举行的名为“秃鹫”的军事演习每年春天在韩国进行,但2019年将“缩小规模”。

Output:

[美国]wiki/United_States国防部长[马蒂斯]wiki/Jim_Mattis说,与[首尔]wiki/Seoul举行的名为“秃鹫”的军事演习每年春天在[韩国]wiki/South_Korea进行,但2019年将“缩小规模”。

Standard Metrics

  • F-score for selecting correct piece of text and linking it to the correct concept from a knowledge base.
  • Entity mentions that have no corresponding KB element (NIL mentions) must be clustered, with clusters evaluated by CEAF (successor to B-cubed).

TAC-KBP / EDL 2017 Track.

The NIST TAC Knowledge Base Population (KBP) Entity Discovery and Linking (EDL) track includes Chinese entity tagging for 5 types: person (PER), geo-political entity (GPE), location (LOC), organization (ORG) and facility (FAC).

Entities are linked to BaseKB (LDC2015E42: TAC KBP 2015 Tri-Lingual Entity Discovery and Linking Knowledge Base).

Data for this evaluation was prepared by the Linguistic Data Consortium (LDC).

Data for this evaluation is available from the Linguistic Data Consortium (LDC).

Test setSize (documents)Genre
TAC-KBP-EDL 2015313 (train + eval)News
TAC-KBP-EDL 2016166News
TAC-KBP-EDL 2017167News

Metrics

NERC F-score

  • Requires identifying both text-span and knowledge base ID (KB-id) of mention
  • 2016 and 2017 tasks includes both name and nominal mentions
  • Scoring code (likewise for 2015 ad 2016)

Results

SystemTAC-KBP / EDL 2015
Names
TAC-KBP / EDL 2016
Names and nominals
TAC-KBP / EDL 2017
Names and nominals
Sil et al (2018)84.4
Pan et al (2020)84.2
Pan et al (2020)81.2 (unsupervised)
Best anonymous system in shared task writeup76.976.267.8

Resources

Train and test sets are available from the Linguistic Data Consortium (LDC).


Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com