Awesome-LLM4IE-Papers
November 18, 2024
The article has been accepted by Frontiers of Computer Science (FCS).
Awesome papers on generative information extraction using LLMs.
The papers are organized according to the taxonomy in our survey: Large Language Models for Generative Information Extraction: A Survey.
If you find relevant academic papers that are not yet included here, please submit a request for an update. We welcome contributions from everyone.
If you have any suggestions or spot any mistakes, please let us know by email at derongxu@mail.ustc.edu.cn and chenweicw@mail.ustc.edu.cn. We appreciate your feedback and help in improving our work.
If you find our survey useful for your research, please cite the following paper:
@article{xu2024large,
title={Large language models for generative information extraction: A survey},
author={Xu, Derong and Chen, Wei and Peng, Wenjun and Zhang, Chao and Xu, Tong and Zhao, Xiangyu and Wu, Xian and Zheng, Yefeng and Wang, Yang and Chen, Enhong},
journal={Frontiers of Computer Science},
volume={18},
number={6},
pages={186357},
year={2024},
publisher={Springer}
}
Table of Contents
- Information Extraction tasks
- Information Extraction Techniques
- Specific Domain
- Evaluation and Analysis
- Project and Toolkit
- Recently Updated Papers (papers added after 2024/09/04 are listed here)
- Datasets (with download links)
News
- Update Logs
  - Details can be found in ./update_new_papers_list.
  - 2024/09/04: Added 22 papers
  - 2024/06/06: Added 41 papers
  - 2024/03/30: Added 27 papers
  - 2024/03/29: Added 20 papers
Information Extraction tasks
A taxonomy by task.
Named Entity Recognition
Models targeting only NER tasks.
Entity Typing
| Paper | Venue | Date | Code |
|---|---|---|---|
| Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing | EMNLP Findings | 2023-12 | GitHub |
| Generative Entity Typing with Curriculum Learning | EMNLP | 2022-12 | GitHub |
Entity Identification & Typing
Relation Extraction
Models targeting only RE tasks.
Relation Classification
Relation Triplet
Relation Strict
Event Extraction
Models targeting only EE tasks.
Event Detection
Event Argument Extraction
Event Detection & Argument Extraction
Universal Information Extraction
Unified models targeting multiple IE tasks.
NL-LLMs based
Code-LLMs based
Information Extraction Techniques
A taxonomy by technique.
Supervised Fine-tuning
Few-shot
Few-shot Fine-tuning
In-Context Learning
Zero-shot
Zero-shot Prompting
Cross-Domain Learning
Cross-Type Learning
| Paper | Venue | Date | Code |
|---|---|---|---|
| Document-level event argument extraction by conditional generation | NAACL | 2021-06 | GitHub |
Data Augmentation
Data Annotation
Knowledge Retrieval
Inverse Generation
Synthetic Datasets for Instruction-tuning
Prompts Design
Question Answer
Chain of Thought
Self-Improvement
Constrained Decoding Generation
| Paper | Venue | Date | Code |
|---|---|---|---|
| An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction | AAAI | 2024-03 | GitHub |
| Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning | EMNLP | 2024-01 | GitHub |
| DORE: Document Ordered Relation Extraction based on Generative Framework | EMNLP Findings | 2022-12 | |
| Autoregressive Structured Prediction with Language Models | EMNLP Findings | 2022-12 | GitHub |
| Unified Structure Generation for Universal Information Extraction | ACL | 2022-05 | GitHub |
Specific Domain
Evaluation and Analysis
Project and Toolkit
| Paper | Type | Venue | Date | Link |
|---|---|---|---|---|
| ONEKE | Project | - | - | Link |
| TechGPT-2.0: A Large Language Model Project to Solve the Task of Knowledge Graph Construction | Project | arXiv | 2024-01 | Link |
| CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction | Toolkit | arXiv | 2023-07 | Link |
Recently Updated Papers
2024/09/04
Datasets
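An asterisk (*) marks a multimodal dataset. "#" refers to the number of categories (#Class) or sentences (#Train, #Val, #Test). For EE datasets, #Class is given as event types/argument roles.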
| Task | Dataset | Domain | #Class | #Train | #Val | #Test | Link |
|---|---|---|---|---|---|---|---|
| NER | ACE04 | News | 7 | 6202 | 745 | 812 | Link |
| | ACE05 | News | 7 | 7299 | 971 | 1060 | Link |
| | BC5CDR | Biomedical | 2 | 4560 | 4581 | 4797 | Link |
| | Broad Twitter Corpus | Social Media | 3 | 6338 | 1001 | 2000 | Link |
| | CADEC | Biomedical | 1 | 5340 | 1097 | 1160 | Link |
| | CoNLL03 | News | 4 | 14041 | 3250 | 3453 | Link |
| | CoNLLpp | News | 4 | 14041 | 3250 | 3453 | Link |
| | CrossNER-AI | Artificial Intelligence | 14 | 100 | 350 | 431 | Link |
| | CrossNER-Literature | Literary | 12 | 100 | 400 | 416 | |
| | CrossNER-Music | Musical | 13 | 100 | 380 | 465 | |
| | CrossNER-Politics | Political | 9 | 199 | 540 | 650 | |
| | CrossNER-Science | Scientific | 17 | 200 | 450 | 543 | |
| | FabNER | Scientific | 12 | 9435 | 2182 | 2064 | Link |
| | Few-NERD | General | 66 | 131767 | 18824 | 37468 | Link |
| | FindVehicle | Traffic | 21 | 21565 | 20777 | 20777 | Link |
| | GENIA | Biomedical | 5 | 15023 | 1669 | 1854 | Link |
| | HarveyNER | Social Media | 4 | 3967 | 1301 | 1303 | Link |
| | MIT-Movie | Social Media | 12 | 9774 | 2442 | 2442 | Link |
| | MIT-Restaurant | Social Media | 8 | 7659 | 1520 | 1520 | Link |
| | MultiNERD | Wikipedia | 16 | 134144 | 10000 | 10000 | Link |
| | NCBI | Biomedical | 4 | 5432 | 923 | 940 | Link |
| | OntoNotes 5.0 | General | 18 | 59924 | 8528 | 8262 | Link |
| | ShARe13 | Biomedical | 1 | 8508 | 12050 | 9009 | Link |
| | ShARe14 | Biomedical | 1 | 17404 | 1360 | 15850 | Link |
| | SNAP* | Social Media | 4 | 4290 | 1432 | 1459 | Link |
| | Temporal Twitter Corpus (TTC) | Social Media | 3 | 10000 | 500 | 1500 | Link |
| | Tweebank-NER | Social Media | 4 | 1639 | 710 | 1201 | Link |
| | Twitter2015* | Social Media | 4 | 4000 | 1000 | 3357 | Link |
| | Twitter2017* | Social Media | 4 | 3373 | 723 | 723 | Link |
| | TwitterNER7 | Social Media | 7 | 7111 | 886 | 576 | Link |
| | WikiDiverse* | News | 13 | 6312 | 755 | 757 | Link |
| | WNUT2017 | Social Media | 6 | 3394 | 1009 | 1287 | Link |
| RE | ACE05 | News | 7 | 10051 | 2420 | 2050 | Link |
| | ADE | Biomedical | 1 | 3417 | 427 | 428 | Link |
| | CoNLL04 | News | 5 | 922 | 231 | 288 | Link |
| | DocRED | Wikipedia | 96 | 3008 | 300 | 700 | Link |
| | MNRE* | Social Media | 23 | 12247 | 1624 | 1614 | Link |
| | NYT | News | 24 | 56196 | 5000 | 5000 | Link |
| | Re-TACRED | News | 40 | 58465 | 19584 | 13418 | Link |
| | SciERC | Scientific | 7 | 1366 | 187 | 397 | Link |
| | SemEval2010 | General | 19 | 6507 | 1493 | 2717 | Link |
| | TACRED | News | 42 | 68124 | 22631 | 15509 | Link |
| | TACREV | News | 42 | 68124 | 22631 | 15509 | Link |
| EE | ACE05 | News | 33/22 | 17172 | 923 | 832 | Link |
| | CASIE | Cybersecurity | 5/26 | 11189 | 1778 | 3208 | Link |
| | GENIA11 | Biomedical | 9/11 | 8730 | 1091 | 1092 | Link |
| | GENIA13 | Biomedical | 13/7 | 4000 | 500 | 500 | Link |
| | PHEE | Biomedical | 2/16 | 2898 | 961 | 968 | Link |
| | RAMS | News | 139/65 | 7329 | 924 | 871 | Link |
| | WikiEvents | Wikipedia | 50/59 | 5262 | 378 | 492 | Link |