Awesome LLM-generated Text Detection
December 30, 2024
The powerful ability of large language models (LLMs) to understand, follow, and generate complex language has enabled LLM-generated text to flood many areas of daily life at an incredible rate, with potentially negative impacts and risks for society and academia. As LLMs continue to expand, how can we detect LLM-generated text and help minimize the threat posed by their misuse?
¹ Junchao Wu, ¹ Shu Yang, ¹ Runzhe Zhan, ¹ ² Yulin Yuan, ¹ Derek Fai Wong, ¹ Lidia Sam Chao
¹ University of Macau, ² Peking University
📢 News
- [2024.11.28] ✨ Our survey paper has been accepted by the Computational Linguistics journal. The survey offers a comprehensive overview of the latest advancements in LLM-generated text detection and highlights the urgent need for more robust methods. It reviews mainstream approaches, addresses key challenges, and outlines promising future research directions. The paper serves as both a clear introduction for newcomers and a valuable resource for experts seeking updates in the field. Please refer to arXiv: A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions and GitHub Repo LLM-generated-Text-Detection for details.
- [2024.09.26] ✨ Our benchmark paper has been accepted to the NeurIPS 2024 D&B track. We released DetectRL, a benchmark for real-world LLM-generated text detection that provides real utility to researchers on the topic and to practitioners looking for consistent evaluation methods. Please refer to arXiv: DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios and GitHub Repo DetectRL for details.
- [2023.10.24] Our survey paper is now available on arXiv: A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions.
- [2023.05.01] We began to explore the topic of LLM-generated text detection.
🔍 Table of Contents
📃 Papers
Overview
A survey of and reflection on the latest research breakthroughs in LLM-generated text detection, covering data, detectors, metrics, current issues, and future directions. Please refer to our paper for more details.
Datasets
Benchmarks
| Benchmarks / Datasets | Use | Human | LLMs |
|---|---|---|---|
| HC3 | train | 58k | 26k |
| HC3-Chinese | train | 22k | 17k |
| CHEAT | train | 15k | 35k |
| GROVER Dataset | train valid test | 5k 2k 8k | 5k 1k 4k |
| TweepFake | train | 12k | 12k |
| GPT-2 Output Dataset | train | 250k | 250k |
| TuringBench | train | 10k | 190k |
| MGTBench | train test | 2k 563 | 13k 3k |
| ArguGPT | train valid test | 3k 350 350 | 3k 350 350 |
| DeepfakeText-Dataset | train valid test | 95k 29k 29k | 236k 29k 28k |
| M4 | train valid test | 122k 500 500 | 122k 500 500 |
| GPABenchmark | train | 600k | 600k |
| Scientific-articles Benchmark | train test | 8k 4k | 8k 4k |
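
The human and LLM columns above differ widely for some datasets, which matters when a detector is trained on them. As a rough illustration only, the sketch below parses a few train-split cells copied from the table and reports the share of LLM-generated samples. The helper names `parse_count` and `llm_share` are hypothetical and not part of any listed benchmark, and because the table values are rounded, the resulting shares are approximate.

```python
# Train-split sizes copied from the table above as (human, LLM) cell text.
# Only a few rows are included; the values are rounded as in the table.
TRAIN_SIZES = {
    "HC3": ("58k", "26k"),
    "CHEAT": ("15k", "35k"),
    "TweepFake": ("12k", "12k"),
    "TuringBench": ("10k", "190k"),
}


def parse_count(text: str) -> int:
    """Convert a table cell such as '58k' or '563' to an integer count."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1000)
    return int(text)


def llm_share(human: str, llm: str) -> float:
    """Return the fraction of samples that are LLM-generated."""
    n_human = parse_count(human)
    n_llm = parse_count(llm)
    return n_llm / (n_human + n_llm)


if __name__ == "__main__":
    for name, (human, llm) in TRAIN_SIZES.items():
        print(f"{name}: {llm_share(human, llm):.1%} LLM-generated")
```

A share far from 50%, as in the TuringBench row, suggests checking whether resampling or class weighting is needed before training a classifier on that data.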
Potential Datasets
| Tasks | Datasets |
|---|---|
| Question Answering | PubMedQA, Children book corpus (CBT), ELI5, TruthfulQA, NarrativeQA |
| Scientific writing | Peer Read, arXiv, TOEFL11 |
| Story generation | WritingPrompts |
| News Article writing | XSum |
| Web Text | Wiki40b, WebText, Avax tweets dataset, Climate Change Tweets Ids |
| Opinion statements | r/ChangeMyView (CMV) Reddit subcommunity, Yelp, IMDB Dataset |
| Comprehension and Reasoning | SciGen, ROCStories Corpora, HellaSwag, SQuAD |
Detectors
Watermark Technology
Zero-shot Methods
Fine-tuning LMs Methods
Adversarial Learning Methods
LLMs as Detector
Related Works
Other Surveys
🚩 Citation
If our research helps you, please kindly cite our paper.
@article{wu2023survey,
title = {A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions},
author = {Junchao Wu and Shu Yang and Runzhe Zhan and Yulin Yuan and Derek F. Wong and Lidia S. Chao},
journal = {CoRR},
volume = {abs/2310.14724},
year = {2023},
url = {https://arxiv.org/abs/2310.14724},
eprinttype = {arXiv},
eprint = {2310.14724}
}
Contributing
Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request. We appreciate your contributions to making LLM-generated Text Detection work even better.