AI Supervision Risk

March 17, 2025 · View on GitHub

This repo include the papers discussed risks and challenges in AI supervision, i.e. data synthesis and LLM-as-a-judge. Corresponding to Dawei Li (daweili5@asu.edu).
While these AI supervision paradigms bring convenience and efficiency, it's crucial to recognize that this new paradigm blurs the boundaries between models, data, and evaluation methods, posing potential risks and challenges.
Want to know more about AI supervision in the era of LLMs? Check out our paper list about LLM-based data synthesis and LLM-as-a-judge!

Model Collapse

AI models collapse when trained on recursively generated data. Nature (2024) [Paper]
A Tale of Tails: Model Collapse as a Change of Scaling Laws. Arxiv (2024) [Paper]
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse. Arxiv (2024) [Paper]
Strong Model Collapse. Arxiv (2024) [Paper]
Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop. Arxiv (2023) [Paper]

AI Deluge

Neural Retrievers are Biased Towards LLM-Generated Content. KDD (2024) [Paper]
Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era. KDD (2024) [Paper]
Spiral of Silence: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering. ACL (2024) [Paper]
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images. SIGIR (2024) [Paper]
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents. ICLR (2025) [Paper]
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration. ACL (2024) [Paper] [Benchmark]
Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos. Arxiv (2025) [Paper]
Source Echo Chamber: Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop. Arxiv (2024) [Paper]

Self-bias

Self-Preference Bias in LLM-as-a-Judge. Arxiv (2024) [Paper]
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores. Arxiv (2023) [Paper]
Benchmarking Cognitive Biases in Large Language Models as Evaluators. ACL (2024) [Paper]
LLM Evaluators Recognize and Favor Their Own Generations. NeurIPS (2024) [Paper]
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge. ICLR (2024) [Paper]
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement. Arxiv (2024) [Paper]
Style Over Substance: Evaluation Biases for Large Language Models. Arxiv (2023) [Paper]

Preference Leakage

Preference Leakage: A Contamination Problem in LLM-as-a-judge. ArXiv (2025) [Paper]
Great Models Think Alike and this Undermines AI Oversight. ArXiv (2025) [Paper]
Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators. ArXiv (2025) [Paper]

Reference

If this paper list is useful for your research, please kindly cite the following papers:

@article{li2024llmasajudge,
  title   = {From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge},
  author  = {Dawei Li and Bohan Jiang and Liangjie Huang and Alimohammad Beigi and Chengshuai Zhao and Zhen Tan and Amrita Bhattacharjee and Yuxuan Jiang and Canyu Chen and Tianhao Wu and Kai Shu and Lu Cheng and Huan Liu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2411.16594}
}

@article{tan2024large,
  title={Large language models for data annotation: A survey},
  author={Tan, Zhen and Li, Dawei and Wang, Song and Beigi, Alimohammad and Jiang, Bohan and Bhattacharjee, Amrita and Karami, Mansooreh and Li, Jundong and Cheng, Lu and Liu, Huan},
  journal={arXiv preprint arXiv:2402.13446},
  year={2024}
}

@inproceedings{dai2024bias,
  title={Bias and unfairness in information retrieval systems: New challenges in the llm era},
  author={Dai, Sunhao and Xu, Chen and Xu, Shicheng and Pang, Liang and Dong, Zhenhua and Xu, Jun},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={6437--6447},
  year={2024}
}