Awesome Red-Teaming LLMs
April 22, 2026
A comprehensive guide to understanding Attacks, Defenses and Red-Teaming for Large Language Models (LLMs).
Contents
Red-Teaming Attack Taxonomy

Evaluation & Benchmarks
| Title | Link |
|---|---|
| ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis | Link |
| Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation (SceneJailEval) | Link |
| NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist | Link |
Other Surveys
| Title | Link |
|---|---|
| SoK: Prompt Hacking of Large Language Models | Link |
| A Survey on Trustworthy LLM Agents: Threats and Countermeasures | Link |
| The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies | Link |
Red-Teaming
| Title | Link |
|---|---|
| Red-Teaming for Generative AI: Silver Bullet or Security Theater? | Link |
| Lessons From Red Teaming 100 Generative AI Products | Link |
| Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | Link |
| Manifold of Failure: Behavioral Attraction Basins in Language Models | Link |
| Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Link |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | Link |
| TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | Link |
| The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It | Link |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | Link |
| RRTL: Red Teaming Reasoning Large Language Models in Tool Learning | Link |
| Capability-Based Scaling Laws for LLM Red-Teaming | Link |
| Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | Link |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Link |
| Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks | Link |
| AJAR: Adaptive Jailbreak Architecture for Red-teaming | Link |
If you like our work, please consider citing it. If you would like to add your work to our taxonomy, please open a pull request.
BibTeX
@article{verma2024operationalizing,
title={Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)},
author={Verma, Apurv and Krishna, Satyapriya and Gehrmann, Sebastian and Seshadri, Madhavan and Pradhan, Anu and Ault, Tom and Barrett, Leslie and Rabinowitz, David and Doucette, John and Phan, NhatHai},
journal={arXiv preprint arXiv:2407.14937},
year={2024}
}