Awesome Red-Teaming LLMs

April 22, 2026

A comprehensive guide to understanding Attacks, Defenses and Red-Teaming for Large Language Models (LLMs).

Red-Teaming LLMs

Contents

Red-Teaming Attack Taxonomy

[Figure: Taxonomy]

Evaluation & Benchmarks

| Title | Link |
| --- | --- |
| ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis | Link |
| Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation (SceneJailEval) | Link |
| NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist | Link |

Other Surveys

| Title | Link |
| --- | --- |
| SoK: Prompt Hacking of Large Language Models | Link |
| A Survey on Trustworthy LLM Agents: Threats and Countermeasures | Link |
| The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies | Link |

Red-Teaming

| Title | Link |
| --- | --- |
| Red-Teaming for Generative AI: Silver Bullet or Security Theater? | Link |
| Lessons From Red Teaming 100 Generative AI Products | Link |
| Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | Link |
| Manifold of Failure: Behavioral Attraction Basins in Language Models | Link |
| Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Link |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | Link |
| TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | Link |
| The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It | Link |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | Link |
| RRTL: Red Teaming Reasoning Large Language Models in Tool Learning | Link |
| Capability-Based Scaling Laws for LLM Red-Teaming | Link |
| Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | Link |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Link |
| Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks | Link |
| AJAR: Adaptive Jailbreak Architecture for Red-teaming | Link |

If you like our work, please consider citing it. If you would like to add your work to our taxonomy, please open a pull request.

BibTeX


@article{verma2024operationalizing,
  title={Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)},
  author={Verma, Apurv and Krishna, Satyapriya and Gehrmann, Sebastian and Seshadri, Madhavan and Pradhan, Anu and Ault, Tom and Barrett, Leslie and Rabinowitz, David and Doucette, John and Phan, NhatHai},
  journal={arXiv preprint arXiv:2407.14937},
  year={2024}
}