BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

March 16, 2026

A research-focused benchmark for studying backdoor behaviors in agentic LLM systems

Overview

BackdoorAgent is a modular codebase for evaluating backdoor behaviors across multiple agentic tasks (e.g., QA, web navigation, autonomous driving planning, and code/medical agents). It provides:

Unified task runner with YAML + CLI configuration.
Multiple backdoor attack implementations (e.g., agentpoison, trojanrag, demonagent, badchain).
Task-specific pipelines with structured logging and result artifacts.
Reproducible experiment setup via config-driven overrides.

Repository Structure


.
├── attack/                 # Attack implementations
├── configs/                # Default + task-specific configs
├── runs/                   # Entry points and run scripts
├── tasks/                  # Task-specific pipelines (agent_qa, agent_web, agent_driver, agent_code)
├── llm_client.py           # Unified LLM client wrapper
├── utils.py                # Utilities (merging configs, printing, etc.)
└── result/                 # Outputs (created at runtime)

Requirements

Python 3.9+
Core dependencies typically include:
- openai
- torch
- transformers
- tqdm
- tenacity

Quick Start

1) Configure API access

Edit configs/default.yaml with your API key and endpoint:

openai:
  api_key: "<YOUR_KEY>"
  api_url: "<YOUR_ENDPOINT>"

2) Run a task

python runs/run.py --task agent_qa --attack normal --model qwen3-max

3) Explore outputs

Results and logs are written under:

result/<task>/<attack>/

Tasks

Task	Description	Entry Module
`agent_qa`	StrategyQA-style QA with retrieval	`tasks/agent_qa`
`agent_web`	Web navigation agent	`tasks/agent_web`
`agent_driver`	Autonomous driving planning	`tasks/agent_driver`
`agent_code`	Code/medical coding agent	`tasks/agent_code`

Attacks

Attack methods are configured in configs/task_configs/<task>.yaml. Examples include:

agentpoison
trojanrag
demonagent
badagent
badchain
advagent

Each attack exposes tunable parameters such as trigger sequences, poisoned ratios, and target keywords.
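
As a rough illustration of how such parameters might be grouped and sanity-checked before a run, here is a minimal sketch. The class name AttackParams and all of its field names are hypothetical and are not taken from the repository; the actual keys are defined in configs/task_configs/<task>.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackParams:
    """Hypothetical parameter bundle; field names are illustrative only."""

    name: str
    trigger_sequence: str
    poisoned_ratio: float
    target_keyword: str

    def __post_init__(self) -> None:
        # Reject obviously invalid settings before an experiment starts.
        if not self.trigger_sequence:
            raise ValueError("trigger_sequence must be non-empty")
        if not 0.0 <= self.poisoned_ratio <= 1.0:
            raise ValueError("poisoned_ratio must lie in [0, 1]")

    def num_poisoned(self, dataset_size: int) -> int:
        """Number of samples the ratio corresponds to, rounded to an integer."""
        return round(self.poisoned_ratio * dataset_size)


# Placeholder values, not defaults from the repository.
params = AttackParams(
    name="badchain",
    trigger_sequence="<TRIGGER>",
    poisoned_ratio=0.1,
    target_keyword="<TARGET>",
)
```

Validating at construction time means a mistyped ratio fails immediately rather than partway through a long run.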

Configuration System

Configuration is composed from:

configs/default.yaml
configs/task_configs/<task>.yaml
CLI overrides (e.g., --task, --attack, --model)

Configs are merged at runtime by runs/run.py.
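
The layering above can be sketched as a recursive dictionary merge in which later sources win. This is an illustrative sketch, not the code in runs/run.py or utils.py: the helper names deep_merge and parse_cli are assumptions, plain dictionaries stand in for the parsed YAML files, and the attack_params key is hypothetical.

```python
import argparse
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` take precedence."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_cli(argv: list) -> dict:
    """Collect only the flags the user actually passed on the command line."""
    parser = argparse.ArgumentParser()
    for flag in ("--task", "--attack", "--model"):
        parser.add_argument(flag, default=None)
    args = parser.parse_args(argv)
    return {k: v for k, v in vars(args).items() if v is not None}


# Stand-ins for configs/default.yaml and configs/task_configs/<task>.yaml.
default_cfg = {"openai": {"api_key": "<YOUR_KEY>", "api_url": "<YOUR_ENDPOINT>"}}
task_cfg = {"attack": "normal", "attack_params": {"poisoned_ratio": 0.1}}  # hypothetical keys

cli_cfg = parse_cli(["--task", "agent_qa", "--attack", "badchain", "--model", "qwen3-max"])

# Precedence: defaults < task config < CLI overrides.
config = deep_merge(deep_merge(default_cfg, task_cfg), cli_cfg)
```

Leaving unset flags out of the CLI dictionary keeps an omitted --attack from overwriting the value supplied by the task config.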

Example Experiments

Run a batch of attacks for agent_code:

bash run.sh

Run individual attacks:

python runs/run.py --task agent_driver --attack poisonedrag --model qwen3-max
python runs/run.py --task agent_qa --attack badchain --model qwen3-max
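
To launch several such runs from one script, the command lines can be generated from a task and attack grid. The following is a sketch under stated assumptions: it uses only the --task, --attack, and --model flags shown above, the helper names build_commands and run_all are hypothetical, and it does not check whether a given attack is actually configured for a given task.

```python
import itertools
import shlex
import subprocess
import sys


def build_commands(tasks, attacks, model, script="runs/run.py"):
    """Build one argument list per (task, attack) pair."""
    return [
        [sys.executable, script, "--task", task, "--attack", attack, "--model", model]
        for task, attack in itertools.product(tasks, attacks)
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it sequentially."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the batch on the first failing run.
            subprocess.run(cmd, check=True)


commands = build_commands(["agent_qa"], ["normal", "badchain"], "qwen3-max")
run_all(commands, dry_run=True)  # prints the commands without running them
```

Keeping dry_run=True as the default makes it easy to inspect the generated commands before spending API budget.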

Reproducibility Notes

Seed handling and dataset splits are task-specific.
If you introduce new models, update runs/run.py and task configs as needed.
Large runs can be parallelized, but ensure output paths do not collide.
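
One way to keep parallel runs from overwriting each other is to give every run its own subdirectory under result/<task>/<attack>/. The sketch below is an assumption about how this could be done, not a description of what the task pipelines currently do; the make_run_dir helper and the timestamp-plus-suffix naming scheme are hypothetical.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def make_run_dir(root, task, attack):
    """Create and return a fresh directory under <root>/<task>/<attack>/."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A random suffix separates runs that start within the same second.
    run_dir = Path(root) / task / attack / f"{stamp}-{uuid.uuid4().hex[:8]}"
    # exist_ok=False raises instead of silently reusing an existing directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        first = make_run_dir(tmp, "agent_qa", "badchain")
        second = make_run_dir(tmp, "agent_qa", "badchain")
        print(first.name != second.name)
```

In a real run, root would be the result directory, and each worker would write its logs and artifacts only inside the directory it created.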

Citation

If you use this repository in academic work, please cite the corresponding paper:

@article{feng2026backdooragent,
  title={BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents},
  author={Yunhao Feng and Yige Li and Yutao Wu and Yingshui Tan and Yanming Guo and Yifan Ding and Kun Zhai and Xingjun Ma and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2601.04566},
  year={2026}
}

License

This project is licensed under the Apache License 2.0. See the LICENSE file in the repository root for details.

Note: Apache-2.0 permits commercial use, modification, and distribution, provided you follow the license terms (e.g., preserving copyright notices).

Acknowledgements

We thank the community for open-source tooling that enables reproducible research in LLM safety and evaluation.