Awesome-ML-Security

February 6, 2026

A curated list of awesome machine learning security references, guidance, tools, and more.


Relevant work, standards, literature

CIA of the model

Membership inference attacks, model inversion attacks, model extraction, adversarial perturbation, prompt injection, etc.

Confidentiality

Reconstruction (model inversion; attribute inference; gradient and information leakage), theft of data, membership inference and reidentification of data, model extraction (model theft), property inference (leakage of dataset properties), etc.
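Membership inference can be illustrated with a minimal loss-threshold attack: overfit models tend to assign lower loss to their training examples, so an attacker who can observe per-example loss guesses "member" whenever the loss falls below a threshold. This is a sketch under stated assumptions; the loss values and threshold are made up, and `loss_threshold_attack` and `attack_accuracy` are hypothetical helpers, not part of any tool listed here.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess membership: True ("was in the training set") when loss < threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses: Sequence[float],
                    nonmember_losses: Sequence[float],
                    threshold: float) -> float:
    """Fraction of correct membership guesses over both populations."""
    hits_in = sum(loss_threshold_attack(member_losses, threshold))
    hits_out = sum(not guess for guess in loss_threshold_attack(nonmember_losses, threshold))
    return (hits_in + hits_out) / (len(member_losses) + len(nonmember_losses))


# Hypothetical per-example losses; training members typically have lower loss.
members = [0.05, 0.10, 0.02, 0.30]
non_members = [0.90, 0.45, 1.20, 0.08]
print(attack_accuracy(members, non_members, threshold=0.2))  # 0.75
```

On a balanced set, accuracy above 0.5 (random guessing) indicates leakage; in practice the threshold is calibrated, for example on shadow models or held-out data.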

Integrity

Backdoors/neural trojans (same as for non-ML systems), adversarial evasion (perturbing an input to evade a certain classification or output), data poisoning and ordering (supplying malicious data or changing the order in which data is fed into an ML model).
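Adversarial evasion is easiest to see on a linear model. The sketch below applies the fast gradient sign method (FGSM), one well-known evasion technique, to a hypothetical logistic-regression classifier: each feature moves by at most `eps` in the direction that increases the loss for the true label. The weights, input, and `eps` are illustrative assumptions, not taken from any referenced paper.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float], y: int, eps: float) -> List[float]:
    """One-step FGSM: shift each feature by eps in the direction that increases
    the binary cross-entropy loss for the true label y."""
    p = predict_proba(w, b, x)
    # For logistic regression, d(BCE)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0   # hypothetical trained weights
x, y = [1.0, 0.5], 1      # correctly classified input (p is about 0.82)
x_adv = fgsm(w, b, x, y, eps=0.6)
print(x_adv, predict_proba(w, b, x_adv))  # p drops below 0.5, so the label flips
```

The perturbation is bounded per feature by `eps` (an L-infinity budget), which is why small, targeted changes can flip a confident prediction.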

A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Examples
Poisoning Web-Scale Training Datasets is Practical
Planting Undetectable Backdoors in Machine Learning Models
Motivating the Rules of the Game for Adversarial Example Research
On Evaluating Adversarial Robustness
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Universal and Transferable Adversarial Attacks on Aligned Language Models
Manipulating SGD with Data Ordering Attacks
Adversarial reprogramming - repurposing a model for a different task than its original intended purpose
Model spinning attacks (meta backdoors) - forcing a model to produce output that adheres to a meta task (e.g., making a general LLM produce propaganda)
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
Securing LLM Systems Against Prompt Injection & Mitigating Stored Prompt Injection Attacks Against LLM Applications
- Best Practices for Securing LLM-Enabled Applications
- NVIDIA NeMo Guardrails: Security Guidelines
Multi-Agent Systems Execute Arbitrary Malicious Code
Agentic Autonomy Levels and Security
Rerouting LLM Routers
Defeating Prompt Injections by Design
Arcanum Prompt Injection Taxonomy
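Data ordering attacks, such as those in "Manipulating SGD with Data Ordering Attacks", rely on stochastic gradient descent being order-sensitive: the same examples, presented in a different order, yield different weights. The toy sketch below demonstrates only that premise on a one-parameter linear model; it does not implement a batch-reordering attack, and the data and learning rate are hypothetical.

```python
from typing import List, Tuple


def sgd_one_epoch(data: List[Tuple[float, float]], lr: float = 0.1, w: float = 0.0) -> float:
    """Plain SGD on the 1-D model y ~ w * x with squared loss: one pass, batch size 1."""
    for x, y in data:
        grad = (w * x - y) * x  # d/dw of 0.5 * (w * x - y) ** 2
        w -= lr * grad
    return w


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]  # hypothetical training set
w_forward = sgd_one_epoch(data)
w_reversed = sgd_one_epoch(list(reversed(data)))
print(w_forward, w_reversed)  # approximately 2.172 and 1.874: same data, different weights
```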

Availability

Energy-latency attacks - denial of service for neural networks
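Energy-latency attacks target models whose cost depends on the input. A minimal sketch, assuming a hypothetical early-exit network (one common adaptive-computation design) where inference stops at the first layer that is confident enough: an attacker who crafts inputs that keep every exit uncertain forces the full, most expensive forward pass. The confidence values are invented for illustration.

```python
from typing import Sequence


def early_exit_depth(layer_confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Number of layers a hypothetical early-exit network runs: it stops at the
    first layer whose exit classifier is at least `threshold` confident."""
    for depth, confidence in enumerate(layer_confidences, start=1):
        if confidence >= threshold:
            return depth
    return len(layer_confidences)  # never confident enough: full depth


# Hypothetical per-layer exit confidences for a 6-layer model.
benign = [0.55, 0.93, 0.97, 0.98, 0.99, 0.99]
sponge = [0.40, 0.45, 0.50, 0.52, 0.55, 0.60]  # crafted to stay uncertain
print(early_exit_depth(benign), early_exit_depth(sponge))  # 2 6
```

Here the crafted input costs three times as many layers as the benign one, which is the compute and energy amplification such attacks aim for.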

Self-driving cars

Driving to Safety: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability?
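The question in this title reduces to a short calculation. Assuming failures follow a Poisson process and a fleet drives n miles with zero failures, the failure rate can be claimed to be below r with confidence C when exp(-r * n) <= 1 - C, so n = -ln(1 - C) / r. The sketch below plugs in 1.09 fatalities per 100 million miles (assumed here as the US human-driver baseline) and 95% confidence, giving roughly 275 million failure-free miles.

```python
import math


def failure_free_miles(rate_per_mile: float, confidence: float = 0.95) -> float:
    """Miles that must be driven with zero failures to show, at `confidence`,
    that the true failure rate is below `rate_per_mile` (Poisson model)."""
    return -math.log(1.0 - confidence) / rate_per_mile


human_fatality_rate = 1.09 / 100_000_000  # assumed baseline: fatalities per mile
miles = failure_free_miles(human_fatality_rate)
print(f"{miles / 1e6:.0f} million miles")  # 275 million miles
```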

LLM Alignment

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Regulatory actions

EU

The Artificial Intelligence Act (proposed in 2021, adopted in 2024)

Other

Safety standards

Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems
ISO/IEC 42001 — Artificial intelligence — Management system
ISO/IEC 22989 — Artificial intelligence — Concepts and terminology
ISO/IEC 38507 — Governance of IT — Governance implications of the use of artificial intelligence by organizations
ISO/IEC 23894 — Artificial Intelligence — Guidance on Risk Management
ANSI/UL 4600 Standard for Safety for the Evaluation of Autonomous Products: addresses fully autonomous systems that move, such as self-driving cars and other vehicles, including lightweight unmanned aerial vehicles (UAVs). It covers safety case construction, risk analysis, design process, verification and validation, tool qualification, data integrity, human-machine interaction, metrics, and conformance assessment.
High-Level Expert Group on AI of the European Commission: Ethics Guidelines for Trustworthy Artificial Intelligence

Taxonomies and frameworks

NIST AI 100-2e2023 - Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
MITRE ATLAS
AI Incident Database
OWASP Top 10 for LLMs
OWASP AI Exchange - a comprehensive AI security guide with 300+ pages of practical guidance on protecting AI systems
Guidelines for secure AI system development

Security tools and techniques

API probing

PrivacyRaven: runs various privacy attacks against ML models; limited to black-box, label-only attacks
Counterfit: runs various adversarial ML attacks against ML models
Garak
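Label-only black-box probing can be illustrated with a toy model-extraction attack: if the victim API returns only a class label and the decision boundary along some direction is a single threshold, bisection between a point labeled 0 and a point labeled 1 recovers that boundary in a logarithmic number of queries. `victim_label` and its threshold of 3.7 are hypothetical stand-ins for a remote API; this is not how PrivacyRaven or Counterfit is implemented.

```python
from typing import Callable, Tuple


def victim_label(x: float) -> int:
    """Hypothetical black-box API: returns only a label, never a score."""
    return int(x >= 3.7)  # secret decision threshold


def extract_threshold(query: Callable[[float], int], lo: float, hi: float,
                      tol: float = 1e-6) -> Tuple[float, int]:
    """Bisection search for the label boundary between lo (label 0) and hi (label 1).
    Returns the estimated threshold and the number of queries spent."""
    if query(lo) != 0 or query(hi) != 1:
        raise ValueError("endpoints must straddle the decision boundary")
    queries = 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        queries += 1
        if query(mid) == 1:
            hi = mid
        else:
            lo = mid
    return hi, queries


estimate, n_queries = extract_threshold(victim_label, lo=0.0, hi=10.0)
print(round(estimate, 4), n_queries)  # 3.7 26
```

Extraction attacks on multi-dimensional models repeat such line searches along many directions; query counts like the 26 here are what rate limiting and query auditing try to constrain.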

Model backdoors

Fickling: a decompiler, static analyzer, and bytecode rewriter for Python pickle files; can inject backdoors into ML model files
Semgrep rules for ML
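Tools like Fickling exist because Python's pickle format can execute code on load: an object's `__reduce__` method names an importable callable, and `pickle.loads` calls it. The sketch below shows the problem and one mitigation described in the Python `pickle` documentation, overriding `Unpickler.find_class` to allow-list globals. `MaliciousModel`, `AllowlistUnpickler`, and `safe_loads` are hypothetical names for this illustration; this is not Fickling's API, and allow-listing is a different technique from Fickling's static analysis.

```python
import io
import pickle


class MaliciousModel:
    """Stand-in for a tampered model file: __reduce__ tells pickle to call an
    arbitrary callable when the object is loaded."""

    def __reduce__(self):
        # Harmless payload for the demo; a real attacker could call os.system instead.
        return (eval, ("'payload executed: ' + str(6 * 7)",))


class AllowlistUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not on an explicit allow-list."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


tampered = pickle.dumps(MaliciousModel())
print(pickle.loads(tampered))  # the payload runs during a plain load
weights = safe_loads(pickle.dumps({"layer1": [0.1, -0.2]}))  # plain containers need no globals
try:
    safe_loads(tampered)
except pickle.UnpicklingError as err:
    print(err)  # blocked global: builtins.eval
```

An allow-list only helps if it stays small; formats that store raw tensors without executable objects avoid the problem altogether.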

Notable incidents

Incident	Type	Loss
Tay	Poor training set selection	Reputational
Apple NeuralHash	Adversarial evasion (led to hash collisions)	Reputational
PyTorch Compromise	Dependency confusion
Proofpoint - CVE-2019-20634	Model extraction
ClearviewAI Leak	Source code misconfiguration
Kubeflow Crypto-mining attack	System misconfiguration
OpenAI - Taking over someone's account to view their chat history and access their billing information	Web cache deception	Reputational
OpenAI - The first message of a newly created conversation was visible in someone else's chat history	Cache (Redis async I/O)	Reputational
OpenAI - ChatGPT's new Browser SDK used recently disclosed vulnerable code (MinIO CVE-2023-28432)	Security vulnerability that disclosed all environment variables, including MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD	Reputational
MLflow	Combined Local File Inclusion/Remote File Inclusion vulnerability that can lead to complete system or cloud provider takeover	Monetary and Reputational
HuggingFace Spaces - Rubika	System misuse
Microsoft AI Data Leak	SAS token misconfiguration
HuggingFace Hub - Takeover of the Meta and Intel organizations	Password reuse
HuggingFace API token exposure	API token exposure
ShadowRay - Active cryptominer campaign against Ray clusters	Improper authentication	Monetary and Reputational
NullBulge attacks on ML supply chain	Supply chain compromise	Monetary and Reputational

Notable harms

Incident	Type	Loss
Google Photos Gorillas	Algorithmic bias	Reputational
Uber hits a pedestrian	Model failure
Facebook mistranslation leads to arrest	Algorithmic bias
