Awesome AI Security

July 5, 2026

Curated resources, research, and tools for securing AI systems. Managed by AISecHub. Powered by: InnovGuard

InnovGuard
Technology Risk & Cybersecurity Advisory - Innovate with Confidence, Lead with Assurance.




Best Practices, Frameworks & Controls

Governance & Management Frameworks

Controls & Verification Standards

Top 10s

Scoring & Rating Systems

Testing, Evaluation & Red Teaming

Implementation Guides & Patterns

Agentic Systems (Standards, Governance & Patterns)

Threat Modeling

  • OWASP - Multi-Agentic System Threat Modeling Guide - Applies OWASP’s agentic threat taxonomy to multi-agent systems and demonstrates modeling using the MAESTRO framework with worked examples.
  • AWS - Threat modeling your generative AI workload to evaluate security risk - Practical, four-question approach (what are we working on; what can go wrong; what are we going to do about it; did we do a good enough job) with concrete deliverables: DFDs and assumptions, threat statements using AWS’s threat grammar, mapped mitigations, and validation; includes worked examples and AWS Threat Composer templates.
  • Microsoft - Threat Modeling AI/ML Systems and Dependencies - Practical guidance for threat modeling AI/ML: “Key New Considerations” questions plus a threats→mitigations catalog (adversarial perturbation, data poisoning, model inversion, membership inference, model stealing) based on “Failure Modes in Machine Learning”; meant for security design reviews of products that use or depend on AI/ML.

Policy Templates

Acceptable Use

Org-facing, ready-to-adapt/adjust policies for secure, responsible AI (e.g., acceptable use, data classification & handling, privacy/PII & retention, model/tool approvals, human-in-the-loop, attribution & content provenance, evaluation/red teaming, incident response, and third-party/vendor risk).

Toolkits & Self-Assessments

Practical workbooks and self-assessments to baseline AI risk, evaluate third parties, and plan improvements.

Organizational Maturity & Governance

Vendor & Third-Party Risk

Privacy & Data Protection

Threat-Informed Design Reviews

Regulatory & Compliance Assessments

Control Catalogues & Mappings (use for gap-assessments)

Impact Assessments (Project/System Level)

Critical Infrastructure


Tools

Inclusion criteria (open-source tools): must have 220+ GitHub stars, active maintenance in the last 12 months, and ≥3 contributors.

Credential Isolation & Agent Access Control

Prevent credential exfiltration by ensuring AI agents never access raw API keys; inject secrets at request time via proxy gateways.

  • OneCLI GitHub Repo stars – Open-source credential vault for AI agents. Rust HTTP gateway intercepts agent requests and injects API credentials transparently; AES-256-GCM encryption, per-agent scoped tokens, full audit trail.
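The request-time injection idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory vault and scope table (`VAULT`, `AGENT_SCOPES`) and a hypothetical `AgentRequest` type; it is not the API of any tool listed here. The agent holds only a scoped token, and the gateway adds the real secret to a copy of the headers just before forwarding.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stores; a real gateway would use an encrypted vault.
VAULT = {"weather-api": "sk-real-secret-123"}
AGENT_SCOPES = {"agent-token-abc": {"weather-api"}}


@dataclass
class AgentRequest:
    agent_token: str  # scoped token the agent holds instead of the raw key
    service: str      # logical service name, e.g. "weather-api"
    headers: dict = field(default_factory=dict)


def inject_credentials(req: AgentRequest, audit_log: list) -> dict:
    """Return outbound headers with the real secret added at request time."""
    allowed = AGENT_SCOPES.get(req.agent_token, set())
    if req.service not in allowed:
        audit_log.append(("denied", req.agent_token, req.service))
        raise PermissionError(f"token not scoped for {req.service}")
    outbound = dict(req.headers)  # copy, so the agent-side object is untouched
    outbound["Authorization"] = f"Bearer {VAULT[req.service]}"
    audit_log.append(("allowed", req.agent_token, req.service))
    return outbound
```

Because the secret is written only to the outbound copy, nothing the agent can read back ever contains it, and every decision leaves an audit entry.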

Prompt-Injection Detection & Mitigation

Detect and stop prompt-injection (direct/indirect) across inputs, context, and outputs; filter hostile content before it reaches tools or models.

  • (none listed yet)

Jailbreak & Policy Enforcement (Guardrails)

Enforce safety policies and block jailbreaks at runtime via rules/validators/DSLs, with optional human-in-the-loop for sensitive actions.

Model Artifact Scanners

Analyze serialized model files for unsafe deserialization and embedded code; verify integrity/metadata and block or quarantine on fail.
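As a rough illustration of what such a scanner checks, the sketch below walks a pickle's opcode stream with Python's standard `pickletools.genops` and lists the imports the file would perform, without ever loading it. The allowlist, the `find_imports` and `is_safe` names, and the `Payload` class are hypothetical. The string-tracking heuristic for `STACK_GLOBAL` is a simplification, and dedicated scanners cover more formats and edge cases.

```python
import pickle
import pickletools

# Hypothetical allowlist; real scanners ship larger, curated lists.
ALLOWED_IMPORTS = {("collections", "OrderedDict")}


def find_imports(data: bytes) -> list:
    """List (module, name) pairs a pickle would import, without loading it."""
    imports, recent_strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as the two preceding strings.
            # Heuristic: memo lookups are ignored, so unknowns count as unsafe.
            if len(recent_strings) >= 2:
                imports.append((recent_strings[-2], recent_strings[-1]))
            else:
                imports.append(("?", "?"))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return imports


def is_safe(data: bytes) -> bool:
    """True only if every import in the pickle is on the allowlist."""
    return all(pair in ALLOWED_IMPORTS for pair in find_imports(data))


class Payload:
    """Stand-in for a malicious object; print is used so nothing harmful runs."""

    def __reduce__(self):
        return (print, ("executed during unpickling",))


benign = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
hostile = pickle.dumps(Payload(), protocol=4)
```

Here `is_safe(benign)` holds because plain dicts, lists, and floats need no imports, while `hostile` is rejected because it would import `builtins.print` and call it on load. Legitimate model checkpoints also import classes, which is why an allowlist rather than a blanket ban on import opcodes is assumed.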

Model Identification & Provenance (Fingerprinting)

Black-box fingerprinting to identify the underlying LLM/version behind an application or API and to support provenance verification, useful for model discovery, access control (allowlists), vendor due diligence, incident response, and audits.

  • LLMmap GitHub Repo stars – Active, black-box LLM fingerprinting: sends crafted probes to a target and classifies the underlying model/version; includes CLI and scripts.
  • TRAP (Targeted Random Adversarial Prompt) GitHub Repo stars – Black-box identification using targeted adversarial prompts to elicit model-specific behaviors; reference implementation from the paper.

Reverse Engineering

LLM-assisted decompilation and reconstruction for security analysis (malware triage, DFIR, vuln research).

  • LLM4Decompile GitHub Repo stars - Decompiles Linux x86_64 binaries, compiled with GCC at optimization levels O0 to O3, into human-readable C source code. The maintainers state that support for more architectures and configurations is in progress. (blog, arxiv)

Agent Tooling and MCP Security

Scan/audit MCP servers & client configs; detect tool poisoning, unsafe flows; constrain tool access with least-privilege and audit trails.

Honeypots & Deception (MCP/LLM)

  • Beelzebub GitHub Repo stars - Beelzebub is a honeypot framework designed to provide a secure environment for detecting and analyzing cyber attacks. It offers a low code approach for easy implementation and uses AI to mimic the behavior of a high-interaction honeypot.
  • Krawl GitHub Repo stars - Krawl is a modern, customizable web honeypot and deception server designed to detect and track malicious attackers, web crawlers, and automated scanners through fake web applications, deceptive pages, and realistic decoy content.

Tool manifest/metadata validators

Agent Identity & Trust

Servers & Dev tooling

  • PortSwigger - MCP Server GitHub Repo stars
  • ToolHive GitHub Repo stars - MCP server orchestrator for desktop, CLI, and Kubernetes Operator: discover and deploy servers in isolated containers with restricted permissions, manage secrets, use an optional egress proxy, auto-configure popular MCP clients (e.g., GitHub Copilot, Cursor), and manage at scale via CRDs/registry.
  • Polaxis MCP Server GitHub Repo stars - MCP server that wraps any Model Context Protocol agent with a 7-layer AI security firewall. Stops prompt injection, PII leakage, secret exfiltration, memory poisoning, and authority impersonation at the tool-call layer — before tools run. 99.4% avg detection rate, 0% false-block rate. [Benchmark] [polaxis.io]

Execution Sandboxing for Agent Code

Run untrusted or LLM-triggered code in isolated sandboxes (FS/network/process limits) to contain RCE and reduce blast radius.

  • E2B GitHub Repo stars - SDK + self-hostable infra to run untrusted, LLM-generated code in isolated cloud sandboxes (Firecracker microVMs).

  • microsandbox GitHub Repo stars - self-hosted microVM (libkrun) sandbox for untrusted AI/user code.

Confidential & Verifiable Inference (PCC/TEEs)

Run AI models inside attested TEEs with end-to-end encryption, auditability, and unlinkable requests so prompts and outputs never leave the secure boundary.

  • Phala (Docs, GitHub) - Confidential AI cloud platform running LLMs and AI workloads in GPU TEEs (NVIDIA H100/H200). Features hardware attestation, verifiable inference, and Docker-based deployment with zero code changes. Supports both CPU TEE (Intel TDX) and GPU TEE for confidential AI at scale.
  • dstack GitHub Repo stars — Open-source SDK for building TEE-based confidential applications. Provides Docker-to-TEE deployment, remote attestation, secure key management, and TLS termination inside enclaves. Works with Intel TDX and NVIDIA GPU TEEs.
  • OpenPCC GitHub Repo stars — Open-source framework for provably private AI inference (encrypted streaming, hardware attestation, unlinkable requests), inspired by Apple's PCC and deployable on your own infra.

Gateways & Policy Proxies

Centralize auth, quotas/rate limits, cost caps, egress/DLP filters, and guardrail orchestration across all models and providers.

  • Cerbos GitHub Repo stars – open-source, policy-based authorization layer for fine-grained controls in MCP servers, RAG pipelines, and other agentic systems.

Code Review

  • Claude Code Security Reviewer GitHub Repo stars - An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.
  • Vulnhuntr GitHub Repo stars - Vulnhuntr leverages the power of LLMs to automatically create and analyze entire code call chains starting from remote user input and ending at server output for detection of complex, multi-step, security-bypassing vulnerabilities that go far beyond what traditional static code analysis tools are capable of performing.

Red-Teaming Harnesses & Automated Security Testing

Automate attack suites (prompt-injection, leakage, jailbreak, goal-based tasks) in CI; score results and produce regression evidence.

Integrated platforms

  • AI-Infra-Guard GitHub Repo stars - AI red-teaming platform: AI infra vulnerability scan (30+ components, ~400 CVEs), MCP server risk scan (9 categories), and jailbreak evaluation; web UI + Docker quick start.

Prompt-injection test suites

  • Promptmap GitHub Repo stars
  • API Relay Audit GitHub Repo stars - Local 14-step audit for AI API relays and LLM proxies covering prompt injection, model substitution, tool-call rewriting, error leakage, SSE anomalies, and Web3 wallet-risk probes.
  • Giskard GitHub Repo stars

Data-leakage/secret-exfil test suites

Jailbreak catalogs & adversarial prompts

Adversarial-robustness (evasion) toolkits

Goal-directed agent attack tasks

CI pipelines & regression gates

  • promptfoo GitHub Repo stars
  • Agentic Radar GitHub Repo stars
  • DeepTeam GitHub Repo stars
  • Buttercup GitHub Repo stars - Trail of Bits’ AIxCC Cyber Reasoning System: runs OSS-Fuzz-style campaigns to find vulns, then uses a multi-agent LLM patcher to generate & validate fixes for C/Java repos; ships SigNoz observability; requires at least one LLM API key.
  • Giskard GitHub Repo stars - Pre-deployment/CI evaluation harness for LLM/RAG: runs scan checks (prompt injection, harmful output, sensitive-information disclosure, robustness), auto-generates RAG evaluation datasets and component scores (retriever, generator, rewriter, router), exports shareable reports, and integrates with CI for regression gates.
  • ultraprobe GitHub Repo stars - Lighthouse-style CLI for AI agents: bundles prompt defense audit (17 vectors), PII detection, and SEO/AEO/AAO scanners into a single AVS Score. Pure deterministic, zero-config, MIT-licensed. Designed for CI gates and pre-deployment scoring.
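A regression gate of the kind these tools support can be sketched in a few lines. This is a generic illustration, not the interface of any listed tool: the `gate` function, the suite names, and the baseline format are assumptions. Each suite reports how many attacks succeeded out of how many were attempted, and the gate fails any suite whose success rate exceeds its recorded baseline.

```python
def gate(results: dict, baseline: dict, tolerance: float = 0.0) -> list:
    """Return (suite, rate) pairs whose attack success rate regressed.

    results maps suite name -> (successful_attacks, attempts).
    baseline maps suite name -> previously accepted success rate.
    Suites missing from the baseline are held to a rate of 0.0.
    """
    failures = []
    for suite, (successes, attempts) in results.items():
        rate = successes / attempts
        if rate > baseline.get(suite, 0.0) + tolerance:
            failures.append((suite, rate))
    return failures


# Synthetic numbers for illustration only.
baseline = {"prompt_injection": 0.02, "jailbreak": 0.01}
results = {"prompt_injection": (3, 100), "jailbreak": (1, 100)}
regressions = gate(results, baseline)
```

In a CI job, a non-empty `regressions` list would translate into a non-zero exit code, and the list itself can be archived as regression evidence.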

Scoring/leaderboards & evidence reports

  • (none listed yet)

Supply Chain: AI/ML BOM and Attestation

Generate and verify AI/ML BOMs, signatures, and provenance for models/datasets/dependencies; enforce allow/deny policies.

  • (none listed yet)

Vector/Memory Store Security

Harden RAG memory: isolate namespaces, sanitize queries/content, detect poisoning/outliers, and prevent secret/PII retention.

  • (none listed yet)

Data/Model Poisoning Defenses

Detect and mitigate dataset/model poisoning and backdoors; validate training/fine-tuning integrity and prune suspicious behaviors.

Sensitive Data Leak Prevention (DLP for AI)

Prevent secret/PII exfiltration in prompts/outputs via detection, redaction, and policy checks at I/O boundaries.

  • Presidio GitHub Repo stars - PII/PHI detection & redaction for text, images, and structured data; use as a pre/post-LLM DLP filter and for dataset sanitization.
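The detect-and-redact step at an I/O boundary can be sketched with plain regular expressions. This is deliberately not Presidio's API; the `PATTERNS` table and `redact` function are hypothetical, and the three patterns are illustrative rather than complete.

```python
import re

# Illustrative patterns only; production DLP needs broader, validated detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple:
    """Replace matches with <TYPE> placeholders and report what was found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"<{label}>", text)
        if count:
            found.append((label, count))
    return text, found


prompt = "Email jane.doe@example.com, key AKIAABCDEFGHIJKLMNOP, SSN 123-45-6789."
clean_prompt, findings = redact(prompt)
```

The same function can run on model outputs before they are returned, and the `findings` list can feed a policy check that blocks the request instead of redacting it.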

Monitoring, Logging & Anomaly Detection

Collect AI-specific security logs/signals; detect abuse patterns (PI/jailbreak/leakage), enrich alerts, and support forensics.

  • LangKit GitHub Repo stars - LLM observability metrics toolkit (whylogs-compatible): prompt-injection/jailbreak similarity, PII patterns, hallucination/consistency, relevance, sentiment/toxicity, readability.

  • Alibi Detect GitHub Repo stars - Production drift/outlier/adversarial detection for tabular, text, images, and time series; online/offline detectors with TF/PyTorch backends; returns scores, thresholds, and flags for alerting.


Attack & Defense Matrices

Matrix-style resources covering adversarial TTPs and curated defensive techniques for AI systems.

Attack

Defense


Checklists


Supply Chain Security

Guidance and standards for securing the AI/ML software supply chain (models, datasets, code, pipelines). Primarily specs and frameworks; includes vetted TPRM templates.

Standards & Specs

Normative formats and specifications for transparency and traceability across AI components and dependencies.

  • OWASP - AI Bill of Materials (AIBOM) GitHub Repo stars - Bill of materials format for AI components, datasets, and model dependencies.

Third-Party Assessment

Questionnaires and templates to assess external vendors, model providers, and integrators for security, privacy, and compliance.

  • FS-ISAC - Generative AI Vendor Evaluation & Qualitative Risk Assessment - Assessment Tool XLSXGuide PDF - Vendor due-diligence toolkit for GenAI: risk tiering by use case, integration and data sensitivity; questionnaires across privacy, security, model development and validation, integration, legal and compliance; auto-generated reporting.

Videos & Playlists

Monthly curated playlists of AI-security talks, demos, incidents, and tooling.


Newsletter

  • Adversarial AI Digest - A digest of AI security research, threats, governance challenges, and best practices for securing AI systems.

Datasets

Dataset indexes & portals

  • Kaggle - Community-contributed datasets (IDS, phishing, malware URLs, incidents).
  • Hugging Face - Search HF datasets tagged/related to cybersecurity and threat intel.
  • SafetyPrompts - living index of LLM safety datasets & evals (jailbreak, prompt injection, toxicity, privacy), with filters and a maintained sheet.
  • Awesome Cybersecurity Datasets GitHub Repo stars

Cybersecurity Skills

Interactive CTFs and self-contained labs for hands-on security skills (web, pwn, crypto, forensics, reversing). Used to assess practical reasoning, tool use, and end-to-end task execution.

CTF Challenges

  • InterCode-CTF GitHub Repo stars - 100 picoCTF challenges (high-school level); categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous. [Dataset+Benchmark] arXiv
  • NYU CTF Bench GitHub Repo stars - 200 CSAW challenges (2017-2023); difficulty very easy → hard; categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous. [Dataset+Benchmark] arXiv
  • CyBench GitHub Repo stars - 40 tasks from HackTheBox, Sekai CTF, Glacier, HKCert (2022-2024); categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous; difficulty grounded by first-solve time (FST). [Dataset+Benchmark] arXiv
  • pwn.college CTF Archive GitHub Repo stars - large collection of runnable CTF challenges; commonly used as a source corpus for research. [Dataset]

Secure Code

Detection

  • Devign / CodeXGLUE-Vul GitHub Repo stars - function-level C vuln detection. [Dataset+Benchmark]
  • DiverseVul GitHub Repo stars - multi-CWE function-level detection (C/C++). [Dataset]
  • Big-Vul GitHub Repo stars - real-world C/C++ detection (often with localization). [Dataset]
  • Py150k HF Downloads - ≈150k Python snippets from GitHub (deduped, AST-parseable, permissive licenses). Used for: pretraining/fine-tuning general code models (e.g., CodeGen, CodeGen2/2.5, CodeLlama, CrystalCoder, CodeT5+); not a labeled vuln dataset. [Dataset]

Generated-Code Security (LLM code-gen eval)

  • LLMSecEval GitHub Repo stars - prompt set mapped to CWEs + secure refs; generate from each prompt and score with static analysis (e.g., CodeQL / Semgrep / Bandit) to label outputs secure vs. vulnerable and compute per-CWE metrics. Used for: benchmarking generated-code security. [Dataset] arXiv

Repair & Patch Mining

  • CVEfixes GitHub Repo stars - CVE-linked fix commits for security repair. [Dataset]
  • Also used for repair: Big-Vul (generate minimal diffs, then build + scan).

Runnable / Scanner Evaluation

Phishing

Phishing dataset gap: there isn’t a public corpus that, per page, stores the URL plus full HTML/CSS/JS, images, favicon, and a screenshot. Most sources are just URL feeds; pages vanish quickly; older benchmarks drift, so models don’t generalize well. Collect a per-URL archive of all page resources, with caveats that screenshots are viewport-only and some assets may be blocked by browser safety.

  • PhishTank - Continuously updated dataset (API/feed); community-verified phishing URLs; labels zero-day phishing; offers webpage screenshots.
  • OpenPhish - Regularly updated phishing URLs with fields such as webpage info, hostname, supported language, IP presence, country code, and SSL certificate; includes brand-target stats.
  • PhreshPhish - 372k HTML–URL samples (119k phishing / 253k benign) with full-page HTML, URLs, timestamps, and brand targets (~185 brands) across 50+ languages; suitable for training and evaluating URL/page-based phishing detection.
  • Phishing.Database - Continuously updated lists of phishing domains/links/IPs (ACTIVE/INACTIVE/INVALID and NEW last hour/today); repo resets daily-download lists; status validated via PyFunceble.
  • UCI – Phishing Websites - 11,055 URLs (phishing and legitimate) with 30 engineered features across URL, content, and third-party signals.
  • Mendeley – Phishing Websites Dataset - Labeled phishing/legitimate samples; provides webpage content (HTML) for each URL; useful for training/eval.
  • UCI – PhiUSIIL Phishing URL - 235,795 URLs (134,850 legitimate; 100,945 phishing) with 54 URL/content features; labels: Class 1 = legitimate, Class 0 = phishing.
  • MillerSmiles - Large archive of phishing email scams with the URLs used; long-running email corpus (not a live feed).

Cybersecurity Knowledge

Structured Q&A datasets assessing security knowledge and terminology. Used to evaluate factual recall and conceptual understanding.

  • CyberMetric GitHub Repo stars - 10k MCQs via RAG from standards/books/RFCs; subsets (80/500/2k/10k).

  • SecEval HF downloads GitHub Repo stars - ~2k MCQs across 9 domains; eval kit and leaderboard on GitHub.

  • AttackQA HF downloads - 25,335 SOC/MITRE ATT&CK-grounded Q&A with rationales.

  • SECQA HF downloads - 242 MCQs (v1: 127, v2: 115), GPT-4–generated from one textbook; good for quick probes.

Secure Coding & Vulnerability Detection

Code snippet datasets labeled as vulnerable or secure, often tied to CWEs (Common Weakness Enumeration). Used to evaluate the model’s ability to recognize insecure code patterns and suggest secure fixes.

  • SecCodePLT HF Downloads

  • Py150k HF Downloads - ≈150k Python files collected from GitHub with dedup/fork removal, only parsable code (AST checks, ≤30k nodes), and permissive licenses. Static analysis with Bandit, Semgrep, and Snyk identified 42,753 vulnerabilities across 26,147 snippets; common CWEs: XSS (18%), SQLi (15%), Improper Input Validation (12%), OS Command Injection (10%), Information Exposure (8%). Used for: training and fine-tuning (e.g., CodeGen, CodeGen2/2.5, CodeLlama, CrystalCoder, CodeT5+).

  • PrimeVul GitHub Repo stars – Combines BigVul, CrossVul, CVEfixes, and DiverseVul; de-duplicated and commit-filtered for high-quality labels; temporal train/val/test split by commit time. 224,533 functions from 755 open-source projects; 6,062 vulnerable; broad CWE coverage. (arXiv)

  • CredData (Samsung) GitHub Repo stars - Labeled dataset of credential-like code lines flagged by scanners, with human GroundTruth (T/F/X) + metadata for benchmarking secret scanners.

Malware Behavior & Dynamic Analysis

  • Avast–CTU Public CAPEv2 Dataset GitHub Repo stars - 48,976 sandbox JSON reports (CAPEv2) across 10 families (Adload, Emotet, HarHar, Lokibot, njRAT, Qakbot, Swisyn, Trickbot, Ursnif, Zeus); per-sample metadata: sha256, family, type (banker, trojan, pws, coinminer, rat, keylogger), detection date. Two versions: Full (~13 GB) and Reduced (~566 MB) keeping behavior.summary + static.pe (avoids label leakage). Used for: behavior-based malware classification & concept-drift studies. - arXiv

Deepfake

Audio (Speech) Deepfakes

  • ASVspoof 5 - train / dev / eval - Train: 8 TTS attacks; Dev: 8 unseen (validation/fusion); Eval: 16 unseen incl. adversarial/codec. Labels: bona-fide / spoofed. arXiv
  • In-the-Wild (ITW) - 58 politicians/celebrities with per-speaker pairing; ≈20.7 h bona-fide + 17.2 h spoofed, scraped from social/video platforms. Labels: bona-fide / spoofed. arXiv
  • MLAAD (+M-AILABS) - Multilingual synthetic TTS corpus (hundreds of hours; many models/languages). Labels: bona-fide (M-AILABS) / spoof (MLAAD). arXiv
  • LlamaPartialSpoof - LLM-driven attacker styles; includes full and partial (spliced) spoofs. Labels: bona-fide / fully-spoofed / partially-spoofed. arXiv
  • Fake-or-Real (FoR) - >195k utterances; four variants: for-original, for-norm, for-2sec, for-rerec. Labels: real / synthetic.
  • CodecFake - codec-based deepfake audio dataset (Interspeech 2024); Labels: real / codec-generated fake. arXiv

Video Deepfakes

Jailbreak

Adversarial prompt datasets, both text-only and multimodal, designed to bypass safety mechanisms or test refusal logic. Used to test how effectively a model resists jailbreaks and enforces policy-based refusal.

  • CySecBench GitHub Repo stars - cybersecurity-domain jailbreak dataset with 12,662 close-ended prompts across multiple attack categories; paper introduces an obfuscation-based jailbreaking method and LLM evals.
  • JailBreakV-28K GitHub Repo stars - multimodal jailbreak benchmark with ~28k test cases (20k text-based transfer attacks + 8k image-based) to assess MLLM robustness; HF page includes a mini-leaderboard and image types.
  • Do-Not-Answer GitHub Repo stars - refusal-evaluation set of 939 “should-refuse” prompts plus an automatic evaluator; answering instead of refusing can be used as a jailbreak-success signal.

Prompt Injection

Public prompt-injection datasets have recurring limitations: partial staleness as models and defenses evolve, CTF skew toward basic instruction following, and label mixing across toxicity, jailbreak roleplay, and true injections that inflates measured true positive rates and distorts evaluation.

  • prompt-injection-attack-dataset HF downloads - 3.7k rows pairing benign task prompts with attack variants (naive / escape / ignore / fake-completion / combined). Columns for both target and injected tasks; train split only.
  • prompt-injections-benchmark HF downloads - 5,000 prompts labeled jailbreak / benign for robustness evals.
  • prompt_injections HF downloads - ~1k short injection prompts; multilingual (EN, FR, DE, ES, IT, PT, RO); single train split; CSV/Parquet.
  • prompt-injection HF downloads - Large-scale injection/benign corpus (~327k rows, train/test) for training baselines and detectors.
  • prompt-injection-safety HF downloads - 60k rows (train 50k / test 10k); 3-way labels: benign 0, injection 1, harmful request 2; Parquet.
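The label-mixing problem described at the top of this section can be made concrete with a toy calculation. The sketch below uses the three-way label convention of the last entry (benign 0, injection 1, harmful request 2) on ten synthetic examples; the data and the `true_positive_rate` helper are invented for illustration and say nothing about any real detector.

```python
def true_positive_rate(labels, flagged, positive_labels):
    """Share of examples with a positive label that the detector flagged."""
    hits = sum(1 for y, f in zip(labels, flagged) if y in positive_labels and f)
    total = sum(1 for y in labels if y in positive_labels)
    return hits / total if total else float("nan")


# Synthetic data: 0 = benign, 1 = injection, 2 = harmful request.
labels = [1, 1, 1, 1, 2, 2, 2, 2, 0, 0]
# A detector that catches every harmful request but only one injection.
flagged = [True, False, False, False, True, True, True, True, False, False]

mixed_tpr = true_positive_rate(labels, flagged, {1, 2})   # 5 of 8 positives
injection_tpr = true_positive_rate(labels, flagged, {1})  # 1 of 4 injections
```

Counting harmful requests as positives reports 0.625, while the rate on true injections alone is 0.25, so scoring per label gives a less flattering but more accurate picture.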

System Prompts

Collections of leaked, official, and synthetic system prompts and paired responses used to study guardrails and spot system prompt exposure. Used to build leakage detectors, craft targeted guardrail tests (consent gates, tool use rules, safety policies), and reproduce vendor behaviors for evaluation.

  • Official_LLM_System_Prompts HF downloads - leaked and date-stamped prompts from proprietary assistants (OpenAI, Anthropic, MS Copilot, GitHub Copilot, Grok, Perplexity); 29 rows.
  • system-prompt-leakage HF downloads - synthetic prompts + responses for leakage detection; train 283,353 / test 71,351 (binary leakage labels).
  • system-prompts-and-models-of-ai-tools GitHub stars - community collection of prompts and internal tool configs for code/IDE agents and apps (Cursor, VSCode Copilot Agent, Windsurf, Devin, v0, etc.); includes a security notice.
  • system_prompts_leaks GitHub stars - collection of extracted system prompts from popular chatbots like ChatGPT, Claude & Gemini
  • leaked-system-prompts GitHub stars - leaked prompts across many services; requires verifiable sources or reproducible prompts for PRs.
  • chatgpt_system_prompt GitHub stars - community collection of GPT system prompts, prompt-injection/leak techniques, and protection prompts.
  • CL4R1T4S GitHub stars - extracted/leaked prompts, guidelines, and tooling references spanning major assistants and agents (OpenAI, Google, Anthropic, xAI, Perplexity, Cursor, Devin, etc.).
  • grok-prompts GitHub stars - official xAI repository publishing Grok’s system prompts for chat/X features (DeepSearch, Ask Grok, Explain, etc.).
  • Prompt-Leakage Finetune GitHub stars - adversarial attack prompts (~1,300) used to instruction-tune refusal to system-prompt extraction (synthetic + Gandalf subset).

Courses & Certifications

Career Pathways

Courses (includes labs)

Professional Certifications (exam-based)

Governance & AIMS
Risk Management
Security Management
Audit Assurance

Training

Provider Training Portals

Guided Tracks

CTFs & Challenges

Bespoke


Models

Cybersecurity-Tuned Text Generation

  • segolilylabs/Lily-Cybersecurity-7B-v0.2-GGUF HF downloads - quantized GGUF build of a 7B cybersecurity-tuned chat model.
  • DeepHat/DeepHat-V1-7B HF downloads - 7B cybersecurity-oriented text-generation model.
  • clouditera/secgpt HF downloads - cybersecurity-tuned instruction model (CN/EN) with released weights (variants incl. 1.5B/7B/14B); built on Qwen2.5-Instruct/DeepSeek-R1, Apache-2.0, supports vLLM deployment. GitHub Repo stars
  • ZySec-AI/SecurityLLM HF downloads - cybersecurity-focused chat model (“ZySec-7B”); weights available. Community GGUF quantization exists for llama.cpp.

Domain-Adapted Text LMs (Security / CTI)

Safety / Policy Classifiers (Guardrails & Moderation)

Prompt-Injection & Jailbreak Detection (Classifiers)

Code Security (Code understanding & vuln detection)

Deepfake / Anti-Spoofing (Speech)

Defense-Hardened

  • Meta-SecAlign-8B / 70B (Meta) - open-weight base models with built-in, model-level prompt-injection defense (SecAlign); 8B for lightweight use, 70B for higher capacity. (Paper, Code, HF 8B, HF 70B)

Research Working Groups

📌 (More working groups to be added.)


Communities & Social Groups


Benchmarks

Code Security

Purpose: Evaluates the correctness and security of model-generated code in realistic, production-like settings.

  • SecCodeBench GitHub Repo stars - 37 test cases / 16 CWEs; functionality-first pipeline; dynamic PoC exploits + static checks; includes LLM-as-a-Judge; Gen & Fix modes.
  • AICGSecEval GitHub Repo stars - repository-level, CVE-grounded tasks; multi-language; run scripts + leaderboard. arXiv
  • BaxBench GitHub Repo stars - 392 backend tasks (28 scenarios × 14 frameworks × 6 languages); validates functionality and executes end-to-end exploits. arXiv
  • CWEval GitHub Repo stars - simultaneous functionality+security evaluation with secure/functional oracles; Dockerized runner. arXiv

Adversarial Resilience

Purpose: Evaluates agent performance on offensive-security tasks (pentesting, exploitation, and misuse resistance) with containerized runners and reproducible scoring.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify risks related to adversarial attacks.
  • Manage: Implement mitigation strategies to ensure resilience.

Autonomous Pentesting & Exploit Generation

Used for: evaluating agents on exploit generation and patch-validated vulnerability triggering across four subtypes with containerized runners and pass/fail scoring.

CTF / Challenge Suites

Used for: time-boxed flag-capture tasks that isolate skills (web/pwn/rev/crypto/etc.) with containerized scoring.

  • NYU CTF Bench GitHub Repo stars - 200 dockerized CSAW challenges across web/pwn/rev/forensics/crypto/misc; success = flag capture. arXiv

VM-Based End-to-End Pentest

Used for: full host compromise across recon→exploit→privesc on realistic VMs with scripted scoring.

CVE App Suites / Task-Based

Used for: targeted exploit generation/execution against apps with known CVEs; measures live-system interaction.

  • CVE-Bench GitHub Repo stars - 40 dockerized web CVEs; success = expected impact triggered. arXiv
  • AutoPenBench GitHub Repo stars - 33 tasks: 22 fundamentals + 11 CVEs; controlled runner with repeatable, fine-grained scoring. arXiv

Patch-Validated Triggering

Used for: PoC inputs that crash the vulnerable build and not the patched build under fixed time/memory; sanitizer oracle.

  • CyberGym GitHub Repo stars - 1,507 instances from 188 OSS projects (via OSS-Fuzz); pre/post-patch builds with ASan/UBSan; input channels: stdin/file/argv; difficulty 0–3; pass/fail oracle. Dataset, arXiv
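The pass/fail rule for patch-validated triggering reduces to a small predicate. The sketch below is an assumption about how such an oracle could be expressed, not CyberGym's implementation; `RunResult` and `poc_passes` are hypothetical names, and treating any timeout as a failure is a choice made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    crashed: bool    # a sanitizer (e.g., ASan/UBSan) reported a crash
    timed_out: bool  # the run exceeded the fixed time budget


def poc_passes(vulnerable: RunResult, patched: RunResult) -> bool:
    """A PoC passes only if it crashes the vulnerable build and not the patched one."""
    if vulnerable.timed_out or patched.timed_out:
        return False  # assumption: runs outside the budget never count
    return vulnerable.crashed and not patched.crashed
```

Requiring the patched build to survive rules out inputs that crash the program for reasons unrelated to the targeted vulnerability.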

Agent Misuse & Harm Induction

  • AgentHarm HF downloads - human-authored harmful agent tasks for tool-using agents with benign counterparts, synthetic proxy tools, and a reproducible scoring harness; 110 base tasks (440 with augmentation), 11 categories, 104 tools. arXiv. Best for: measuring refusal vs completion on multi-step tool use and the impact of jailbreaks.

  • Purple Llama – CyberSecEval GitHub Repo stars - evaluates models’ propensity to assist cyber-offense (exploit/malware) and to generate insecure code; graded-risk tasks with a reproducible harness. Best for: dangerous-capability / misuse-risk scoring (text/IDE, non-agent).

Prompt Injection & Jailbreak Detection

Purpose: Evaluates resistance to prompt-injection and jailbreak attempts in chat/RAG/agent contexts.
NIST AI RMF Alignment: Measure, Manage

  • Lakera PINT Benchmark GitHub Repo stars - Prompt-injection benchmark with a curated multilingual test suite, explicit categories (injections, jailbreaks, hard negatives, benign chats/docs), and a reproducible scoring harness (PINT score + notebooks) for fair detector comparison and regression tracking.

  • JailbreakBench GitHub Repo stars - standardized jailbreak prompts + scoring harness; measures refusal/compliance and jailbreak success across models and settings.

Model & Data Integrity

Purpose: Assesses AI models for unauthorized modifications, including backdoors and dataset poisoning. Supports trustworthiness and security of model outputs.
NIST AI RMF Alignment: Map, Measure

  • Map: Understand and identify risks to model/data integrity.
  • Measure: Evaluate and mitigate risks through validation techniques.

Governance & Compliance

Purpose: Ensures AI security aligns with governance frameworks, industry regulations, and security policies. Supports auditability and risk management.
NIST AI RMF Alignment: Govern

  • Govern: Establish policies, accountability structures, and compliance controls.

Privacy & Data Protection

Purpose: Evaluates AI for risks like data leakage, membership inference, and model inversion. Helps ensure privacy preservation and compliance.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify and assess AI-related privacy risks.
  • Manage: Implement security controls to mitigate privacy threats.

Explainability & Trustworthiness

Purpose: Assesses AI for transparency, fairness, and bias mitigation. Ensures AI operates in an interpretable and ethical manner.
NIST AI RMF Alignment: Govern, Map, Measure

  • Govern: Establish policies for fairness, bias mitigation, and transparency.
  • Map: Identify potential explainability risks in AI decision-making.
  • Measure: Evaluate AI outputs for fairness, bias, and interpretability.

Incident Response

Incident Repositories, Trackers & Monitors

Publicly Disclosed Vulnerabilities

Vulnerabilities disclosed in the last 12 months. Related attack patterns, campaigns, malware, and research items are listed separately below.

Vulnerabilities

Flaws or exploit chains in products, frameworks, features, or workflows that can be directly abused by an attacker.

| Name | Description | Source | Disclosure date | CVE(s) |
| --- | --- | --- | --- | --- |
| EchoLeak | A zero-click Microsoft 365 Copilot vulnerability that can exfiltrate sensitive data from Copilot context. | Aim Security | 2025-05-31 | CVE-2025-32711 |
| CurXecute | A Cursor vulnerability that can lead to remote code execution by writing MCP-sensitive files such as .cursor/mcp.json. | GitHub Advisory | 2025-08-02 | CVE-2025-54135 |
| MCPoison | A Cursor vulnerability where trusted MCP configurations can be modified without re-approval, enabling persistent code execution. | Check Point Research | 2025-08-01 | CVE-2025-54136 |
| LangGrinch | A langchain-core serialization injection flaw that can expose secrets and enable unsafe object instantiation. | Cyata | 2025-12-23 | CVE-2025-68664 |
| BodySnatcher | A ServiceNow AI Platform vulnerability that allows unauthenticated user impersonation through Virtual Agent and Now Assist flows. | AppOmni | 2026-01-13 | CVE-2025-12420 |
| Reprompt | A single-click Copilot Personal exploit that abuses crafted URL parameters to exfiltrate data. | Varonis Threat Labs | 2026-01-15 | |
| Clinejection | A prompt injection and GitHub Actions cache-poisoning chain that could steal publish tokens and compromise releases. | Adnan Khan | 2026-02-09 | |
| RoguePilot | A GitHub Codespaces and Copilot exploit chain that can steal GITHUB_TOKEN and enable repository takeover. | Orca Security | 2026-02-16 | |
| Claudy Day | A Claude.ai exploit chain using invisible prompt injection, Files API abuse, and open redirects to exfiltrate chat history. | Oasis Security | 2026-03-18 | |
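
The MCPoison entry describes approval that stays valid after the approved configuration changes. A minimal sketch of the opposite behavior, binding approval to the exact configuration content, might look like the following. `ApprovalStore` and `config_fingerprint` are hypothetical names for illustration; this is not Cursor's fix or any MCP client's actual API.

```python
import hashlib
import json


def config_fingerprint(server_config: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the result."""
    canonical = json.dumps(server_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ApprovalStore:
    """Hypothetical in-memory record of approved configs, keyed by server name."""

    def __init__(self) -> None:
        self._approved = {}  # server name -> fingerprint of the approved config

    def approve(self, name: str, server_config: dict) -> None:
        self._approved[name] = config_fingerprint(server_config)

    def may_run(self, name: str, server_config: dict) -> bool:
        # Approval is tied to the exact content, not just the server name.
        return self._approved.get(name) == config_fingerprint(server_config)


store = ApprovalStore()
store.approve("demo", {"command": "echo", "args": ["hello"]})
unchanged_ok = store.may_run("demo", {"args": ["hello"], "command": "echo"})
swapped_ok = store.may_run("demo", {"command": "sh", "args": ["-c", "id"]})
```

Here `unchanged_ok` is true because only key order differs, while `swapped_ok` is false because the command changed after approval, which is the point at which a client would ask the user again.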

Attack Patterns

Methods, techniques, or recurring abuse models that describe how attackers achieve their goals across one or more systems.

| Name | Description | Source | Disclosure date | CVE(s) |
| --- | --- | --- | --- | --- |
| HashJack | An indirect prompt injection technique that hides malicious instructions inside the URL fragment. | Cato CTRL | 2025-11-25 | |
| LegalPwn | A prompt injection technique that disguises malicious instructions as legal or compliance text. | Pangea Labs | 2025-07-30 | |
| AIKatz | A post-compromise technique that steals tokens and session artifacts from desktop LLM apps. | Lumia Security Labs | 2025-11-12 | |
| Tool Poisoning Attack | An MCP attack pattern where malicious instructions are hidden in tool descriptions visible to the model. | Invariant | 2025-04-01 | |
| MCP Rug Pull | An MCP pattern where a trusted tool or server changes after approval and becomes malicious. | Invariant | 2025-04-01 | |
| Cross-Origin Escalation | An MCP pattern where one malicious server influences or redirects trusted tools on another server. | Invariant | 2025-04-11 | |
| ShadowMQ | A recurring pattern where unsafe ZeroMQ plus pickle deserialization spreads RCE risk across AI systems through code reuse. | Oligo Security | 2025-11-13 | |
| Living Off AI | A pattern where untrusted external input is later executed inside privileged internal AI workflows. | Cato CTRL | 2025-06-19 | |
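
To make the ShadowMQ row concrete, the sketch below shows why unpickling bytes from an untrusted peer is a code-execution risk in Python, independent of the transport that carried them. The payload is deliberately harmless (it evaluates `6 * 7`), and the ZeroMQ socket is omitted; `Payload` and `wire_bytes` are illustrative names, not taken from any affected project.

```python
import json
import pickle


class Payload:
    # pickle consults __reduce__ to learn how to rebuild an object. Whatever
    # callable it names is invoked by pickle.loads on the receiving side.
    def __reduce__(self):
        return (eval, ("6 * 7",))


# The sender controls these bytes; the receiver only ever sees wire_bytes.
wire_bytes = pickle.dumps(Payload())

# Deserializing runs eval("6 * 7") instead of rebuilding a Payload object.
result = pickle.loads(wire_bytes)

# A data-only format parses values and never calls sender-chosen functions.
parsed = json.loads('{"op": "infer", "prompt": "hello"}')
```

After this runs, `result` is the integer 42 rather than a `Payload` instance, which shows that the sender chose what the receiver executed. Swapping the harmless expression for a shell command is all an attacker would need, which is why data-only formats such as JSON are the usual replacement for pickle on network boundaries.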

Campaigns

Clusters of malicious activity or operations observed over time against specific targets.

| Name | Description | Source | Disclosure date | CVE(s) |
| --- | --- | --- | --- | --- |
| Operation Bizarre Bazaar | An LLMjacking campaign focused on scanning, validating, and monetizing exposed LLM and MCP infrastructure. | Pillar Security | 2026-01-28 | |
| ShadowRay 2.0 | A follow-on ShadowRay campaign that turned exposed Ray clusters into cryptojacking and botnet-style operations. | Oligo Security | 2025-11 | CVE-2023-48022 |
| ClawHavoc | A supply chain campaign distributing malicious ClawHub and OpenClaw skills through fake prerequisites and external payloads. | Koi Research | 2026-02-01 | |

Malware

Backdoors, implants, or malicious programs used to execute or support attacks.

| Name | Description | Source | Disclosure date | CVE(s) |
| --- | --- | --- | --- | --- |
| SesameOp | A backdoor that abuses the OpenAI Assistants API as a command-and-control channel. | Microsoft Incident Response | 2025-11-03 | |

Guides & Playbooks

Regulatory Incident Reporting


Reports and Research

Vendor Reports

Research Papers

Research Feed

Industry Alliance & Nonprofit Reports

📌 (More to be added - A collection of AI security reports, white papers, and academic studies.)


Books

  • CompTIA SecAI+ Study Guide: Comprehensive Exam-Focused AI Security Reference - View listing (2026)
  • AI Security: Three Towers to Protect the Castle: Breach-Proof AI - View listing (2025)
  • Data Security in the Age of AI: A Guide to Protecting Data and Reducing Risk in an AI-Driven World - View listing (2025)
  • AI Red Teaming: Volume 1: Foundations and Methodology - View listing (2026)
  • Generative AI Security: Theories and Practices - Amazon, Springer (2024)
  • CompTIA SecAI+ CY0-001 Study Guide: Complete Reference with Practice Tests - View listing (2026)
  • The AI Cybersecurity Handbook - View listing (2025)
  • Cybersecurity Strategy for the AI-Driven Era - View listing (2026)
  • Agentic AI for Offensive Cybersecurity - View listing (2026)
  • CompTIA SecAI+ CY0-001 STUDY GUIDE 2026–2028 - View listing (2026)
  • CompTIA SecAI+ CY0-001 EXAM PREP 2026–2028: Comprehensive Certification Study Guide - View listing (2026)
  • CompTIA SecAI+ Study Guide: Exam CY0-001 - View listing (2026)
  • AI-Native LLM Security: Threats, defenses, and best practices for building safe and trustworthy AI - View listing (2026)
  • Securing AI Agents: Foundations, Frameworks, and Real-World Deployment - View listing (2026)
  • CompTIA SecAI+ CY0-001 Study Guide 2026–2027: Complete Exam Prep - View listing (2026)
  • Cybersecurity Architect's Handbook, 2nd ed. - View listing (2026)
  • Guardians of the Machine Age: Why AI Security Will Define - View listing (2026)
  • Trustworthy AI: Red Teaming, Risk and Architecture of Secure Intelligence - View listing (2026)
  • Generative AI Security: Defense, Threats, and Vulnerabilities - View listing (2026)
  • Privacy and Security for Large Language Models - View listing (2026)
  • CompTIA SecAI+ STUDY GUIDE + WORKBOOK – 2 in 1 - View listing (2026)
  • Practical AI Security - View listing (2026)
  • Artificial Intelligence and Machine Learning in Cybersecurity: A Comprehensive Guide to Improving Cybersecurity Protocol - View listing (2025)
  • AI Security Essentials: Protecting Intelligent Systems - View listing (2026)
  • Implementing Enterprise Cybersecurity With AI - View listing (2026)
  • Augmented Security Operations: AI, Automation and Guardrails in Modern Cybersecurity - View listing (2026)
  • Agentic AI Security: Safeguarding Autonomous Systems - View listing (2026)
  • AI Security: Attack, Defense AND Governance - View listing (2026)

Foundations: Glossary, SoK/Surveys & Taxonomies

(Core references and syntheses for orientation and shared language.)

Glossary

(Authoritative definitions for AI/ML security, governance, and risk. Use them to align terminology across docs and reviews.)

SoK & Surveys

(Systematizations of Knowledge (SoK), surveys, systematic reviews, and mapping studies.)

Taxonomy

(Reusable classification schemes with clear dimensions, categories, and labeling rules for attacks, defenses, datasets, and risks.)


Podcasts

  • The MLSecOps Podcast - Conversations with industry leaders and AI experts on machine learning security operations (MLSecOps).

Market Landscape

Curated market maps of tools and vendors for securing LLM and agentic AI applications across the lifecycle.


Blogs

Industry Leaders

Startup Blogs

A curated list of startups securing agentic AI applications, organized by the OWASP Agentic AI lifecycle (Scope & Plan → Govern). Each company appears once in its best-fit stage based on public positioning, and links point to blog/insights for deeper context. Some startups span multiple stages; placements reflect primary focus.

Inclusion criteria

  1. Startup has not been acquired
  2. Has an active blog
  3. Has an active GitHub organization/repository

Scope & Plan

Design-time security: non-human identities, agent threat modeling, privilege boundaries/authn, and memory scoping/isolation.

No startups in this stage currently have both an active blog and an active GitHub account.

Develop & Experiment

Secure agent loops and tool use; validate I/O contracts; embed policy hooks; test resilience during co-engineering.

No startups in this stage currently have both an active blog and an active GitHub account.

Augment & Fine-Tune Data

Sanitize/trace data and reasoning; validate alignment; protect sensitive memory with privacy controls before deployment.

Test & Evaluate

Adversarial testing for goal drift, prompt injection, and tool misuse; red-team sims; sandboxed calls; decision validation.

Release

Sign models/plugins/memory; verify SBOMs; enforce cryptographically validated policies; register agents/capabilities.

No startups in this stage currently have both an active blog and an active GitHub account.

Deploy

Zero-trust activation: rotate ephemeral creds, apply allowlists/LLM firewalls, and fine-grained least-privilege authorization.

Operate

Monitor memory mutations for drift/poisoning, detect abnormal loops/misuse, enforce HITL overrides, and scan plugins. The goal is continuous, real-time vigilance that keeps operations resilient as systems scale and self-orchestrate.

Monitor

Correlate agent steps/tools/comms; detect anomalies (e.g., goal reversal); keep immutable logs for auditability.
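
One common way to make logs tamper-evident is to chain entries by hash, so that each record commits to the one before it. The sketch below assumes an in-memory list and hypothetical helper names (`append_event`, `verify_chain`); it illustrates the idea only and is not taken from any product in this list.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)}
    )


def verify_chain(log: list) -> bool:
    """Recompute every link; an edited, reordered, or mid-chain deleted entry fails."""
    prev_hash = GENESIS
    for entry in log:
        expected = _entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, {"agent": "a1", "step": 1, "tool": "search"})
append_event(audit_log, {"agent": "a1", "step": 2, "tool": "send_email"})
intact = verify_chain(audit_log)
```

A chain like this does not detect truncation of the most recent entries on its own, so the latest hash would also need to be anchored somewhere the logging process cannot rewrite.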

Govern

Enforce role/task policies, version/retire agents, prevent privilege creep, and align evidence with AI regulations.



Common Acronyms

| Acronym | Full Form |
| --- | --- |
| AI | Artificial Intelligence |
| AGI | Artificial General Intelligence |
| ALBERT | A Lite BERT |
| AOC | Area Over Curve |
| ASR | Attack Success Rate |
| BERT | Bidirectional Encoder Representations from Transformers |
| BGMAttack | Black-box Generative Model-based Attack |
| CBA | Composite Backdoor Attack |
| CCPA | California Consumer Privacy Act |
| CNN | Convolutional Neural Network |
| CoT | Chain-of-Thought |
| DAN | Do Anything Now |
| DFS | Depth-First Search |
| DNN | Deep Neural Network |
| DPO | Direct Preference Optimization |
| DP | Differential Privacy |
| FL | Federated Learning |
| GA | Genetic Algorithm |
| GDPR | General Data Protection Regulation |
| GPT | Generative Pre-trained Transformer |
| GRPO | Group Relative Policy Optimization |
| HIPAA | Health Insurance Portability and Accountability Act |
| ICL | In-Context Learning |
| KL | Kullback-Leibler Divergence |
| LAS | Leakage-Adjusted Simulatability |
| LM | Language Model |
| LLM | Large Language Model |
| Llama | Large Language Model Meta AI |
| LoRA | Low-Rank Adaptation |
| LRM | Large Reasoning Model |
| MCTS | Monte-Carlo Tree Search |
| MIA | Membership Inference Attack |
| MCP | Model Context Protocol |
| MDP | Masking-Differential Prompting |
| MLM | Masked Language Model |
| MLLM | Multimodal Large Language Model |
| MLRM | Multimodal Large Reasoning Model |
| MoE | Mixture-of-Experts |
| NLP | Natural Language Processing |
| OOD | Out Of Distribution |
| ORM | Outcome Reward Model |
| PI | Prompt Injection |
| PII | Personally Identifiable Information |
| PAIR | Prompt Automatic Iterative Refinement |
| PLM | Pre-trained Language Model |
| PRM | Process Reward Model |
| QA | Question-Answering |
| RAG | Retrieval-Augmented Generation |
| RL | Reinforcement Learning |
| RLHF | Reinforcement Learning from Human Feedback |
| RLVR | Reinforcement Learning with Verifiable Reward |
| RoBERTa | Robustly Optimized BERT Approach |
| SCM | Structural Causal Model |
| SGD | Stochastic Gradient Descent |
| SOTA | State of the Art |
| TAG | Gradient Attack on Transformer-based Language Models |
| VR | Verifiable Reward |
| XLNet | Transformer-XL with autoregressive and autoencoding pre-training |

Contributing

Contributions are welcome! If you have new resources, tools, or insights to add, feel free to submit a pull request.

This repository follows the Awesome Manifesto guidelines.


License

License: MIT

© 2025 Tal Eliyahu. Licensed under the MIT License. See LICENSE.