June 30, 2026

pentest-ai

The pentest tool that proves its findings. No oracle, no badge.

Website · Install · Why verification · Docs · Benchmarks · Agents · Discord

⚠️ Offensive tooling, authorized testing only. By installing you accept the AUP and Terms. Full text in Responsible use ↓

ptai is an AI-driven pentest tool that re-runs every exploit to confirm it. It runs recon, logs in, and chains findings into multi-step attack paths, but it does not ask you to trust the results. Just as TruffleHog confirms a leaked secret by logging in with it, ptai confirms a web finding by re-running the exploit: a finding stays a candidate until a machine oracle reproduces it N out of N times, and only then does it earn a VERIFIED badge. Third-party scanner output (nuclei, nikto, zap) is held back until an oracle re-proves it. Scanner noise is what trains teams to ignore their tools, so the report carries only what ptai could prove, and each VERIFIED finding ships with a portable proof capsule you can replay yourself.
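Here is the candidate-to-VERIFIED rule as a minimal Python sketch. It is an illustration, not ptai's code: `Finding`, `Oracle`, `verify`, and the default `n=3` are hypothetical names and values. It also folds in two rules from the 1.1.0 notes below: a verdict with no named oracle is rejected, and the oracle's control must fail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    # Hypothetical record shape for this sketch, not ptai's data model.
    vuln_class: str
    url: str
    status: str = "candidate"
    oracle: Optional[str] = None


@dataclass
class Oracle:
    # Hypothetical: an oracle is a name plus two zero-argument checks.
    name: str
    exploit: Callable[[], bool]   # True if the exploit reproduced on the target
    control: Callable[[], bool]   # True if a benign control request also "reproduced"


def verify(finding: Finding, oracle: Optional[Oracle], n: int = 3) -> Finding:
    """Promote a candidate to VERIFIED only if a named oracle reproduces it
    n out of n times and its control does not."""
    if oracle is None or not oracle.name:
        return finding            # verdict without a named oracle: rejected
    if oracle.control():
        return finding            # control fired too, so the oracle abstains
    if all(oracle.exploit() for _ in range(n)):   # stops at the first miss
        finding.status = "VERIFIED"
        finding.oracle = oracle.name
    return finding


# Usage: an exploit that reproduces only 2 of 3 times stays a candidate.
attempts = iter([True, True, False])
flaky = Oracle("sqli-boolean", exploit=lambda: next(attempts), control=lambda: False)
print(verify(Finding("sqli", "/rest/products/search"), flaky).status)  # candidate
```

The control check runs first on purpose: if a harmless request triggers the same signal, the signal says nothing about the vulnerability, so there is no point re-running the exploit.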

Today 14 vulnerability classes are oracle-verified. On a deliberately vulnerable test honeypot, 23 findings verify across those classes at 100% precision with zero false positives. On a stock OWASP Juice Shop, 12 verify in a single scan. Runs on your laptop. No cloud, no telemetry.

See it work

Scanning a stock OWASP Juice Shop: 12 findings oracle-verified in a single scan. Findings are real; timing is paced for watchability.

Reproduce the core idea yourself in two minutes, no target of your own:

pip install ptai && ptai demo

ptai demo scans a bundled vulnerable app and reports 4 findings, all 4 oracle-VERIFIED, replays one live from a proof capsule (replay 3/3), then scans the same routes hardened and reports 0 findings. The only thing that changed between the two runs is the fix, so the findings appear and disappear with the vulnerability, not because the tool went quiet. No API key needed. Re-prove any capsule yourself with ptai replay.

Honest numbers. The honeypot run (23 verified across 14 classes, 100% precision, zero false positives) and the Juice Shop run (12 verified in one scan) are individual reproducible benchmarks, not field false-positive rates. The oracle gate buys precision, not catch rate: it removes false positives, it does not raise detection. Juice Shop is the most-studied vulnerable app on the internet, so read its raw volume as breadth and the verified count as the precision story; the honeypot, with bugs we wrote ourselves, is the honest signal. The honeypot harness (tests/honeypot/) and a clean-app zero-FP gate (tests/cleanapp/) ship in the repo, so the claims are reproducible rather than screenshots.
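Why the gate can't raise catch rate falls out of the definitions. Precision is TP / (TP + FP) and recall (catch rate) is TP / (TP + FN). The gate only ever drops findings from the candidate set, so true positives can't go up: it raises precision by dropping false positives, and a real bug that fails to re-prove lowers recall. A toy calculation with made-up counts, not benchmark data:

```python
def precision(tp: int, fp: int) -> float:
    """Share of reported findings that are real."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real bugs that were reported (the catch rate)."""
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up counts for illustration only.
REAL_BUGS = 10
raw_tp, raw_fp = 8, 6        # detector output before the gate
gated_tp, gated_fp = 7, 0    # the gate keeps a subset; one real bug failed to re-prove

print(f"raw:   precision={precision(raw_tp, raw_fp):.2f} recall={recall(raw_tp, REAL_BUGS - raw_tp):.2f}")
print(f"gated: precision={precision(gated_tp, gated_fp):.2f} recall={recall(gated_tp, REAL_BUGS - gated_tp):.2f}")
```

The honeypot figure above is the same formula: 23 verified with zero false positives is precision(23, 0) == 1.0.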

What's new in 1.1.0

Verification coverage roughly doubled, and a scan no longer reports zero on a target it knocked over mid-run. Every VERIFIED finding comes from a named machine oracle, never an LLM assertion, enforced in code: a verdict that cannot name its oracle is rejected. This release adds:

Ten new oracle classes (14 total). Trusted-header bypass, JWT alg:none, host-header poisoning, XXE, type confusion, stored XSS, sequential IDOR, mass assignment, non-blind SSRF, and SQLi login-bypass, joining SQLi (boolean/blind), BOLA/IDOR, reflected XSS, open redirect, and path traversal. Each oracle has a control that must fail on a safe target, so a non-vulnerable app abstains instead of earning a badge.
Verification resilience. An aggressive sweep could knock a fragile single-container target over, after which the verify phase failed every oracle and reported 0 despite valid, replayable recipes. It now waits for the target to answer again before re-proving, which took an OWASP Juice Shop scan from 0 to 12 oracle-verified.
Scope safety. Active tools (sqlmap, dalfox) are host-locked to the engagement target; the scan no longer feeds third-party URLs scraped from a page's content to attack tools.
Portable proof capsules with ptai replay, a live TUI that flips verdicts to VERIFIED on screen, and a CI gate (--fail-on verified) that breaks a build only on proven findings.
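The resilience fix amounts to a liveness wait before the verify phase. A minimal sketch with the standard library, assuming "answers again" means any non-5xx HTTP response; `http_probe`, `wait_until_up`, and the timeout and backoff values are hypothetical, not ptai's settings. The `sleep` and `clock` parameters exist so the loop can be tested without real waiting.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that counts any non-5xx HTTP response as 'answering'."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status < 500
        except urllib.error.HTTPError as err:  # 4xx still means the server is up
            return err.code < 500
    return probe


def wait_until_up(probe: Callable[[], bool], timeout: float = 120.0,
                  initial_delay: float = 1.0, max_delay: float = 15.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() with exponential backoff until it succeeds or `timeout`
    seconds pass. Returns True if the target answered in time."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            if probe():
                return True
        except (OSError, http.client.HTTPException):
            pass  # refused, reset, timed out: still down
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

A verify phase would call it once before re-proving, for example `wait_until_up(http_probe("http://localhost:3000/"))` for a local Juice Shop, and defer verification if it returns False instead of recording every oracle as failed.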

On a real target: OWASP Juice Shop

Pointed at a stock OWASP Juice Shop, ptai oracle-verifies 12 findings in a single scan: JWT alg:none accepted on protected endpoints, BOLA cross-user reads, sequential IDOR, and type confusion, each re-proven by a machine oracle, not asserted. It detects more than it verifies (SQLi auth-bypass on /rest/user/login, UNION SQLi on /rest/products/search, XXE disclosing /etc/passwd, mass assignment, password-reset bypass); only the verified subset reaches the report. Drive it through Claude Code over MCP with no API key, or standalone.
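For context on what the JWT alg:none oracle tests: a vulnerable server accepts a token whose header says `"alg": "none"` and whose signature segment is empty (an "unsecured JWT" in RFC 7519 terms). The sketch below only builds and decodes such a token with the standard library. The claims are made-up placeholders, and the request to a protected endpoint, plus the control (the same request with no token, which must be rejected), are left out. It is not ptai's oracle.

```python
import base64
import json


def b64url(data: bytes) -> str:
    # JWTs use base64url with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unsecured_jwt(claims: dict) -> str:
    """Build header.payload. with alg "none" and an empty signature segment."""
    header = {"alg": "none", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
        "",  # nothing after the final dot: no signature
    ]
    return ".".join(parts)


def decode_segment(segment: str) -> dict:
    # Restore the padding that b64url() stripped, then parse the JSON.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Placeholder claims for illustration only.
token = unsecured_jwt({"sub": "1", "email": "admin@example.test"})
print(token)
print(decode_segment(token.split(".")[0]))
```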

Honesty caveat. Juice Shop is the most-documented vulnerable app on the internet, so the LLM and the probe authors both have a head start. Against a novel target the catch rate is whatever the curated probe library covers (60+ web probes today, growing each release); the LLM coordinates and reasons about results, it doesn't replace the probes. A private honeypot harness in tests/honeypot/ measures coverage against bugs we wrote ourselves and is asserted in CI (tests/honeypot/test_mcp_honeypot_e2e.py); its numbers are lower than Juice Shop, and that's the point. We publish both. See the full Juice Shop benchmark vs ZAP / Nuclei / HexStrike.

Install

pip install ptai

Path 1: Drive it from Claude Code (no API key)

If you already pay for Claude Pro / Max / Team, your subscription IS the LLM. Wire ptai in as an MCP server:

claude mcp add pentest-ai -- ptai mcp

Restart Claude Code, then ask:

"Run an authenticated pentest against staging.acme.com. Login is at /login, password is in $APP_PASS."

What hits the network: ptai's tools and probes execute locally against your target. Your prompts and the tool output that Claude Code reads go through Anthropic's API, same as any Claude Code session. If you need an air-gapped path, see Path 3 (Ollama / on-prem LLM).

Claude Code drives ptai via these MCP tools (47 of them today):

list_tools / run_tool: list and invoke any of 200+ wrapped security tools
plan_tools / ensure_tools_installed: get the canonical tool list for an engagement, batched install
list_probes / run_probe: 60 SPA-aware probes for OWASP Top 10 bug classes
http_request: raw HTTP under a hard scope guard for novel chains
start_engagement / get_findings / get_attack_chains: the engagement record
plus test_web_app, test_active_directory, test_cloud, test_api_security, and the rest

Path 2: Other MCP clients (Cursor, VS Code Copilot, Codex, Claude Desktop)

ptai setup --mcp

Auto-detects every MCP-compatible client you have installed and writes their config files. Restart the client and the same 47 tools are there.

Path 3: Standalone CLI when you DON'T have an MCP client

If you're using Claude Code, Cursor, Codex, or Claude Desktop, use Path 1 or 2 above and skip this section. No API key needed there.

Path 3 is for CI/CD pipelines, scheduled cron jobs, air-gapped terminals, and users without an MCP client. The standalone CLI has no LLM of its own, so you bring one via env var:

export ANTHROPIC_API_KEY=sk-ant-...           # Claude (best results)
# or
export OPENAI_API_KEY=sk-...                  # OpenAI
# or, fully local, no cloud
export PENTEST_AI_LLM_PROVIDER=ollama         # Ollama (default localhost:11434)
# or, any of 300+ models via LiteLLM (OpenRouter, Azure, DeepSeek, Groq, Mistral, ...)
pip install litellm

ptai start https://your-target.com

Hitting an OpenAI-compatible endpoint (DeepSeek cloud, Groq, Together AI, vLLM, etc.)? Set OPENAI_BASE_URL and PENTEST_AI_MODEL, then use the openai provider. Full recipes for every provider, including custom model names, troubleshooting, and the full list of 300+ LiteLLM models, live in docs/llm-providers.md.

Spending cap (Path 3 only)

The standalone agent loop drives its own LLM, so runaway loops cost real money. By default ptai caps spend at $10 per engagement. A normal Sonnet 4.6 web-app sweep with prompt caching finishes well under that; an Opus 4.7 deep run can blow past it.

Change it via env var (no CLI flag - env var is the only knob):

export PTAI_PRICE_LIMIT=25        # raise to $25
export PTAI_PRICE_LIMIT=0         # unlimited (logs a warning)
unset PTAI_PRICE_LIMIT            # back to the $10 default

If the cap fires mid-engagement, the engagement is marked aborted_cost_limit and its checkpoint is preserved. Raise the cap and resume from where it stopped:

export PTAI_PRICE_LIMIT=25
ptai resume <engagement_id>

Paths 1 and 2 (MCP) don't use this cap - your AI client (Claude Code, Cursor, etc.) handles its own LLM billing.
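The documented cap rules (unset means the $10 default, 0 means unlimited, anything else is a USD ceiling) can be modeled in a few lines. This is a sketch of the behavior described above, not ptai's implementation; `price_limit`, `CostTracker`, `CostLimitExceeded`, and the per-million-token prices passed to `charge` are hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_LIMIT_USD = 10.0


def price_limit(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Per-engagement USD cap from PTAI_PRICE_LIMIT; None means unlimited."""
    raw = env.get("PTAI_PRICE_LIMIT")
    if raw is None or not raw.strip():
        return DEFAULT_LIMIT_USD            # unset: the $10 default
    limit = float(raw)
    return None if limit == 0 else limit    # 0: unlimited


class CostLimitExceeded(RuntimeError):
    pass


class CostTracker:
    """Accumulates LLM spend and raises once the cap is exceeded."""

    def __init__(self, limit: Optional[float]):
        self.limit = limit
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> None:
        # Prices are per million tokens and supplied by the caller (hypothetical values).
        self.spent += (input_tokens * usd_per_mtok_in
                       + output_tokens * usd_per_mtok_out) / 1_000_000
        if self.limit is not None and self.spent > self.limit:
            # Per the docs, ptai marks the engagement aborted_cost_limit and keeps
            # its checkpoint at this point, so `ptai resume` can continue once the cap is raised.
            raise CostLimitExceeded(f"spent ${self.spent:.2f}, cap ${self.limit:.2f}")
```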

Installing security tools

ptai wraps 200+ external tools. Three ways to get them on the box:

# 1. Zero-config (recommended). At engagement start, the planner predicts
#    which tools the LLM will need and asks ONCE to install the missing
#    ones. Decline once and the answer persists in
#    ~/.pentest-ai/install-preferences.json.
ptai start https://target.example.com

# 2. Batch install upfront. Skips the engagement-time prompt entirely.
ptai setup --tier core            # ~6 essentials, ~30s
ptai setup --tier recommended     # + fuzzers, crawlers, password tools, ~5m
ptai setup --tier full            # everything, ~30m

# 3. Install specific tools by name.
ptai setup --per-tool wpscan,dalfox,paramspider
ptai setup --wizard               # interactive picker

In non-interactive contexts (PTAI_NON_INTERACTIVE=1 or no TTY), ptai uses whatever is already on PATH and logs anything missing instead of prompting.
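A rough model of that decision with the standard library: a run counts as non-interactive when PTAI_NON_INTERACTIVE=1 or stdin isn't a TTY, `shutil.which` does the PATH lookup, and missing tools are logged instead of queued for an install prompt. The function names, logger name, and return shape are hypothetical.

```python
import logging
import os
import shutil
import sys
from typing import Optional

log = logging.getLogger("ptai-sketch")  # hypothetical logger name


def is_non_interactive() -> bool:
    if os.environ.get("PTAI_NON_INTERACTIVE") == "1":
        return True
    # No stdin at all, or stdin is a pipe/file (cron, CI), counts as non-interactive.
    return sys.stdin is None or not sys.stdin.isatty()


def plan_tools(wanted: list, non_interactive: Optional[bool] = None):
    """Return (found, to_prompt): found maps tool name -> path on PATH;
    to_prompt lists missing tools the caller should offer to install once."""
    if non_interactive is None:
        non_interactive = is_non_interactive()
    found, missing = {}, []
    for name in wanted:
        path = shutil.which(name)
        if path:
            found[name] = path
        else:
            missing.append(name)
    if non_interactive:
        for name in missing:
            log.warning("%s not on PATH; continuing without it", name)
        return found, []
    return found, missing
```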

Other paths: REST API, MCP composition, HITL teleoperation, cloud workspace, public benchmarks

HTTP REST API (for dashboards and integrations)

pip install 'ptai[api]'    # quotes keep zsh from globbing the brackets
ptai serve --port 8888

Endpoints: /health, /version, /agents, /tools, /engagements (list, detail, findings, chains, detection rules, SARIF export). Write endpoints (POST /engagements, POST /engagements/{id}/abort) require Authorization: Bearer $PENTEST_AI_API_TOKEN. Live event stream at WS /engagements/{id}/stream.
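A minimal client sketch using `urllib.request`, assuming `ptai serve --port 8888` is running on localhost. The paths and the Bearer header come from the endpoint list above; the JSON body for POST /engagements (a single `target` field) is a guess, so check the API docs for the real schema. The helpers only build `Request` objects; `urlopen` is what sends them.

```python
import json
import os
import urllib.request

BASE = "http://127.0.0.1:8888"  # assumes `ptai serve --port 8888` on this machine


def get(path: str) -> urllib.request.Request:
    return urllib.request.Request(BASE + path, method="GET")


def post(path: str, body: dict, token: str) -> urllib.request.Request:
    # Write endpoints require the Bearer token from PENTEST_AI_API_TOKEN.
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )


token = os.environ.get("PENTEST_AI_API_TOKEN", "")
health = get("/health")
# Hypothetical request body: the real schema for creating an engagement may differ.
create = post("/engagements", {"target": "https://staging.example.com"}, token)

# To send them (needs the server running):
# with urllib.request.urlopen(health, timeout=10) as resp:
#     print(json.load(resp))
```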

Load other MCP servers as tool sources

Compose with hexstrike or any other MCP-compatible security server. Edit ~/.pentest-ai/mcp_servers.json:

{
  "servers": [
    {"name": "hexstrike", "command": "python3 hexstrike_mcp.py", "transport": "stdio"}
  ]
}
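To sanity-check that file before starting ptai, a small loader can parse it and split each `command` into an argv list with `shlex`, which is the form a stdio launcher would exec. It knows only the three fields shown above and assumes `transport` defaults to stdio when omitted; ptai's own loader may accept more fields and behave differently.

```python
import json
import shlex
from pathlib import Path

CONFIG = Path.home() / ".pentest-ai" / "mcp_servers.json"


def load_servers(text: str) -> list:
    """Parse mcp_servers.json text and return name/argv/transport per server."""
    data = json.loads(text)
    servers = []
    for i, entry in enumerate(data.get("servers", [])):
        missing = {"name", "command"} - set(entry)
        if missing:
            raise ValueError(f"servers[{i}] is missing {sorted(missing)}")
        argv = shlex.split(entry["command"])   # "python3 hexstrike_mcp.py" -> ["python3", "hexstrike_mcp.py"]
        if not argv:
            raise ValueError(f"servers[{i}] ({entry['name']}) has an empty command")
        servers.append({
            "name": entry["name"],
            "argv": argv,
            "transport": entry.get("transport", "stdio"),  # assumption: stdio if omitted
        })
    return servers


# Usage: for s in load_servers(CONFIG.read_text()): print(s["name"], s["argv"])
```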

Take over mid-run (HITL teleoperation)

While an engagement is running, press Ctrl+C twice within 600ms to pause the orchestrator and drop into a REPL: step, inspect findings, inject <instruction>, skip, resume, abort. Current LLMs aren't fully autonomous. The operator owns the call when it matters.
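The "twice within 600ms" rule is a time-window check on SIGINT. One way to sketch it with Python's `signal` module follows; `DoubleInterrupt` and its `on_pause` callback are hypothetical, and since what a single Ctrl+C does in ptai isn't stated here, in this sketch it only arms the detector.

```python
import signal
import time
from typing import Callable, Optional


class DoubleInterrupt:
    """Invoke on_pause() when SIGINT arrives twice within `window` seconds.
    Hypothetical sketch: a lone Ctrl+C only arms the detector here."""

    def __init__(self, on_pause: Callable[[], None], window: float = 0.6,
                 clock: Callable[[], float] = time.monotonic):
        self.on_pause = on_pause
        self.window = window
        self.clock = clock
        self.last: Optional[float] = None

    def __call__(self, signum=None, frame=None) -> None:
        now = self.clock()
        if self.last is not None and now - self.last <= self.window:
            self.last = None          # reset so a third press starts a new pair
            self.on_pause()
        else:
            self.last = now           # first press (or too slow): arm and wait

    def install(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGINT, self)
```

Because the handler replaces Python's default, a lone Ctrl+C no longer raises KeyboardInterrupt; a real tool needs its own policy for that case.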

Public benchmarks

Reproducible solve-rate measurements live in benchmarks/:

./benchmarks/scripts/run_all.sh   # writes JSON per run + RESULTS.md

Spec, harness, results all in git. The full Juice Shop comparison vs ZAP / Nuclei / HexStrike is at docs/benchmarks/juice-shop.md. No "98.7% detection rate" claims you can't audit.

Cloud workspace (Pro / Team / Enterprise)

The CLI is free forever and stores everything locally. If you want engagement history, branded client-ready PDF reports, and team collaboration, link the CLI to an app.pentestai.xyz workspace:

# Sign up, then Dashboard -> API Keys -> Generate -> copy ptai_...
ptai auth login        # paste the key (hidden prompt)
ptai auth status       # confirm link
# or use an env var for CI:
export PENTESTAI_API_KEY=ptai_...

When you're logged in, ptai start auto-syncs findings to your cloud workspace. No login, no cloud calls: the integration stays silently off.

No LLM at all (interactive launcher)

ptai menu

Numeric category navigation, search (/term), tag filtering (t web), keyword-based recommendation. Real engagements still go through ptai start with full scope confirmation.

Why it's different


🤖 LLM-coordinated, not LLM-dependent	Seventeen agents cover recon, web, API, AD, cloud, mobile, wireless, browser, credentials, privesc, vuln scan, chaining, PoC, detection, report, social engineering, and LLM red team. The LLM runs the phase loop and reasons about results; bug detection is in the curated deterministic probe library. Set no API key and the same probes still run. The LLM coordinates; it doesn't scan.
🔓 No API key on the MCP path	Claude Code / Cursor / Codex users drive ptai through MCP using their existing subscription. 200+ tool wrappers and 60 probes are LLM-callable without an Anthropic key. The standalone CLI (`ptai start --agent-mode`) is where the API key matters: it covers the Codex-without-MCP, CI, and air-gapped paths.
🔐 It logs in	Most scanners die at the login page. This one holds a session, refreshes credentials when they expire, and every downstream tool inherits the cookie. Auth profiles store references (env vars, `op://`, Vault paths, AWS Secrets Manager ARNs), never the value.
🧪 Every finding is proven	A non-destructive proof of concept runs against the target. No more triaging 40 maybes from a noisy scanner.
⚡ CI-native	GitHub Action, severity gates, SARIF output, PR comments. Drop it into your workflow file and it runs on the next PR.
💾 Runs on your laptop	MIT licensed, no cloud calls. Runs offline with Ollama. Findings stay on your disk.

How it works

┌─────────────────────────────────────────────────────────────┐
│                    ptai start <target>                      │
└─────────────────────────────────────────────────────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          ▼                  ▼                  ▼
      ┌────────┐        ┌────────┐        ┌─────────┐
      │ recon  │   ->   │  auth  │   ->   │   web   │
      └────────┘        └────────┘        └─────────┘
                                               │
          ┌────────────────────────────────────┤
          ▼                                    ▼
      ┌────────┐                          ┌─────────┐
      │   ad   │   ┌──────────────────┐   │ cloud   │
      └────────┘   │  Findings DB     │   └─────────┘
          │        │  sqlite, evidence│        │
          └───────▶│  scope-guarded   │◀───────┘
                   │  deduplicated    │
                   └──────────────────┘
                             │
                ┌────────────┼────────────┐
                ▼            ▼            ▼
           ┌──────┐    ┌─────────┐  ┌──────────┐
           │chain │    │validate │  │ detect   │
           └──────┘    └─────────┘  └──────────┘
                             │
                             ▼
                       ┌──────────┐
                       │  report  │   md · html · pdf · SARIF · JUnit
                       └──────────┘

Each agent runs with an LLM when you've set a key, or as a deterministic tool loop when you haven't. Either way the phase order is the same.

Agents

Agent	Phase	Does
`recon`	1	Port scan, DNS and subdomain enum, service fingerprinting
`web`	2	Authenticated OWASP Testing Guide v4 pass
`api_security`	2	OpenAPI/GraphQL/REST surface analysis, OWASP API Top 10
`browser`	2	Playwright-driven DOM analysis, XHR capture, security-header grading
`ad`	3	AD enum, Kerberoasting, BloodHound pathfinding, delegation abuse
`cloud`	4	AWS, Azure, GCP IAM, misconfig, K8s RBAC, serverless
`credential_tester`	4	Password spraying, credential stuffing, MFA bypass checks
`privesc`	5	Local and lateral privilege-escalation advice from collected context
`vuln_scanner`	5	Cross-cutting vuln aggregation against the findings DB
`exploit_chain`	6	Correlates findings into multi-step attack paths
`poc_validator`	7	Non-destructive proof of concept per finding
`detection`	8	Sigma, SPL, KQL rules for the blue team
`report`	9	Markdown, HTML, PDF, SARIF, JUnit, compliance maps
`llm_redteam`	opt	OWASP LLM Top 10 probes
`social_engineer`	opt	Phishing corpus and pretext generation
`mobile`	opt	Android/iOS static + dynamic checks
`wireless`	opt	Wireless reconnaissance and handshake capture

Playbooks

Your methodology as a file. Checked into git. Shared with your team.

name: internal-ad-pentest
inputs:
  domain: { required: true, prompt: "AD domain" }
  dc_ip:  { required: true, prompt: "DC IP" }

phases:
  - id: recon
    tools: [nmap, masscan]

  - id: ad-enum
    depends_on: [recon]
    condition: "any_finding(type='open_port', port=445)"
    tools: [enum4linux, ldapsearch, bloodhound-python]

  - id: kerberoast
    requires_finding: { type: ad_user_enumerated }
    tools: [impacket-getuserspns]
    llm_decide: true         # let the LLM skip this phase if context says it won't help

ptai playbook list                  # show installed playbooks
ptai playbook show web-app-quick    # preview before running
ptai playbook run ./my-ad.yaml      # execute

Five playbooks ship built-in. A community catalog is coming.
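To make `depends_on` and `condition` concrete, here is a sketch of how the first two phases of the example could be ordered and gated. It assumes the YAML has already been parsed into dicts (PyYAML's `yaml.safe_load` would do that), uses the standard library's `graphlib` for the topological order, and replaces the condition string with an equivalent Python predicate built on a hypothetical `any_finding` helper. None of this is ptai's actual evaluator.

```python
from graphlib import TopologicalSorter


def any_finding(findings, **match):
    """True if any finding dict has every key/value in `match`."""
    return any(all(f.get(k) == v for k, v in match.items()) for f in findings)


# First two phases of the example playbook as parsed dicts; the condition
# string is swapped for a Python predicate in this sketch.
PHASES = [
    {"id": "recon", "tools": ["nmap", "masscan"]},
    {"id": "ad-enum", "depends_on": ["recon"],
     "tools": ["enum4linux", "ldapsearch", "bloodhound-python"],
     "condition": lambda fs: any_finding(fs, type="open_port", port=445)},
]


def execute(phases, run_phase):
    """Run phases in dependency order; skip any whose condition is false.
    run_phase(phase) returns a list of new finding dicts."""
    by_id = {p["id"]: p for p in phases}
    graph = {p["id"]: set(p.get("depends_on", [])) for p in phases}
    findings, ran = [], []
    for pid in TopologicalSorter(graph).static_order():
        phase = by_id[pid]
        cond = phase.get("condition")
        if cond is not None and not cond(findings):
            continue
        findings.extend(run_phase(phase))
        ran.append(pid)
    return ran, findings
```

With a fake `run_phase` that reports port 445 from recon, both phases run; if recon only finds port 80, `ad-enum` is skipped.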

Drop it into your CI

# .github/workflows/security.yml
name: Security scan
on: [pull_request]

jobs:
  ptai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ptai
      - run: |
          ptai start ${{ vars.STAGING_URL }} \
            --ci \
            --fail-on high \
            --sarif pentest.sarif
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PENTEST_AI_AUP_ACCEPTED: "1"
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: pentest.sarif

Findings post as a PR comment, SARIF uploads to GitHub Code Scanning, and the build fails on gated severity. GitLab CI and Jenkins templates plus advanced options (auth profiles in CI, cost gates, scope files) -> docs/ci-cd.md.
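The severity gate reduces to a threshold comparison that sets the process exit code, and a non-zero exit is what fails the CI step. A sketch under stated assumptions: the info < low < medium < high < critical ordering, `--fail-on high` also tripping on critical, and `verified` failing on any VERIFIED finding are all assumptions, and the finding dicts are hypothetical.

```python
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]  # assumed ordering


def should_fail(findings: list, fail_on: str) -> bool:
    """Hypothetical gate: 'verified' fails on any VERIFIED finding; a severity
    name fails on any finding at or above that severity."""
    if fail_on == "verified":
        return any(f.get("status") == "VERIFIED" for f in findings)
    threshold = SEVERITY_ORDER.index(fail_on)  # ValueError for an unknown level
    return any(SEVERITY_ORDER.index(f["severity"]) >= threshold for f in findings)


def gate_exit_code(findings: list, fail_on: str) -> int:
    # Non-zero exit is what makes the CI step, and so the build, fail.
    return 1 if should_fail(findings, fail_on) else 0


findings = [
    {"severity": "medium", "status": "candidate"},
    {"severity": "critical", "status": "VERIFIED"},
]
print(gate_exit_code(findings, "high"))      # a critical finding is at or above high
print(gate_exit_code(findings[:1], "high"))  # a lone medium stays below the gate
```

A CI wrapper would pass that value to `sys.exit()`.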

Benchmarks

ptai is purpose-built for SPA pentesting with curated probe coverage. On OWASP Juice Shop, the published four-tool matrix showed:

Tool	Findings	Critical+High	OWASP Top 10 buckets	FP rate
ptai 0.13.0	88	46	5	0%
ZAP 2.17.0	593	0	1	47%
Nuclei 3.8.0	1	0	1	0%
HexStrike v6.0	11	0	1	-

n=1 single-rater, single-shot. Methodology + raw artifacts in benchmarks/results/2026-05-12/juice-shop/. The honest read: ptai is better at SPA web pentests with curated probe coverage. HexStrike is broader (cloud, binary, CTF) and likely beats ptai on traditional crawlable surfaces like WordPress. Future releases will widen the comparison.

Recent research context: fully autonomous LLM-pentest agents finish 21-31% of tasks end-to-end; human-assisted setups reach 64% (ARTEMIS, DARPA AICC Atlantis, xOffense). ptai is built for the human-assisted regime: the LLM reasons about results, the curated probes detect, and Ctrl+C twice lets the operator take over.

vs the field

	`ptai`	Hexstrike	ZAP	Nuclei	Burp Pro	PentestGPT
LLM-driven via MCP (no API key)	✓	✓
LLM-synthesized HTTP under scope guard	✓	partial
Authenticated scanning via MCP	✓	partial	partial	raw HTTP	✓
Exploit chaining	✓	partial				partial
Non-destructive PoC validation	✓				partial
Stored injection chains (POST -> GET verify)	✓	manual	partial		manual
Curated probes (specialised, not template-driven)	60	tool-wrapper-driven	rule-driven	8000+ templates	manual + scan	-
Wrapped CLI security tools	200+	150+	-	-	-	-
Tool install wizard	core/recommended/full + per-tool	-	n/a	n/a	n/a	-
Smart install at engagement start	✓
CI-native (SARIF + severity gates)	✓		partial	partial	partial
LLM red team probes	✓
YAML playbooks	✓			templates
License	MIT	MIT	Apache-2.0	MIT	commercial	MIT

What's inside

17 agents across recon, web, API security, AD, cloud, mobile, wireless, browser, credential testing, privilege escalation, vuln scanning, exploit chaining, PoC validation, detection, reporting, LLM red team, social engineering
60 curated web probes covering OWASP Top 10 + API Top 10
200+ tool wrappers with auto-install: nmap, masscan, nuclei, ffuf, sqlmap, gobuster, wapiti, nikto, dalfox, xsstrike, wpscan, hydra, hashcat, enum4linux, bloodhound-python, the impacket suite, trufflehog, gitleaks, kube-hunter, trivy, prowler, scout-suite, and more
4000+ Nuclei templates integrated for atomic vulnerability detection
47 MCP tools for LLM-driven engagements, including plan_tools / ensure_tools_installed that let the outer LLM batch-install tools without an Anthropic API key
300+ LLM models via the LiteLLM provider (Anthropic, OpenAI, Ollama direct; Azure, OpenRouter, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere via LiteLLM)
HTTP REST API + WebSocket surface (ptai serve) for non-MCP integrations
Local web dashboard with live engagement view, findings table, attack chain visualization, SARIF export
Browser automation agent with screenshot capture, DOM analysis, network capture, security header grading (Playwright-driven)
Human-In-The-Loop teleoperation (Ctrl+C twice to take over an engagement mid-run)
MCP client capability to load external MCP servers as tool sources
Public reproducible benchmark harness in benchmarks/. Numbers, code, raw artifacts, all in git.
6 output formats: Markdown, HTML, PDF, SARIF 2.1.0, JUnit XML, compliance mappings (OWASP, CWE, CVE, CVSS v3.1)
2,400+ tests with CI on Python 3.10, 3.11, 3.12, 3.13
MIT licensed, 100% yours

Who uses it for what

AppSec teams. Wire ptai into your CI. Every PR against staging gets an authenticated scan. The build fails on high-severity findings. The fix -> retest -> confirm loop runs on its own.

Consultants. Set up a week-long engagement, point ptai at the target list, and spend your time on the parts that need a human: analyzing findings, picking chains to demonstrate, talking to the client. The report writes itself.

Bug bounty hunters. Run it over breakfast. Come back to a list of validated findings with PoCs ready to paste into HackerOne.

Red teamers. Encode your AD methodology as a YAML playbook. Every new engagement runs it. Same methodology, shared across the team.

Claude Code / Cursor / Codex users. Add ptai as an MCP server. Ask your assistant to run a scan in plain English. Your existing subscription pays for the LLM; ptai supplies the tools.

Developers shipping AI features. Run ptai with --enable-llm-redteam against your chatbot and get an OWASP LLM Top 10 report in minutes.

Responsible use

pentest-ai is offensive security tooling. It executes real network and host operations against the targets you specify. You are solely responsible for ensuring you have explicit, written authorization to test every target.

By installing or running ptai you agree to the Acceptable Use Policy and the Terms of Service. Testing systems you do not own without written authorization may violate the Computer Fraud and Abuse Act, the Computer Misuse Act 1990, GDPR Article 32, and equivalents in your jurisdiction. Misuse is your sole responsibility.

The first run prompts you to confirm AUP acceptance and persists the choice to ~/.pentest-ai/aup-consent.txt. Set PENTEST_AI_AUP_ACCEPTED=1 in CI to bypass the prompt non-interactively.

On startup ptai loads a scope file. Out-of-scope hosts are refused at tool-invocation time. PoCs are non-destructive by default. Rate limits kick in automatically in stealth mode. Don't be that person.
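Host-locking comes down to an allowlist check on the parsed hostname before any tool gets the URL. A sketch, assuming a scope written as exact hosts plus `*.`-prefixed wildcard suffixes; this README doesn't show ptai's scope-file format, so that format and the `in_scope`/`guard` names are hypothetical. Parsing with `urlsplit` rather than matching substrings matters: `https://staging.acme.com@evil.example/` has the hostname `evil.example`.

```python
from urllib.parse import urlsplit


class OutOfScope(Exception):
    pass


def in_scope(url: str, scope: list) -> bool:
    """Match the URL's parsed hostname against exact hosts and '*.suffix' entries."""
    host = (urlsplit(url).hostname or "").rstrip(".")   # .hostname is already lowercased
    for entry in scope:
        entry = entry.lower().rstrip(".")
        if entry.startswith("*."):
            if host.endswith(entry[1:]):   # "*.acme.com" matches a.acme.com, not acme.com
                return True
        elif host == entry:
            return True
    return False


def guard(url: str, scope: list) -> str:
    """Call before handing a URL to any active tool; refuses off-scope hosts."""
    if not in_scope(url, scope):
        raise OutOfScope(f"refusing out-of-scope target: {url}")
    return url


# Scraped links go through the same check, so third-party URLs never reach a tool.
SCOPE = ["staging.acme.com"]
scraped = ["https://staging.acme.com/api/items", "https://cdn.thirdparty.example/app.js"]
targets = [u for u in scraped if in_scope(u, SCOPE)]
```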

Out-of-band callbacks (OAST) - privacy

ptai detects blind vulnerability classes (blind SSRF, blind SQLi, blind XXE, blind stored XSS, SSTI, Log4Shell) by emitting payloads that, when fired server-side, ring an out-of-band collaborator. By default, callbacks route to ProjectDiscovery's public oast.fun infrastructure.

What lands on the collaborator and who can read it. Each engagement generates a fresh RSA-2048 keypair in your local ptai process. Interaction payloads (raw HTTP requests, DNS queries, SMTP envelopes received by the collaborator) are AES-CTR-256 encrypted at rest server-side, with the AES key wrapped in RSA-OAEP-SHA256 using your engagement's public key. Only the holder of the matching private key - your local ptai process - can decrypt them. ProjectDiscovery (or whoever runs the collaborator) cannot read interaction contents. However, metadata is server-visible: the fact that an interaction happened, source IP of the calling target, timestamp, and protocol.

When to self-host. PortSwigger explicitly forbids public-Burp-Collaborator use in their bug bounty rules of engagement, and large enterprise programs (Meta, Apple, finance) increasingly require that callback infrastructure terminate on tester-controlled hosts. For paid engagements, run your own Interactsh server (Apache-2.0, single Go binary) and point ptai at it:

ptai start http://target --oast-server https://oast.example.com --oast-token <T>

To disable OAST entirely:

ptai start http://target --no-oast

Blind-vuln classes will not be detected when OAST is off; in-band detection paths (size delta / SQL error markers / metadata signatures / time-based) still run.
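As an example of a time-based in-band check, one common decision rule compares repeated timings of a delay payload against a baseline. The sketch takes timing samples as input (collecting them is out of scope); the function name, the 0.8 tolerance, and the every-sample requirement are assumptions, not ptai's parameters.

```python
from statistics import median


def time_based_positive(baseline_s: list, delayed_s: list,
                        injected_delay_s: float, tolerance: float = 0.8) -> bool:
    """True if every delayed sample exceeds the median baseline by at least
    tolerance * injected_delay_s. Requiring all samples, rather than an
    average, keeps one slow network blip from turning into a finding."""
    if not baseline_s or not delayed_s:
        return False
    bar = median(baseline_s) + tolerance * injected_delay_s
    return all(t >= bar for t in delayed_s)
```

Using the median for the baseline also keeps a single slow baseline request from raising the bar.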

Ecosystem

Repo	What
pentest-ai	This repo. The CLI and MCP server. Python product.
pentest-ai-agents	Standalone Claude Code subagent markdown files. Optional, runs without this CLI.

Need shared workspaces, branded PDF reports, SSO, or a managed engagement? The website has Pro / Team / Enterprise dashboards and a one-shot Launch Engagement option. The OSS tool stays OSS, free forever.

Community

Discord: join the server. Chat, get help, share findings, lurk.
Questions, ideas, feedback: GitHub Discussions
Bug reports: GitHub Issues
Show and tell: post the wildest finding ptai gave you in Show and tell

FAQ

Do I need an API key? No on the MCP path. If you drive ptai from Claude Code, Cursor, Codex, or Claude Desktop, your existing subscription is the LLM. You only need a key on the standalone CLI (Path 3), and even there you can run fully local with Ollama. See Install.

Is it really autonomous, or do I babysit it? You stay in the loop. ptai is LLM-coordinated, not autonomous - the curated probes do the detection, the LLM reasons about results, and you own the call. Press Ctrl+C twice mid-run to take over. Fully-autonomous LLM agents finish 21-31% of pentest tasks end to end; human-assisted setups reach 64%, and ptai is built for that second regime.

Is it safe to point at production? Only with written authorization, and only with the guardrails on: intensity=safe skips state-mutating probes, respect_rate_limits honors 429 / Retry-After, and strict_scope refuses off-host requests and stops following redirects. All three default off, so turn them on. See Responsible use.

Why is the Juice Shop number high but the honeypot lower? Juice Shop is the most-documented vulnerable app on the internet, so the LLM and the probe authors both have a head start. The private honeypot measures bugs we wrote ourselves, so its number is lower - and that lower number is the honest signal. We publish both. See Benchmarks.

Does it phone home? No telemetry, and findings stay on your disk. On the MCP path your prompts and the tool output your AI client reads go through that client's API, the same as any session. Blind-vuln detection (OAST) sends callbacks to public oast.fun by default - the contents are encrypted to a local keypair, but the fact a callback happened plus source IP and timestamp are visible to whoever runs the collaborator. Self-host Interactsh or run with --no-oast to avoid that. See Responsible use.

What does it cost to run? On the MCP path, nothing beyond your AI subscription, which handles its own billing. On the standalone CLI, ptai caps spend at $10 per engagement by default; change it with PTAI_PRICE_LIMIT. See Install.

How is this different from just using Claude or PentestGPT? A curated deterministic probe library finds the bugs; the LLM runs the phase loop and reasons about results, it doesn't scan. That is why findings reproduce and ship with a working PoC instead of an LLM guess. See Why it's different and vs the field.

License

MIT. Do whatever you want with it.

If ptai saved you a Sunday, star the repo. It's the only payment I ask for.