ABVX Agent Skills

June 14, 2026


Small, reviewable, validation-gated agent skills for Codex-style project work.


ABVX Agent Skills is a small, auditable skillpack for coding agents that helps them write smaller diffs, debug from evidence, compact noisy shell output, and verify work before saying done.

These are not prompt dumps. They are compact SKILL.md workflows with clear triggers, attribution, risk notes, and validation. They are portable, versioned agent capabilities meant to be previewed, inspected, and loaded on demand through the Agent Skills progressive-disclosure model.

Try One Skill In 2 Minutes

Preview before installing:

gh skill preview markoblogo/abvx-agent-skills minimal-diff-builder

Install one skill:

gh skill install markoblogo/abvx-agent-skills minimal-diff-builder --agent codex --scope user

Then ask your coding agent:

Use minimal-diff-builder. Implement the smallest correct fix for this issue.

The newer bet in this pack is LoopOps: useful skills should not compete with stronger base models by restating generic advice. They should capture repo-specific context, tool adapters, verification gates, and supervisor contracts that can promote repeated work into scripts, workflows, and cost-bounded agent loops.

Context

This repository assumes that many public AI skills are net-negative. The bar here is not novelty or stars. The bar is whether a skill adds usable structure without degrading behavior.

Video context: "I scraped AI skills from GitHub and tested whether they actually help models"

Catalog

Browse the searchable catalog at lab.abvx.xyz/tools/abvx-agent-skills/. The page is powered by the generated catalog data in docs/catalog.json, so the repository remains the source of truth while the published catalog lives on ABVX Lab.

If you want a scan-friendly text catalog for browsing or indexing, use CATALOG.md.

Start With One Job

| Job | Install | Use when |
| --- | --- | --- |
| Write smaller patches | minimal-diff-builder | The agent keeps refactoring too much, widening blast radius, or adding abstractions you did not ask for. |
| Debug from evidence | diagnose | The agent keeps guessing fixes without reproducing the failure and verifying the result. |
| Save tokens in shell-heavy work | rtk-assisted-shell, shell-output-compaction, token-efficient-execution | Logs, diffs, tests, and command output are burning context and hiding the real signal. |
| Verify frontend work | browser-verification, design-critique-polish | The agent says "done" without checking real browser behavior, layout, states, or console errors. |

LoopOps

LoopOps is the framework layer in this repo: it decides when a repeated prompt should remain a prompt and when it should become a checklist, skill, script, or bounded loop.

See:

LoopOps promotion ladder from prompt to checklist, skill, script, or bounded loop
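
One way to picture the promotion ladder is as a small decision function. The sketch below is illustrative only: the `RepeatedTask` fields, the repeat threshold of 3, and the order of the checks are hypothetical and are not taken from loopops-protocol.

```python
from dataclasses import dataclass


@dataclass
class RepeatedTask:
    """Hypothetical summary of a repeated piece of agent work."""
    times_repeated: int      # how often the same prompt has been reused
    steps_are_stable: bool   # do the steps stay the same between runs?
    fully_mechanical: bool   # can it run without model judgment?
    has_verification: bool   # is there a command that proves success?
    has_cost_cap: bool       # is there a budget or iteration limit?


def promotion_rung(task: RepeatedTask) -> str:
    """Return the rung a task might sit on (thresholds are assumptions)."""
    if task.times_repeated < 3:
        return "prompt"        # not repeated often enough to promote
    if not task.steps_are_stable:
        return "checklist"     # repeated, but the steps still vary
    if task.fully_mechanical:
        return "script"        # no judgment needed, so automate it
    if task.has_verification and task.has_cost_cap:
        return "bounded loop"  # judgment needed, but gated and capped
    return "skill"             # stable workflow that still needs a model


print(promotion_rung(RepeatedTask(5, True, False, True, True)))
```

The point of the sketch is the ordering: a loop is only reached when both a verification gate and a cost bound exist, which mirrors the "cost-bounded agent loops" wording above.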

Start Here

  • Need to save tokens? Start with rtk-assisted-shell, shell-output-compaction, token-efficient-execution, and lean-context-layout. Add compaction-survival if your sessions run long enough to forget their own state.
  • Need to debug a repo? Start with diagnose, repo-debugging-ledger, and graph-guided-code-reading.
  • Need the smallest correct implementation path? Start with minimal-diff-builder, then add delivery-preflight-gate when the task is long or risky enough that baseline verification matters.
  • Need to cut bloat from an existing diff or repo slice? Start with overengineering-review, and switch to minimal-diff-builder when you want the cuts implemented as the smallest correct patch.
  • Need to build frontend? Start with frontend-product-builder, designmd-brand-kit, and browser-verification.
  • Need a small Lottie or SVG-driven motion asset? Start with lottie-motion-builder, then pair with frontend-product-builder when the animation needs to land inside a real UI surface.
  • Need a standalone HTML artifact? Start with html-diagram-artifact for SVG-first architecture explainers, or html-brief-artifact for plans, summaries, reports, and research notes.
  • Need stronger UI taste or design setup? Start with design-register-bootstrap, frontend-taste-layer, and design-critique-polish.
  • Need long-session continuity? Start with handoff, compaction-survival, and token-usage-audit.
  • Need to onboard a new repo? Start with project-context-bootstrap and follow with durable-context-maintenance.
  • Need discovery or product shaping? Start with rapid-grilling, doc-grounded-grilling, and spec-to-prd.
  • Need to turn plans into execution? Start with plan-to-issues, repo-issue-triage, and test-driven-execution.
  • Need safer long delivery runs? Start with delivery-preflight-gate, phase-spec-execution, recovery-loop-3strike, and delivery-baseline-audit.
  • Need a full multi-track workflow? Start with dynamic-workflow-packets.
  • Need to turn repeated prompts into loops? Start with loopops-protocol, then use skillopt-evolve-skills to capture durable lessons.
  • Need to build reusable assistant packs? Start with role-skill-pack-design, workflow-policy-layering, brief-first-execution, and private-vs-publishable-skill-audit.

Skills

These skills are grouped by the job they do. The token-economy layer is intentionally visible first: for many teams, the easiest win is not “a smarter prompt”, but less wasted context.

Token Economy & Context Control

| Skill | What It Does |
| --- | --- |
| rtk-assisted-shell | Routes noisy shell workflows through RTK-style filtering. On shell-heavy tasks this can cut command-output tokens dramatically, often in the same range as RTK's reported 60-90% savings on common dev commands. |
| shell-output-compaction | Shrinks logs, diffs, and repo search output into counts, slices, and error-first excerpts. Usually the fastest way to turn multi-screen stdout into a small, usable artifact. |
| graph-guided-code-reading | Replaces broad repo reading with entrypoints, symbols, dependencies, and blast radius. On large codebases this can turn “read everything” into a much smaller focus set. |
| token-efficient-execution | Cuts waste from repeated reads, broad rewrites, and low-value narration. Best for long coding sessions where the loop, not the final answer, is burning the budget. |
| token-frugal-mode | Compresses final answers without dropping the decisive technical signal. Useful when the session is tight and you want shorter replies without caveman-style degradation. |
| lean-context-layout | Shrinks always-loaded agent docs into a compact startup core and pushes the rest on demand. Best for bloated AGENTS.md, CLAUDE.md, and repo runbooks. |
| compaction-survival | Preserves the high-value working state before long sessions collapse into compaction. Saves the turns you would otherwise spend reconstructing “what were we doing?”. |
| token-usage-audit | Diagnoses where the budget is really going: startup bloat, shell noise, repeated reads, oversized summaries, or compaction loss. Use this before over-optimizing the wrong layer. |
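
To make "counts, slices, and error-first excerpts" concrete, here is a minimal sketch of output compaction. It is not the shell-output-compaction skill itself: the `compact_output` function, its parameters, and the error pattern are hypothetical choices made for illustration.

```python
import re

# Assumed pattern for "interesting" lines; a real workflow would tune this.
ERROR_PATTERN = re.compile(r"error|fail|traceback|exception", re.IGNORECASE)


def compact_output(text: str, max_errors: int = 5, tail: int = 3) -> str:
    """Reduce command output to a count line, error lines first, then a tail slice."""
    lines = text.splitlines()
    error_lines = [ln for ln in lines if ERROR_PATTERN.search(ln)]

    # Counts come first so the reader knows how much was dropped.
    parts = [f"lines: {len(lines)}, error-like: {len(error_lines)}"]

    # Error-first excerpt, capped so one noisy failure cannot flood the context.
    parts.extend(error_lines[:max_errors])
    omitted = len(error_lines) - max_errors
    if omitted > 0:
        parts.append(f"({omitted} more error-like lines omitted)")

    # A short tail slice usually holds the exit summary.
    if tail > 0:
        parts.append("--- last lines ---")
        parts.extend(lines[-tail:])
    return "\n".join(parts)


print(compact_output("ok\nok\nERROR: boom\nok\ndone", tail=2))
```

Keeping the count line first is a deliberate choice in this sketch: it tells the agent that material was cut, so it can ask for a wider slice instead of assuming the excerpt is complete.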

Coding, Debugging & Architecture

| Skill | What It Does |
| --- | --- |
| diagnose | Runs a disciplined debugging loop around one reproducible signal, ranked hypotheses, and narrow verification. |
| repo-debugging-ledger | Keeps a checked-location ledger so debugging does not keep reopening the same code and repeating the same dead ends. |
| complexity-optimizer | Finds safe complexity and performance simplifications without turning the codebase into a refactor festival. |
| minimal-diff-builder | Builds the smallest correct implementation path using a YAGNI, stdlib-first, native-first, minimal-diff ladder with explicit safety exceptions. |
| overengineering-review | Reviews code specifically for needless abstractions, replaceable dependencies, dead flexibility, and wrappers over stdlib or platform behavior. |
| architecture-deepening-review | Reviews deeper module seams, coupling, change surfaces, and testability, not just top-level architecture slogans. |
| test-driven-execution | Builds features and fixes through one-behavior-at-a-time red-green-refactor loops instead of broad speculative implementation. |
| system-zoom-out | Pulls a local code area back into its wider system map so you can reason about callers, modules, boundaries, and blast radius. |
| agents-best-practices | Hardens agent harnesses around permissions, context shape, safety, and evaluation discipline. |
| skillopt-evolve-skills | Improves agent instructions and skills from real task evidence rather than from theory alone. |
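
The checked-location ledger behind repo-debugging-ledger can be pictured as a very small data structure. The class below is a hypothetical sketch, not the skill's actual format: the `DebugLedger` name and its methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DebugLedger:
    """Hypothetical ledger mapping a checked location to what was found there."""
    checked: dict[str, str] = field(default_factory=dict)

    def record(self, location: str, finding: str) -> None:
        # Store the finding so the same place is not re-read without a reason.
        self.checked[location] = finding

    def should_open(self, location: str) -> bool:
        # Only locations that have not been ruled in or out are worth opening.
        return location not in self.checked

    def summary(self) -> str:
        return "\n".join(f"{loc}: {finding}" for loc, finding in self.checked.items())


ledger = DebugLedger()
ledger.record("src/parser.py:parse_header", "ruled out, input already validated")
print(ledger.should_open("src/parser.py:parse_header"))
print(ledger.summary())
```

The file path in the usage lines is a placeholder. In practice the ledger would likely live in a note or file the agent rereads, so it survives across turns.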

Frontend, UX & Product Surfaces

| Skill | What It Does |
| --- | --- |
| design-register-bootstrap | Establishes compact design context before implementation: brand vs product register, audience, anti-references, color strategy, and PRODUCT.md / DESIGN.md direction. |
| frontend-taste-layer | Adds a stronger anti-slop design layer to frontend work so outputs stop looking templated, generic, or visually under-committed. |
| design-critique-polish | Runs a focused critique-and-polish pass to rank frontend issues, identify ship blockers, and tighten hierarchy, typography, color, and states. |
| frontend-product-builder | Builds usable frontends, landing pages, pitch pages, dashboards, and prototypes with a product-first interaction model. |
| lottie-motion-builder | Builds small production-ready Lottie assets from SVGs, logos, loaders, and UI states with a local preview harness and output verification. |
| designmd-brand-kit | Turns a website or brand surface into an agent-usable design system: structure, identity, and reusable UI cues. |
| browser-verification | Verifies real browser rendering, responsive layout, and interaction behavior instead of trusting static code inspection. |
| web-quality-audit | Audits accessibility, performance, UX, privacy, and browser security as one practical web quality pass. |
| prototype-lab | Builds rapid throwaway prototypes for testing interaction, logic, and product direction before committing to heavier implementation. |

HTML Artifacts & Visual Deliverables

| Skill | What It Does |
| --- | --- |
| html-diagram-artifact | Creates standalone HTML/SVG diagrams for architecture, request paths, component relationships, and system explainers with minimal prose and browser-verifiable dark mode. |
| html-brief-artifact | Creates standalone HTML briefs for plans, status updates, PR summaries, incident notes, and research explainers without drifting into a full frontend build. |

Project Context & Onboarding

For design-heavy repos, pair this section with design-register-bootstrap from the frontend section.

| Skill | What It Does |
| --- | --- |
| project-context-bootstrap | Detects the stack, asks the right project questions, and turns a weakly documented repo into a compact, agent-usable context surface. |
| durable-context-maintenance | Keeps repo-local context current after architecture, workflow, and test-flow changes so agents stop rediscovering the same facts. |

Discovery, Planning & Delivery

| Skill | What It Does |
| --- | --- |
| rapid-grilling | Quickly sharpens vague ideas through one-question-at-a-time alignment before heavier planning starts. |
| doc-grounded-grilling | Stress-tests a plan against repo docs, ADRs, design assets, and domain language so discovery stays grounded in reality. |
| spec-to-prd | Turns clarified context into a durable PRD for product, client, and internal roadmap work. |
| plan-to-issues | Breaks PRDs and plans into thin end-to-end slices that agents or humans can actually pick up. |
| repo-issue-triage | Moves bugs and enhancements through a compact state machine so backlog items become actionable instead of vague. |

Research, Knowledge & Reusable Methods

| Skill | What It Does |
| --- | --- |
| evidence-ledger-research | Keeps claims, sources, calculations, and open questions in a disciplined evidence ledger. |
| loopops-protocol | Chooses when repeated agent work should stay a prompt or be promoted into a skill, checklist, script, workflow, or cost-bounded loop. |
| book-to-skill | Converts books, papers, and long documents into reusable, progressive-disclosure agent skills. |
| role-skill-pack-design | Designs compact role/workflow skill packs with base layers, difference layers, boundaries, and rollout order. |
| workflow-policy-layering | Separates workflow from authority, escalation, forbidden actions, and validation so assistant specs stop contradicting themselves. |
| brief-first-execution | Starts non-trivial work with one live brief for scope, non-goals, risks, verification, and done criteria. |
| private-vs-publishable-skill-audit | Audits private skill packs before publication and extracts only the reusable layer. |

Workflow, Handoffs & Multi-Track Work

| Skill | What It Does |
| --- | --- |
| dynamic-workflow-packets | Orchestrates large coding, research, audit, or client-search tracks without losing verification and risk gates. |
| handoff | Produces compact continuation briefs for long-running work, agent resumes, and human handoffs. |

Long-Run Delivery Control

| Skill | What It Does |
| --- | --- |
| delivery-preflight-gate | Runs the minimum useful baseline checks before a long implementation loop starts, so pre-existing breakage does not poison later verification. |
| phase-spec-execution | Breaks larger delivery into explicit phases with acceptance criteria, verification commands, and lightweight state updates. |
| recovery-loop-3strike | Bounds execution failure handling to one evidence-bearing retry, one focused fix-spec, and then an honest blocker handoff. |
| delivery-baseline-audit | Re-checks declared deliverables and final verification against the starting baseline and full working tree before calling the task complete. |
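
The bounded failure handling described for recovery-loop-3strike can be sketched as straight-line code with no open-ended loop. This is an interpretation, not the skill's implementation: the `recover` function, the `Attempt` callable, and the way the three strikes map to steps are assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: an attempt takes an instruction and returns (ok, evidence).
Attempt = Callable[[str], Tuple[bool, str]]


def recover(attempt: Attempt, failure_log: str) -> dict:
    """Handle one failure with at most two more attempts, then stop."""
    evidence: List[str] = [failure_log]

    # Strike 1: a single retry that carries the failure evidence with it.
    ok, log = attempt(f"retry with evidence: {failure_log}")
    if ok:
        return {"status": "recovered", "strike": 1, "evidence": evidence}
    evidence.append(log)

    # Strike 2: one narrow fix-spec built from the failures seen so far.
    ok, log = attempt("fix only: " + " | ".join(evidence))
    if ok:
        return {"status": "recovered", "strike": 2, "evidence": evidence}
    evidence.append(log)

    # Strike 3: report a blocker with the collected evidence instead of looping.
    return {"status": "blocked", "strike": 3, "evidence": evidence}


result = recover(lambda instruction: (False, "tests still red"), "test_login failed")
print(result["status"], len(result["evidence"]))
```

Writing the strikes out explicitly, rather than as a `while` loop with a counter, makes the upper bound on attempts visible in a diff.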

Structured Data & Spreadsheet Work

| Skill | What It Does |
| --- | --- |
| spreadsheet-workbook-forensics | Repairs and edits spreadsheets where workbook structure, formulas, and cell-level verification matter. |

Install

Fastest path for most users:

pip install abvx-agent-skills
abvx-skills install

Install with GitHub CLI agent-skills support:

gh skill install markoblogo/abvx-agent-skills minimal-diff-builder

Target a specific host or scope when needed:

gh skill install markoblogo/abvx-agent-skills minimal-diff-builder --agent codex --scope user
gh skill install markoblogo/abvx-agent-skills diagnose --agent cursor --scope project

gh skill is currently a GitHub CLI preview feature. Use GitHub CLI v2.90.0+. The command set and flags are documented in the official gh skill manual and the GitHub changelog announcement for GitHub CLI agent skills.

Published package pages:

Current distribution channels:

Install one skill into Codex:

git clone https://github.com/markoblogo/abvx-agent-skills
cp -R abvx-agent-skills/skills/dynamic-workflow-packets ~/.codex/skills/

Install all skills:

git clone https://github.com/markoblogo/abvx-agent-skills
cp -R abvx-agent-skills/skills/* ~/.codex/skills/

Start a new agent session after installation so the skill descriptions are discovered.

Install one packaged skill into Codex:

abvx-skills install dynamic-workflow-packets

Install to a custom destination:

abvx-skills install --destination ./tmp-skills

Install via Homebrew tap:

brew tap markoblogo/tap
brew install abvx-agent-skills

homebrew-core is not the current install path for this project. The upstream submission was closed under the repository's notability policy, so the maintained Homebrew channel is the ABVX tap.

Smoke-test the published package from PyPI:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install abvx-agent-skills
abvx-skills list
abvx-skills validate

Safety And Auditability

Before installing a skill, inspect it:

gh skill preview markoblogo/abvx-agent-skills minimal-diff-builder

Validate local or packaged skills:

abvx-skills validate
gh skill publish --dry-run

Run the static security audit:

abvx-skills audit-security ./skills --no-llm

This repository is intentionally optimized for inspection before trust: compact skill files, reviewable metadata, structural validation, and a publish dry-run that catches naming and metadata drift before release.

Onboarding Paths

Demos

Distribution

If you are listing the repo in curated skill directories, agent catalogs, or install surfaces, use docs/outreach/submission-kit.md for positioning and docs/outreach/targets.md for target tracking.

For the current first-wave outreach set, use docs/outreach/first-wave-submissions.md.

Repository Profile

Each public skill includes:

  • SKILL.md - executable agent instructions
  • SKILL_CARD.md - intended use, attribution, risks, evaluation, and version
  • agents/openai.yaml - Codex UI metadata

The project follows the open Agent Skills shape: SKILL.md plus optional scripts/, references/, and assets/. For Codex compatibility, top-level frontmatter is kept conservative and limited to name, description, license, metadata, and other supported fields.
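
As a rough illustration of what "conservative frontmatter" means, the sketch below lists top-level keys in a SKILL.md and flags any that fall outside the four fields named above. The helper names are hypothetical, the parser is a deliberately naive line scan rather than a YAML parser, and the allow-list is incomplete because the other supported fields are not enumerated here.

```python
# Only the four fields named in this README; other supported fields are unknown here.
ALLOWED_KEYS = {"name", "description", "license", "metadata"}


def top_level_keys(skill_md: str) -> list[str]:
    """Collect unindented `key:` names from a leading `---` frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Indented lines are nested values, so only unindented `key:` lines count.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unexpected_keys(skill_md: str) -> list[str]:
    return [key for key in top_level_keys(skill_md) if key not in ALLOWED_KEYS]


sample = "---\nname: demo\ndescription: A demo skill\nmetadata:\n  version: 1\ncolor: blue\n---\n# Demo\n"
print(unexpected_keys(sample))
```

The `color` key in the sample is made up to trigger the check. A real validator would use a proper YAML parser and the full list of supported fields.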

The HTML artifact skills intentionally keep their deliverables single-file and dependency-light. Use them for explainers and briefs, not as substitutes for production frontend implementation.

Contribute

How To Contribute Your Own Skills

Use this repo when a workflow has repeated often enough that it deserves a sharper portable behavior layer, not when you just have a long prompt.

Contribution path:

  • Submit your own skill: draft it against docs/abvx-skillpack-profile.md, mirror the shape of an existing skill, and open a PR with the smallest useful slice.
  • Request a missing skill: open a Skill Request when the repeated workflow is real but the right skill does not exist yet.
  • Autopsy a broken skill: open a Skill Autopsy when an internal or external skill added noise, abstractions, or fake process and should be reduced into something stronger.

Good submissions usually have:

  • a narrow trigger, not a vague domain
  • one clear behavior change
  • explicit anti-patterns or stop conditions
  • honest verification instead of broad motivational prose

Use docs/solo-dev-quickstart.md and docs/team-rollout-playbook.md as examples of opinionated packaging aimed at real adoption paths rather than generic documentation.

Validate

python scripts/validate.py

Or validate the packaged skills through the CLI:

abvx-skills validate

Run a static security audit with SkillSpector:

pip install git+https://github.com/NVIDIA/SkillSpector.git
abvx-skills audit-security ./skills --no-llm

Evaluate reports against the repo policy and baseline:

python scripts/evaluate_skillspector.py \
  --reports-dir artifacts/skillspector \
  --policy .abvx/skillspector-policy.yaml \
  --baseline .abvx/skillspector-baseline.json

Validate a local skills directory:

abvx-skills validate ~/.codex/skills

Structural validation and security audit are separate gates. The validator checks required files, frontmatter, directory/name alignment, TODO placeholders, cards, UI metadata, and basic secret patterns.
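
For readers who want a feel for what a structural gate does, here is a minimal sketch covering three of the checks listed above: required files, directory/name alignment, and TODO placeholders. It is not scripts/validate.py. The `check_skill_dir` function is hypothetical, the required-file list is taken from the Repository Profile section, and the name lookup is a simple regex over the whole file rather than a frontmatter parse.

```python
import re
from pathlib import Path

# Files each public skill is described as including in this README.
REQUIRED_FILES = ("SKILL.md", "SKILL_CARD.md", "agents/openai.yaml")


def check_skill_dir(skill_dir: Path) -> list[str]:
    """Return a list of structural problems found in one skill directory."""
    problems = []
    for rel in REQUIRED_FILES:
        if not (skill_dir / rel).is_file():
            problems.append(f"missing {rel}")

    skill_md = skill_dir / "SKILL.md"
    if skill_md.is_file():
        text = skill_md.read_text(encoding="utf-8")

        # Directory/name alignment: the declared name should equal the folder name.
        match = re.search(r"^name:[ \t]*(\S+)[ \t]*$", text, flags=re.MULTILINE)
        declared = match.group(1).strip("\"'") if match else None
        if declared != skill_dir.name:
            problems.append(f"name {declared!r} does not match directory {skill_dir.name!r}")

        # Placeholder check: leftover TODO markers should not ship.
        if "TODO" in text:
            problems.append("TODO placeholder in SKILL.md")
    return problems
```

A usage sketch, assuming the repository layout shown in the install commands: `check_skill_dir(Path("skills/diagnose"))` would return an empty list for a skill that passes these three checks.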

Benchmarks

Benchmark scaffolding now lives under benchmarks/. It documents how to measure skill impact without publishing fake precision. Until the repo has stable reproducible runs across tasks and models, benchmark numbers should be treated as pending evidence rather than marketing copy.

Release

Build and check the package locally:

python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*

Publish flow:

  • Run the publish GitHub Actions workflow with repository=testpypi for a dry run against TestPyPI.
  • Create a GitHub release, or run the same workflow with repository=pypi, to publish to PyPI.
  • Configure trusted publishing for both pypi and testpypi environments in the package index before the first release.
  • Keep the released version aligned with pyproject.toml and the skill inventory documented above.

Philosophy

  • Keep always-loaded context small.
  • Prefer procedural rules over vague advice.
  • Make skills easy to audit in diffs.
  • Attribute upstream inspiration.
  • Pair useful automation with risk gates and verification.

See docs/abvx-skillpack-profile.md for the repository standard.

Attribution

Several skills are inspired by public work from the broader agent tooling ecosystem. See ATTRIBUTION.md.

License

MIT. See LICENSE.