README.md

June 28, 2026

re:factory

Describe what you want — re:factory builds it, tests it, and keeps improving it. Design an idea from scratch or point at an existing project for continuous improvement. Runs with Claude Code, Bob Shell, and OpenAI Codex.

All state is local — per-project in .factory/ (add to .gitignore), global in ~/.factory/. See Architecture for the full deep-dive.


Quick Start

Prerequisites: Python 3.11+, uv, and Claude Code (installed and authenticated).

git clone https://github.com/akashgit/remote-factory.git
cd remote-factory
uv sync

Then start with one of the main workflows:

# Design — brainstorm an idea, refine it, then build
uv run factory ceo "my idea" --mode design

# Improve — point at an existing project for continuous improvement
uv run factory ceo /path/to/project --mode improve --focus "issue # or whatever you want to improve or fix"

# Co-improve — iterate on the implementation plan before work on an improvement starts
uv run factory ceo /path/to/project --mode design --focus "issue # or whatever you want to improve or fix"

See the full setup guide for authentication and environment variables.


What Do You Want to Do?

| I want to… | Command |
|---|---|
| Start from a raw idea | uv run factory ceo "my idea" --mode design |
| Improve an existing project | uv run factory ceo /path/to/project --mode improve --focus "issue number or whatever you want to improve or fix" |
| Co-improve an existing project | uv run factory ceo /path/to/project --mode design --focus "description of whatever you want to improve or fix" |
| Create a new factory mode | uv run factory ceo /path/to/factory --mode create --focus "mode description" |

Design Workflow

Use design mode when you want to brainstorm before building. Start a conversation with the CEO to refine an idea, then build:

# From a raw idea — discuss and refine into a buildable spec
uv run factory ceo "distributed task runner" --mode design

# From a spec file — read and discuss before building
uv run factory ceo ~/ideas/my-app-spec.md --mode design

Design mode also works on existing projects. The CEO studies the backlog, eval scores, open issues, and experiment history, then discusses what to work on before executing:

uv run factory ceo ~/factory-projects/my-app --mode design

# Seed the conversation with a topic
uv run factory ceo ~/factory-projects/my-app --mode design --focus "auth layer"

You can also pass a spec file or URL directly — uv run factory ceo spec.md — and re:factory builds without the design conversation.


Improve Workflow

Improve mode is re:factory's continuous improvement loop for existing projects. Point it at a codebase and it autonomously observes the project state, generates hypotheses for improvements, builds and tests changes, and keeps or reverts each experiment based on eval scores.

uv run factory ceo ~/factory-projects/my-app --mode improve

Each cycle: observe → hypothesize → build → review → measure → decide (keep or revert) → archive. The Strategist picks work from the backlog using FEEC priority (Fix > Exploit > Explore > Combine).

When you know exactly what you want, --focus pins a single target — one hypothesis, one experiment, done:

uv run factory ceo ~/my-app --mode improve --focus "add dark mode toggle"
uv run factory ceo ~/my-app --mode improve --focus 42                       # GitHub issue
uv run factory ceo ~/my-app --mode improve --focus "owner/repo#42"          # Issue shorthand
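
The three --focus forms above (free text, a bare issue number, and owner/repo#N shorthand) can be told apart with a small classifier. This is a minimal sketch under assumptions: `classify_focus` and its return shape are hypothetical, and the patterns are a guess at reasonable parsing rather than the rules re:factory actually applies.

```python
import re

# Hypothetical patterns for the --focus forms shown above.
ISSUE_NUMBER = re.compile(r"\d+")
ISSUE_SHORTHAND = re.compile(
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)#(?P<number>\d+)"
)


def classify_focus(focus: str) -> dict:
    """Classify a --focus value as an issue reference or free text."""
    text = focus.strip()
    if ISSUE_NUMBER.fullmatch(text):
        # Bare number: an issue in the project's own repository.
        return {"kind": "issue", "repo": None, "number": int(text)}
    match = ISSUE_SHORTHAND.fullmatch(text)
    if match:
        repo = f"{match['owner']}/{match['repo']}"
        return {"kind": "issue", "repo": repo, "number": int(match["number"])}
    # Anything else is treated as a plain-language target.
    return {"kind": "text", "text": text}


for value in ("add dark mode toggle", "42", "owner/repo#42"):
    print(value, "->", classify_focus(value)["kind"])
```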

Post-Cycle Refinement

After a build or improve cycle finishes in foreground mode, the CEO stays active — it doesn't exit. Ask for changes directly:

  • "Fix the typo in the header"
  • "Add error handling to the upload endpoint"
  • "Make the tests more thorough"

Each request runs through the full experiment pipeline: the Refiner scopes it → Builder implements → review + eval + E2E gate → keep/revert verdict. No shortcuts — every refinement is a tracked experiment with its own PR.

You can also invoke refinements directly with --refine:

uv run factory ceo ~/my-app --refine "add rate limiting to the API"

There's no cap on refinements. Advisory warnings appear at 5 and 10 refinements to flag context growth, but the user decides when to stop.


Create New Modes

Create mode lets you build new factory modes — new workflows, new pipelines, new factories. Pass a description via --focus to tell the CEO what mode to create. It's fully interactive — the CEO researches existing patterns, synthesizes a workflow spec, gets your approval, then implements everything: workflow definition, SKILL.md, CLI wiring, and tests.

uv run factory ceo /path/to/factory --mode create --focus "a mode that validates PRs with multi-stage checks"

The pipeline: 3 parallel researchers (existing patterns, intent analysis, best practices) → Strategist synthesizes a workflow spec → you approve (like design mode) → Builder implements → QA verifies end-to-end → PR.

Point it at the factory repo itself to extend re:factory with custom pipelines.


Eval System

Every change is measured by an 11-dimension composite score across three tiers: Hygiene (tests, lint, types, coverage), Growth (API surface, experiment diversity, observability), and Project (user-defined domain metrics). On first run, uv run factory discover auto-detects your project's language and framework to generate the eval profile. See Eval System for scoring details, weights, and guards.
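
To make the idea of a tiered composite concrete, here is a minimal sketch that averages dimension scores within each tier and then combines tiers by weight. The weights, the [0, 1] score range, the dimension keys, and the keep-or-revert rule are all assumptions made for illustration; the actual formula, weights, and guards are described in Eval System.

```python
# Hypothetical tier weights; the real values live in the Eval System docs.
TIER_WEIGHTS = {"hygiene": 0.5, "growth": 0.3, "project": 0.2}


def composite_score(dimensions: dict[str, dict[str, float]]) -> float:
    """Weighted mean of per-tier averages; each score is assumed in [0, 1]."""
    total = 0.0
    weight_sum = 0.0
    for tier, scores in dimensions.items():
        if not scores:
            continue  # skip tiers with no dimensions, e.g. no project metrics
        total += TIER_WEIGHTS[tier] * (sum(scores.values()) / len(scores))
        weight_sum += TIER_WEIGHTS[tier]
    return total / weight_sum if weight_sum else 0.0


def decide(before: float, after: float) -> str:
    """Assumed rule: keep a change only if the composite did not drop."""
    return "keep" if after >= before else "revert"


example = {
    "hygiene": {"tests": 1.0, "lint": 1.0, "types": 0.5, "coverage": 0.5},
    "growth": {"api_surface": 0.5, "experiment_diversity": 0.5, "observability": 0.5},
    "project": {},
}
print(round(composite_score(example), 3))
```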


Verified Skill Generation

Workflow graphs (Pydantic definitions) are converted to SKILL.md prose files that the CEO follows at runtime. This conversion goes through a verified pipeline to prevent information loss:

Workflow (Pydantic) → templatize → review agent → guard → split
                         │              │           │        │
                    {{slot::default}}   opus    structural   SKILL.md +
                    + annotations     refines    diff check  annotations.yaml

The pipeline produces two artifacts per workflow:

  • SKILL.md — clean prose the CEO reads at runtime
  • SKILL.annotations.yaml — structured metadata per node for programmatic verification

Regenerate all skills after changing workflow definitions:

uv run factory workflow export-skills

A regression test (test_annotations_match_source) runs in CI to catch drift between workflow definitions and exported skills.


Built with re:factory

| Project | What it does | Mode |
|---|---|---|
| SWE-bench solver | Autonomous agent that resolves GitHub issues, improved via failure analysis | Research |
| HMMT math solver | Multi-agent team that solved HMMT Feb 2025 Combinatorics Problem 7 | Research |
| Text/Sketch → CAD | Natural language and sketches to executable CadQuery Python code for 3D models | Research |
| HLS design space explorer | Per-function AI agents + ILP solver for HLS optimization — 92% execution time reduction | Build |
| Pluck | iOS app that extracts structured data from screenshots using on-device AI | Build + Improve |
| SDG Hub | Agent-maintained open-source framework for synthetic data generation | Build + Improve |
| OpenSkies Airline Corpus | 85-document fictional airline corpus for RAG/fine-tuning evaluation with cross-document consistency validation | Design + Improve |
| re:factory itself | Runs on itself — continuously improved via its own experiment outcomes | Meta |

Built something with re:factory? Open a PR to add it here.


CLI Quick Reference

# Core workflow
uv run factory ceo "idea" --mode design         # Design from a raw idea
uv run factory ceo <path> --mode improve        # Improve an existing project
uv run factory ceo <path> --refine "..."        # Single targeted refinement
uv run factory ceo <path> --mode create --focus "description"  # Create a new factory mode
uv run factory ceo <path> --loop                # Continuous improvement loop
uv run factory tmux <path> --loop               # Loop in detached tmux session

See uv run factory --help for the complete list.


Runners

re:factory supports multiple CLI backends. Default is Claude Code — switch with --runner or FACTORY_RUNNER:

# Direct
CODEX_API_KEY="..." uv run factory ceo /path --runner codex
BOBSHELL_API_KEY="..." uv run factory ceo /path --runner bob

# Via config.toml profile (persistent)
uv run factory ceo /path --profile codex

Configure profiles in ~/.factory/config.toml:

[credentials.codex]
FACTORY_RUNNER = "codex"
CODEX_API_KEY = "..."

[credentials.bob]
FACTORY_RUNNER = "bob"
BOBSHELL_API_KEY = "..."

Run uv run factory config show to see resolved config, or uv run factory config edit to open the file. See Setup Guide for full details.


LLM Tracing (LangFuse)

LangFuse provides LLM observability and tracing — track agent invocations, token usage, and execution flow across all factory runs.

Quick Start

# Start LangFuse services
scripts/langfuse-setup start

# Set the env vars the factory needs
export LANGFUSE_HOST=http://localhost:3000
export LANGFUSE_BASE_URL=http://localhost:3000
export LANGFUSE_PUBLIC_KEY=pk-lf-dev-local-key
export LANGFUSE_SECRET_KEY=sk-lf-dev-local-key
export TELEMETRY_PLATFORM=langfuse

The dev credentials above match the docker-compose setup. Add them to your ~/.bashrc or ~/.zshrc to persist across sessions.

Viewing Traces

  1. Start LangFuse: scripts/langfuse-setup start
  2. Run the factory: uv run factory ceo /path/to/project
  3. Open http://localhost:3000 in your browser
  4. Login: dev@localhost.local / devpassword123

CLI Commands

scripts/langfuse-setup start    # Start LangFuse services
scripts/langfuse-setup stop     # Stop services
scripts/langfuse-setup status   # Show status and credentials

Requirements

  • Docker or Podman — any of docker compose, docker-compose, or podman-compose works

Disabling Tracing

To disable tracing without stopping LangFuse:

export LANGFUSE_TRACING_ENABLED=false
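
A quick preflight check can confirm that the variables from this section are set before a run. The sketch below is not part of re:factory: `tracing_status` is a hypothetical helper, and the way it interprets each variable (for example, treating only the literal value false as "disabled") is an assumption.

```python
import os

# Variables named in this section; which ones are strictly required is assumed.
REQUIRED_VARS = ("LANGFUSE_HOST", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def tracing_status(env: dict[str, str]) -> str:
    """Summarize whether the tracing environment looks complete."""
    if env.get("LANGFUSE_TRACING_ENABLED", "true").strip().lower() == "false":
        return "disabled"
    if env.get("TELEMETRY_PLATFORM") != "langfuse":
        return "not selected"
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        return "missing: " + ", ".join(missing)
    return "ready"


print(tracing_status(dict(os.environ)))
```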

For LLM connection setup, trace structure details, and troubleshooting, see infra/langfuse/README.md.


Install as a Claude Code Plugin

re:factory is also distributed as a fully-bundled Claude Code plugin — agents, skills, and slash commands packaged together. A GitHub Actions workflow rebuilds the plugins branch of this repo on every push to main, so it always tracks the latest generated artifacts.

From inside Claude Code:

/plugin marketplace add akashgit/remote-factory#plugins
/plugin install factory@remote-factory
/reload-plugins

Once installed, the plugin exposes:

  • The /factory:implement slash command (entry point for the multi-agent pipeline).
  • Namespaced subagents — invoke with factory:ceo, factory:researcher, factory:builder, etc.
  • The bundled skills under .agents/skills/ (e.g. pipeline-subagents, implement).

The plugin still shells out to the factory CLI for the heavy lifting, so you'll need uv and the factory package installed locally as described in Quick Start.

To update later: /plugin marketplace update remote-factory. To remove: /plugin uninstall factory@remote-factory.


Plugin Agents

If you'd rather skip the marketplace and just register the specialist agents as standalone Claude Code (or Codex) subagents, use the built-in installer:

uv run factory install                   # Install all 9 agents to ~/.claude/agents/
uv run factory install --runner codex    # Or install Codex TOML agents to ~/.codex/agents/
claude --agent factory-ceo "improve this project"
claude --agent factory-researcher "study the auth system"

This path only ships the agent prompts (no skills, no slash commands) and is independent of the plugin marketplace install above.


Documentation

| Doc | What's in it |
|---|---|
| Setup Guide | Installation, authentication, environment variables |
| Getting Started | Lifecycle walkthrough, research mode details, factory.md config |
| Architecture | Three-layer system, agent roles, state machine, data flow |
| Eval System | Hygiene/growth/project tiers, scoring, guards, precheck |
| Configuration | factory.md reference — all sections and options |
| ACE Self-Improvement | How re:factory evolves its own agent playbooks |
| Contributing | Dev setup, code style, testing, PR workflow |

Development

uv sync --all-groups              # Install all deps including dev
uv run pytest -v                  # Full test suite
uv run ruff check .               # Lint
uv run mypy factory/              # Type check

License

MIT — Akash Srivastava