OpenAI Codex CLI Best Practices
April 30, 2026
Codex CLI is OpenAI's open-source terminal coding agent, written in Rust — sitting alongside Claude Code and Gemini CLI as the "big three" terminal agents. This guide is verified against the official docs (developers.openai.com/codex) and the openai/codex repo at v0.125.0 (April 2026).
⚠️ Note: this is the new Codex CLI released in 2025 (open-source, terminal agent), not the original Codex model that was deprecated in 2023. The product family also includes Codex App (desktop) and Codex Web (cloud agent at chatgpt.com/codex); this doc covers the CLI only.
Core Concepts
| Concept | Description | Use Case |
|---|---|---|
| AGENTS.md | Project/global instruction file | Like Claude Code's CLAUDE.md; auto-loaded on launch |
| Approval Mode | When does Codex pause to ask | untrusted / on-request / never |
| Sandbox Mode | What files/network it can touch | read-only / workspace-write / danger-full-access |
| Profile | Named bundle of config | --profile work to swap model/permissions |
| Subagent | Child agent for bounded tasks | TOML-defined, can have its own model/sandbox |
| Skill | Reusable workflow (SKILL.md) | Package a recurring task as a named capability |
| MCP Server | External tool integration | STDIO or HTTP/OAuth |
| Plan Mode | Plan before acting | /plan or Shift+Tab |
Getting Started
Installation
# npm (recommended)
npm install -g @openai/codex
# or Homebrew (macOS)
brew install --cask codex
# or download a GitHub Release binary (macOS arm64/x64, Linux x64/arm64)
# https://github.com/openai/codex/releases/latest
System requirements (from official install docs): macOS 12+, Ubuntu 20.04+ / Debian 10+, Windows 11 (WSL2), 4 GB RAM (8 GB recommended), optional Git 2.23+.
Authentication
codex # On first launch, choose "Sign in with ChatGPT"
Two options:
- ChatGPT account login (officially recommended): uses your ChatGPT Plus / Pro / Business / Edu / Enterprise quota; opens a browser for OAuth.
- API key: for CI, corporate proxies, or precise per-token billing; needs extra setup (see developers.openai.com/codex/auth).
Your First Run
cd /your/project
codex
Inside the TUI, try these (from the official quickstart):
> Tell me about this project
> Find and fix bugs in my codebase with minimal, high-confidence changes
> Build a classic Snake game in this repo
💡 Official advice: git commit or create a checkpoint before any non-trivial task. If Codex's changes aren't what you want, you can roll back instantly.
AGENTS.md — Project Instruction File
Drop an AGENTS.md at your repo root (or run /init to scaffold one):
# Project
Next.js 14 + TypeScript admin dashboard.
# Layout
- src/app/api/ route handlers
- src/components/ UI components (PascalCase)
- src/lib/db/ Drizzle data access layer
# Build / Test / Lint
- Dev: pnpm dev
- Test: pnpm test (Vitest — must pass after every change)
- Types: pnpm typecheck
- Lint: pnpm lint --fix
# Conventions
- TS strict mode
- All DB access through Drizzle, no raw SQL
- Don't bypass husky pre-commit hooks
# Don't
- Don't touch src/legacy/
- Don't add runtime deps without asking
Discovery order (important):
~/.codex/AGENTS.override.md ← temp global override
~/.codex/AGENTS.md ← personal global
<git root>/AGENTS.md ← team-shared
<subdir>/AGENTS.md ← module-local (overrides parents)
Files are concatenated from shallow to deep, so instructions closer to the cwd take precedence. The combined size is capped at 32 KiB by default, tunable via project_doc_max_bytes.
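The shallow-to-deep ordering can be sketched in a few lines of Python. This is an illustration of the ordering only, not Codex's actual loader: the function names are hypothetical, and the override file and the size cap are deliberately left out.

```python
from pathlib import Path


def candidate_dirs(git_root: Path, cwd: Path) -> list[Path]:
    """Directories from the git root down to cwd, shallow to deep."""
    dirs = [git_root]
    for part in cwd.relative_to(git_root).parts:
        dirs.append(dirs[-1] / part)
    return dirs


def load_instructions(git_root: Path, cwd: Path) -> str:
    """Concatenate every AGENTS.md on the path, deepest file last.

    Because deeper files come later in the text, their rules are the
    most recent thing the model reads (hypothetical helper).
    """
    chunks = []
    for directory in candidate_dirs(git_root, cwd):
        doc = directory / "AGENTS.md"
        if doc.is_file():
            chunks.append(doc.read_text(encoding="utf-8").strip())
    return "\n\n".join(chunks)
```

Calling load_instructions(repo, repo / "src" / "api") would return the root file followed by any file under src/ and then src/api/, which is a quick way to preview what a module-local AGENTS.md adds.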
Prompting Tips
OpenAI's recommended four-element prompt template (from developers.openai.com/codex/learn/best-practices):
Goal: what to change or build
Context: which files, docs, errors matter
Constraints: standards, architecture, safety rules
Done when: the completion bar
1. Be precise, not verbose
❌ optimize the login flow
✅ Goal: switch src/app/api/login/route.ts from bcrypt.compareSync to async compare
Context: called by src/components/LoginForm.tsx; error shape { code, message } must stay
Constraints: don't touch the schema, don't add deps
Done when: pnpm test passes; 100 concurrent logins show no event-loop blocking warnings
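If prompts are generated by scripts, the four-element template can be enforced in code. The sketch below is one possible shape; the TaskPrompt class is hypothetical and not part of any Codex tooling.

```python
from dataclasses import dataclass


@dataclass
class TaskPrompt:
    """The four labeled sections of the template, in order."""
    goal: str
    context: str
    constraints: str
    done_when: str

    def render(self) -> str:
        """Return the prompt text, refusing to render if a section is empty."""
        fields = [
            ("Goal", self.goal),
            ("Context", self.context),
            ("Constraints", self.constraints),
            ("Done when", self.done_when),
        ]
        missing = [label for label, value in fields if not value.strip()]
        if missing:
            raise ValueError(f"empty prompt sections: {', '.join(missing)}")
        return "\n".join(f"{label}: {value.strip()}" for label, value in fields)
```

Failing on an empty section is a design choice: a missing "Done when" is the most common reason an agent stops too early or too late.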
2. Reference files with @, don't paste
@src/services/payment.ts @src/services/order.ts
Read both, analyze the retry logic in the failure path, propose a fix, wait for my OK before editing.
Letting Codex pull files itself saves tokens and avoids accidental truncation.
3. Tune reasoning effort
Push harder for hard problems, dial back for easy ones:
- low — small edits, formatting
- medium / high — most refactors and bugfixes
- xhigh — architectural, multi-file, deeply coupled
Switch in the TUI via shortcut, or set the default in config.toml.
4. Plan first, act second
/plan
Walk src/api/, unify all routes onto the new error-handling middleware
Or hit Shift+Tab to enter Plan Mode. Codex lays out the plan and asks clarifying questions before writing code — strongly recommended for multi-file work.
5. Make it test and self-review
After your changes:
1) Run pnpm test, paste any failing output
2) Run pnpm typecheck, confirm no new type errors
3) Use /review to self-audit the diff and flag risks
Quoting the official guide verbatim: Don't stop at asking Codex to make a change. Ask it to create tests when needed, run checks, confirm results, and review work before accepting.
Approval & Sandbox (the two big safety knobs)
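Codex's distinguishing design is that safety is split into two independent dimensions: the sandbox controls what a command can touch, and the approval policy controls when Codex stops to ask.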
Sandbox (what it can touch)
| Mode | Behavior |
|---|---|
| read-only | Read only; any write/exec/network needs approval |
| workspace-write ⭐ default | Read + edit inside workspace + run routine commands; out-of-workspace writes or network need approval |
| danger-full-access | No restrictions; not recommended |
Enforcement (verified against codex-rs/cli/src/debug_sandbox.rs):
- macOS — Seatbelt (built-in)
- Linux / WSL2 — Landlock kernel LSM + seccomp filtering (requires Linux 5.13+ with Landlock enabled)
- Windows — Restricted-token sandbox (active in PowerShell)
Tip: codex sandbox seatbelt|landlock|windows -- <cmd> is the debug subcommand to verify whether a single command would survive the sandbox.
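For the Linux requirement above, a preflight check on the kernel version might look like the following. It is a minimal sketch with hypothetical function names, and it checks the version number only: a 5.13+ kernel can still have Landlock disabled at build or boot time.

```python
import re


def parse_kernel(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-105-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_landlock_minimum(release: str) -> bool:
    """True if the version is at least 5.13, the minimum quoted above."""
    return parse_kernel(release) >= (5, 13)
```

On a real machine the input would come from platform.release() (the same string that uname -r prints).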
Approval (when it asks)
| Mode | Behavior |
|---|---|
| untrusted | Only known-safe read ops auto-run; everything else asks |
| on-request ⭐ default | Auto-runs inside sandbox; only asks when going outside |
| never | Never asks (CI / scripts; pair with a tight sandbox) |
Common Combos
# Local dev — just run `codex` (defaults are already workspace-write + on-request)
codex
# Explicit form (equivalent to the above)
codex --sandbox workspace-write
# Just look around, don't edit
codex --sandbox read-only
# Add a few writable dirs outside the workspace (without going full-open)
codex --add-dir ../sibling-repo --add-dir /tmp/scratch
# Non-interactive in CI / scripts
codex exec --sandbox workspace-write --ask-for-approval never "run tests and fix failures"
# Open the sandbox fully (you understand the consequences)
codex --sandbox danger-full-access --ask-for-approval never
# Full YOLO: skip approvals AND drop the sandbox (only when externally sandboxed, e.g. Docker)
codex --yolo
# equivalent to: codex --dangerously-bypass-approvals-and-sandbox
⚠️ --full-auto is deprecated and removed (v0.125.0 still keeps it inside codex exec only to print a migration warning). Use --sandbox workspace-write, the official replacement; approval already defaults to on-request.
🚨 CI gotcha: the default approval mode is interactive, so CI will hang on the prompt and time out. codex exec + --ask-for-approval never is the canonical CI shape.
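When a script rather than a workflow file drives Codex, the same CI shape can be wrapped with a hard timeout so a stuck run fails fast instead of hanging. This is a sketch: it uses only the flags shown above, and the helper names, the 15-minute default, and the 124 exit code (borrowed from GNU timeout) are assumptions.

```python
import subprocess

SANDBOX_MODES = {"read-only", "workspace-write", "danger-full-access"}


def codex_exec_argv(prompt: str, sandbox: str = "workspace-write") -> list[str]:
    """Build the non-interactive command line: exec + explicit sandbox + never ask."""
    if sandbox not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox mode: {sandbox}")
    return ["codex", "exec", "--sandbox", sandbox, "--ask-for-approval", "never", prompt]


def run_codex(prompt: str, timeout_s: int = 900) -> int:
    """Run Codex with no terminal attached; return its exit code, or 124 on timeout."""
    try:
        done = subprocess.run(
            codex_exec_argv(prompt),
            stdin=subprocess.DEVNULL,  # nothing can wait on interactive input
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 124
    return done.returncode
```

Passing the command as a list avoids shell quoting problems when the prompt contains quotes or newlines.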
Advanced
1. config.toml — Persistent Preferences
~/.codex/config.toml (user) / .codex/config.toml (project):
model = "gpt-5.5"
approval_policy = "on-request"
sandbox_mode = "workspace-write"
[features]
web_search = "cached" # cached by default; --search switches to live
multi_agent = true
[profiles.review]
model = "gpt-5.5"
approval_policy = "untrusted"
sandbox_mode = "read-only"
[profiles.ci]
approval_policy = "never"
sandbox_mode = "workspace-write"
[mcp_servers.github]
command = "npx"
args = ["@modelcontextprotocol/server-github"]
enabled = true
Switch profiles:
codex --profile review # read-only review pass
codex exec --profile ci "..." # CI / scripted run
2. Slash Commands Cheat Sheet
| Command | Use |
|---|---|
| /init | Scaffold an AGENTS.md in the repo |
| /model | Switch model |
| /plan | Enter Plan Mode |
| /review | Have Codex review the current diff / branch / commit |
| /compact | Compress conversation history; free up tokens |
| /agent | Switch between active subagent threads (/multi-agents alias) |
| /side | Open a side conversation in an ephemeral fork; won't pollute the main thread |
| /permissions /approvals | Adjust approval mode on the fly |
| /resume /fork | Resume / branch from a prior thread |
| /new /clear | Start a new conversation in the same session / clear and restart |
| /rename | Rename the current thread |
| /diff | Show git diff (including untracked files) |
| /mention | Attach a file to the conversation |
| /skills | List / use skills |
| /memories | View / generate / reset long-term memory |
| /mcp | Inspect MCP server status (/mcp verbose for details) |
| /plugins /apps | Browse plugins / apps |
| /status | Session details and token usage |
| /goal | Set / view goal for a long-running task |
| /fast | Toggle Fast mode (on/off/status) |
| /debug-config | Print config layers and requirements diagnostics (use this when config.toml edits don't take effect) |
| /feedback | Send logs to OpenAI maintainers |
Complete list (46 commands) lives in source codex-rs/tui/src/slash_command.rs::SlashCommand. Just press / in the TUI for autocomplete.
3. Non-interactive Execution (exec)
Treat Codex as a script tool:
# Single prompt
codex exec "replace all console.log with logger.debug"
# Pipe in
git diff main..HEAD | codex exec "review this diff for bugs"
# Pin model + skip approvals
codex exec -m gpt-5.5 --ask-for-approval never "add unit tests for new files"
# Stream events as JSONL (one JSON object per line)
codex exec --json "..." | jq -c .
# Capture only the final message to a file (handy for scripting)
codex exec -o /tmp/answer.txt "..."
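For scripts that consume the --json stream, a tolerant line-by-line reader is usually enough. The sketch below assumes each event is a JSON object with a top-level "type" field; that field name is an assumption, so check the event schema of your Codex version before relying on it.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> Counter:
    """Count events per type, tolerating blank or malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if isinstance(event, dict):
            counts[str(event.get("type", "<no type>"))] += 1  # "type" is assumed
        else:
            counts["<non-object>"] += 1
    return counts
```

In a pipeline the input would be sys.stdin, for example after codex exec --json "..." piped into the script.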
4. Subagents — Parallel Subtasks
.codex/agents/explorer.toml:
name = "explorer"
description = "Read-only codebase explorer; gathers evidence before any change."
model = "gpt-5.3-codex"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode.
Trace execution paths, cite files and symbols with file:line.
Never propose changes — just report findings.
"""
Invoke from the main thread:
Spawn an explorer subagent and have it map all auth logic under src/api/, then report back.
Each subagent runs in its own sandbox + model and returns its summary to the main thread. Great for "explore + execute" splits on complex changes.
5. MCP — External Tools
codex mcp add github --command "npx @modelcontextprotocol/server-github"
codex mcp list
STDIO and HTTP/OAuth servers supported. Common picks: GitHub, Linear, Slack, databases (Postgres / Supabase). Rule of thumb: only add tools that remove a real manual loop — don't pad the list.
6. Skills — Reusable Workflows (note: a folder, not a single file)
Package recurring tasks as a Skill directory. Codex reads the description to decide when to fire and only loads the body on demand ("progressive disclosure"):
~/.agents/skills/release-notes/ ← personal (canonical)
.codex/skills/release-notes/ ← project, Codex-only
.agents/skills/release-notes/ ← project, cross-tool (Claude Code/Codex)
└── SKILL.md # required: name + description + steps
└── references/ # optional: long docs, specs, schemas
└── scripts/ # optional: helper scripts
└── examples/ # optional: I/O examples
└── agents/openai.yaml # optional: Codex-specific metadata
Note: ~/.codex/skills/ still works but is deprecated; new installs should use ~/.agents/skills/.
How to invoke (from shanraisshan battle-tested patterns):
$skill-creator # explicit trigger with $ prefix
/skills # list all available skills
> draft this week's release notes # description match auto-fires; no $ needed
Skill writing rules (consensus across high-star community repos):
- The description field is a trigger, not a summary — write "when should I fire?" not "what is this?"
- A skill is for the model, not human readers — skip filler
- Give goals + constraints, don't railroad the model with prescriptive steps
- Add a ## Gotchas section per skill with Codex's failure modes in this domain; this is the highest-signal content
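A small scaffolding script keeps new skills consistent with the layout and writing rules above. This is a sketch: scaffold_skill is a hypothetical helper, and the YAML front matter holding name and description is an assumption about how SKILL.md carries those two fields.

```python
from pathlib import Path


def scaffold_skill(base: Path, name: str, description: str) -> Path:
    """Create <base>/<name>/ with a SKILL.md stub and the optional folders."""
    skill_dir = base / name
    for sub in ("references", "scripts", "examples"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.exists():  # never overwrite a skill someone already wrote
        skill_file.write_text(
            "---\n"
            f"name: {name}\n"
            f"description: {description}\n"  # write it as a trigger: "Use when ..."
            "---\n\n"
            "## Goal\n\n"
            "## Constraints\n\n"
            "## Gotchas\n",
            encoding="utf-8",
        )
    return skill_file
```

For a project-level, cross-tool skill, base would be Path(".agents/skills"), matching the directory list above.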
Ready-made skill libraries: ComposioHQ/awesome-codex-skills, VoltAgent/awesome-agent-skills (cross-tool).
7. Hooks — Inject Custom Scripts into the Agent Loop (beta)
Enable [features] codex_hooks = true and Codex will invoke your shell scripts at 6 event points:
| Event | When | Typical Use |
|---|---|---|
| PreToolUse | Before any tool call | Block dangerous commands, validate args |
| PermissionRequest | When asking to escalate | Auto approve/deny by rule |
| PostToolUse | After a tool call | Auto lint / format / test |
| SessionStart | Session boot | Inject project-specific context |
| UserPromptSubmit | User submits prompt | Insert disclaimer, run safety scan |
| Stop | Session ends | Upload logs, clean tmp files |
💡 The hooks JSON schema reuses Claude Code's hooks.json format directly (the source engine is literally named ClaudeHooksEngine), so migrating from Claude Code is zero-change.
Config goes in .codex/hooks.json. Reference impl: shanraisshan/codex-cli-hooks.
The two highest-leverage uses:
- PostToolUse → run prettier / ruff / clang-format — Codex edits, hook formats, CI stays green
- PreToolUse → intercept rm -rf / git push --force / DB DROP, for belt-and-suspenders safety
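The PreToolUse pattern can be sketched as a script that reads the event JSON and answers with an exit code. Two things here are assumptions carried over from the Claude Code hook convention and should be checked against the Codex hooks docs: that the command lives at tool_input.command in the payload, and that exit code 2 means "block". The patterns are deliberately simple and easy to evade, so treat this as a second line of defense behind the sandbox.

```python
import json
import re

# Illustrative deny-list; each pattern targets one of the three cases named above.
BLOCKED = [
    re.compile(r"\brm\s+(-\w*r\w*f\w*|-\w*f\w*r\w*)\b"),
    re.compile(r"\bgit\s+push\b.*\s(--force|-f)(\s|$)"),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
]


def decide(payload_text: str) -> tuple[int, str]:
    """Return (exit_code, message): 2 blocks the call, 0 lets it through."""
    payload = json.loads(payload_text)
    command = payload.get("tool_input", {}).get("command", "")  # assumed field names
    if isinstance(command, list):
        command = " ".join(command)
    for pattern in BLOCKED:
        if pattern.search(command):
            return 2, f"blocked by PreToolUse hook: {pattern.pattern}"
    return 0, ""
```

As a standalone hook, the entry point would call decide(sys.stdin.read()), print the message to stderr, and pass the code to sys.exit.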
8. Memories — Cross-Session Long-Term Memory (beta)
Enable [features] memories = true and Codex carries facts you've confirmed across sessions ("this project uses pnpm — don't suggest npm").
/memories # TUI: view / generate / reset
Stored under ~/.codex/memories/ — per-user, not per-project. Safety toggles (from the source MemoriesToml struct):
[memories]
disable_on_external_context = true # mark thread "polluted" when external data is touched (MCP / web search) — prevents leaking
# (legacy alias `no_memories_if_mcp_or_web_search` still works)
generate_memories = true # default true; set to false to stop generating memories from new threads
use_memories = true # default true; set to false to skip injecting memories into prompts
If a session touches untrusted content (parsed an unfamiliar PDF, ran an unaudited MCP), run /memories → Reset immediately.
9. Plugins / Marketplace — Distribution (v0.121.0+)
Bundle skills + apps + MCP servers as a distributable plugin (.codex-plugin/plugin.json):
codex plugin marketplace add user/repo # GitHub shorthand
codex plugin marketplace add ./local-marketplace # local dir works too
codex plugin marketplace upgrade
codex plugin marketplace remove <name>
# In the TUI
/plugins # browse installed / available plugins
Community marketplace index: hashgraph-online/awesome-codex-plugins.
10. Fast Mode — 1.5× speed, 2× credits (gpt-5.4 only)
/fast on # enable
/fast status # current
/fast off # disable
Best when "I know roughly what to change, I just need a fast typist." Pro subscribers can also use gpt-5.3-codex-spark for near-realtime micro-iteration.
11. Web Search
On by default in cached mode (OpenAI's pre-indexed snapshot). For real-time, add --search:
codex --search "Latest React 19 useActionState best practices"
12. Image Input
codex -i screenshot.png "match this design in src/components/Pricing.tsx"
PNG / JPEG; you can also paste screenshots directly into the TUI composer. Combined with Chrome DevTools / Playwright MCP, Codex can read browser console output itself:
codex mcp add chrome-devtools --command "npx chrome-devtools-mcp"
codex mcp add playwright --command "npx @playwright/mcp"
13. Local Open-Source Models (--oss)
Source utils/oss/src/lib.rs confirms Codex natively supports two local providers:
# Ollama route
codex --oss --local-provider ollama -m qwen2.5-coder
codex --oss --local-provider ollama -m deepseek-coder-v2
# LM Studio route
codex --oss --local-provider lmstudio
Fully offline, zero API cost. Trade-off: local models are noticeably weaker than GPT-5.5. Best for: sensitive on-premise data, no-network environments, bulk low-complexity tasks.
Model IDs follow Ollama / LM Studio naming (run ollama list to see what's local). Codex doesn't maintain a separate catalog.
14. CI Integration (Official GitHub Action)
openai/codex-action is the official Apache-2.0 action with a built-in restricted sandbox proxy. The canonical PR auto-review workflow:
# .github/workflows/codex-review.yml
name: Codex PR review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
        with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge
      - run: git fetch --no-tags origin "${{ github.event.pull_request.base.ref }}"
      - id: codex
        uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          prompt: |
            Review only the changes in PR #${{ github.event.pull_request.number }}:
            git diff ${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }}
            Be concise. Flag bugs, missing tests, and security risks.
      - if: steps.codex.outputs.final-message != ''
        uses: actions/github-script@v7
        env:
          # Pass the message through env so backticks or ${...} in it cannot break the script
          CODEX_FINAL_MESSAGE: ${{ steps.codex.outputs.final-message }}
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: process.env.CODEX_FINAL_MESSAGE,
            });
⚠️ The sandbox blocks network by default — if your review prompt needs to run tests or install deps, do npm ci in a step before invoking codex-action.
15. Codex as an MCP Server (Reverse Integration)
codex mcp-server # expose Codex as an MCP server
Source confirms this exposes codex() and codex-reply() MCP tools. Use case: let Claude Code, Cursor, or any other agent call Codex as "a colleague" for parallel work or cross-verification.
16. Rules / Execpolicy (Advanced Safety)
Starlark DSL defining command allow/prompt/forbid lists (prefix_rule() + host_executable()). Drop rule files into .codex/rules/*.rules, validate with codex execpolicy check:
prefix_rule(
pattern = ["git", ["push", "force-with-lease"]],
decision = "prompt",
justification = "force pushes need a human eyeball",
)
prefix_rule(
pattern = ["rm", "-rf", "/"],
decision = "forbidden",
justification = "absolutely not",
)
codex execpolicy check --rules .codex/rules/safety.rules git push --force
# {"matchedRules":[{...}],"decision":"prompt"}
Best for enterprise / team setups — finer-grained than approval modes. See codex-rs/execpolicy/README.md.
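To build intuition for how a prefix rule matches, here is a toy matcher in Python. It is an illustration only, not the Starlark engine: it assumes a rule matches when every pattern position equals the corresponding argv token (or one of the listed alternatives), and that the strictest matching decision wins.

```python
from typing import Optional, Sequence, Union

Token = Union[str, Sequence[str]]
SEVERITY = {"allow": 0, "prompt": 1, "forbidden": 2}


def prefix_matches(pattern: Sequence[Token], argv: Sequence[str]) -> bool:
    """True if argv starts with the pattern; a nested list means 'any of these'."""
    if len(argv) < len(pattern):
        return False
    for expected, actual in zip(pattern, argv):
        options = [expected] if isinstance(expected, str) else list(expected)
        if actual not in options:
            return False
    return True


def decide(rules: Sequence[tuple[Sequence[Token], str]], argv: Sequence[str]) -> Optional[str]:
    """Strictest decision among matching rules, or None if nothing matches."""
    hits = [decision for pattern, decision in rules if prefix_matches(pattern, argv)]
    return max(hits, key=SEVERITY.__getitem__) if hits else None
```

Under these assumptions the first rule above matches any command beginning with git push, which is why git push --force resolves to prompt, while rm -rf /tmp does not match the rm -rf / rule because the third token differs.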
High-Signal Prompting & Workflow Patterns
Distilled from high-star community guides (shanraisshan/codex-cli-best-practice etc.):
Prompting
✅ "Prove to me this works" — make Codex run the tests + git diff main..HEAD itself
✅ "Knowing everything you know now, scrap this and implement the elegant solution"
(use after a mediocre fix; pushes Codex past local minima)
✅ Codex debugs well by itself — paste the error, say "fix", don't micromanage
Planning
✅ /plan for explicit planning on multi-step tasks (Codex auto-plans too, but explicit is more controllable)
✅ Phase-gated plans, each phase has tests — never let it touch 30 files in one shot
✅ Spin up another Codex (or Claude Code) as a "staff engineer" to review your plan
✅ Write detailed specs, kill ambiguity — better input → better output
AGENTS.md
✅ Heuristic: a fresh dev should be able to launch codex → "run the tests" → it works first try
If not, your AGENTS.md is missing build/setup/test commands
✅ Aim for ~150 lines (hard limit is 32 KiB byte-wise) — longer ≠ better
✅ Behavior rules (approval / sandbox / model) belong in config.toml, NOT AGENTS.md
✅ AGENTS.override.md for personal preferences — keeps team's AGENTS.md clean
Multi-Agent & Parallelism
✅ Multi-agent = throw more compute at the problem; offload chores to subagents
✅ Test-time compute: one agent writes, another finds bugs — separate context windows help
✅ Use git worktrees for parallel agents to avoid clobbering each other
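The worktree tip can be scripted so each agent gets its own checkout and branch. A minimal sketch: the sibling-directory layout and the agent/<name> branch naming are arbitrary choices, and the helper names are hypothetical. git worktree add -b and codex --cd are used as they appear in git and in this guide.

```python
from pathlib import Path


def worktree_path(repo: Path, agent: str) -> Path:
    """Sibling directory next to the repo, e.g. /work/app-explorer."""
    return repo.parent / f"{repo.name}-{agent}"


def worktree_add_argv(repo: Path, agent: str, base: str = "HEAD") -> list[str]:
    """git command that creates the worktree on a fresh branch for this agent."""
    return [
        "git", "-C", str(repo), "worktree", "add",
        "-b", f"agent/{agent}", str(worktree_path(repo, agent)), base,
    ]


def codex_argv(repo: Path, agent: str) -> list[str]:
    """Launch Codex rooted in the agent's own worktree."""
    return ["codex", "--cd", str(worktree_path(repo, agent))]
```

Each list can be handed to subprocess.run; once the agent's branch is merged, git worktree remove cleans up the extra checkout.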
Debugging
✅ Have Codex run the service whose logs you want to watch as a background task
✅ Stuck? Screenshot the issue and feed it in (image input is the most underrated capability)
✅ Agentic search (glob + grep) beats RAG — code drifts faster than indexes refresh
Common Pitfalls
| Symptom | Root Cause | Fix |
|---|---|---|
| Browser pops up demanding login | Token expired | Re-run codex (or codex login) and complete OAuth |
| 429 Too Many Requests | Rate-limited | Wait for quota; for scripts, switch to API key + throttle |
| CI job hangs and times out | Default approval is interactive | codex exec --ask-for-approval never |
| config.toml edits don't take effect | Wrong path / config layering overrides yours | Run /debug-config in the TUI to see the resolved config stack and requirements |
| npm i -g fails with EACCES | Global dir owned by root | Install Node via nvm/fnm; never sudo npm |
| which -a codex shows multiple paths | npm + brew + binary all installed | Remove dupes; hash -r your shell |
| Codex "can't see" your edits | Outside workspace / wrong root | codex --cd <project> to set the root |
| Long thread getting slow / pricey | Context bloat | /compact or /fork to trim |
| Two agents clobber the same file | Main + subagent both writing | Give the subagent a git worktree |
| Linux sandbox refuses to start | Kernel < 5.13 or Landlock not enabled | Update the kernel (check uname -r); on older distros fall back to --sandbox read-only |
Codex CLI vs Claude Code
These two are the closest analogs, but they optimize differently:
| Dimension | Codex CLI | Claude Code |
|---|---|---|
| Vendor | OpenAI | Anthropic |
| Open-source | ✅ Apache-2.0 (Rust) | ❌ Closed-source CLI |
| Sandbox | OS-kernel level (Seatbelt / Landlock) | App-layer + 26 hook events |
| Default account | ChatGPT subscription | Claude API / Pro |
| Config files | AGENTS.md + config.toml | CLAUDE.md + settings.json |
| Plan mode | /plan or Shift+Tab | /plan |
| Subagents | TOML files | Markdown files |
| MCP | ✅ | ✅ |
| Strong suit | Terminal tasks, CI, token efficiency | Big refactors, cross-file dependency reasoning, code style |
Community rule of thumb (echoed across builder.io / datacamp / multiple comparison posts):
Codex for keystrokes, Claude Code for commits. Codex shines at "fast iterations + sandboxed execution"; Claude Code shines when one change touches 12 files and the dependency graph matters.
Already paying for ChatGPT and budget-sensitive → Codex comes practically free. Doing a big refactor where code quality is paramount → Claude Code is still the pick. Many teams use both.
Templates
The templates/ directory provides:
- AGENTS.md: generic project instruction template
- config.toml: user-level config with profiles
- agent-explorer.toml: read-only exploration subagent
- SKILL.md: Skill template (with Gotchas + trigger-style description)
References
Official
- Repo: openai/codex (Apache-2.0, Rust)
- Docs: developers.openai.com/codex
- Install guide: docs/install.md
- Best practices: Best practices
- Sandboxing internals: Sandboxing
- Slash commands: Slash commands
- Official GitHub Action (CI integration): openai/codex-action
- Official skills catalog: openai/skills
High-quality community indexes (verified high-star)
- RoggeOhta/awesome-codex-cli — 280+ resources (subagents, skills, plugins, MCP, IDE integrations, CI), categorized
- shanraisshan/codex-cli-best-practice: 50 battle-tested prompting tips + complete .codex/ reference impl (aligned with v0.125.0)
- ComposioHQ/awesome-codex-skills: 38 commonly-used skills (dev tools, data analysis, Composio 1000+ SaaS integrations)
- VoltAgent/awesome-codex-subagents — 136+ subagents across 10 domains
- hashgraph-online/awesome-codex-plugins — first Plugin marketplace index
- agents.md — cross-tool AGENTS.md standard (60k+ adopting projects; works in Codex / Claude Code / Gemini CLI)
Mainstream workflow frameworks
| Framework | Stars | Flow |
|---|---|---|
| Superpowers | 171k | Brainstorm → write plan → subagent-driven dev → TDD → review → ship branch |
| Spec Kit | 92k | /speckit.constitution → specify → plan → tasks → implement |
| oh-my-codex | 27k | $deep-interview → $ralplan → $ralph |
| Compound Engineering | 16k | /ce-ideate → brainstorm → plan → work → review → compound |
Compared to Claude Code
In this repo: workflows/tool-selection.en.md
Verified through: 2026-04-30 (codex CLI v0.125.0). Every CLI flag, subcommand, slash command, and config field in this guide was cross-checked against the codex-rs source code; models and pricing change quickly, so the OpenAI docs are always the source of truth.