news.md
June 28, 2026
🔥🔥🔥 News (Pacific Time)
-
June 28, 2026 (latest): New `accept-edits` permission mode, the middle ground between `auto` and `accept-all`, plus an exposed `plan` mode and corrected permission docs. Previously `auto` asked before every file edit while `accept-all` ran everything (destructive shell commands included), with no setting in between, which was awkward for a coding session where you trust the model to edit files but want a prompt before it runs commands. `accept-edits` fills that gap: it auto-runs `Write` / `Edit` / `NotebookEdit` but still prompts for any non-allow-listed Bash, so a `git push --force` or `rm` is never run silently (and the host-destroying hard denylist, covering `rm -rf /`, `mkfs`, `dd` to a raw disk, and fork bombs, still applies at execution time, in every mode). Implemented as one branch in `_check_permission`; reads and Bash keep the `auto` rules. Two cleanups shipped alongside. First, the already-implemented `plan` mode (read-only analysis, writes refused except to the plan file) is now listed in the `/permissions` menu and tab-completion instead of being reachable only via `/plan`. Second, the system prompt's "Safe vs Unsafe" section, which wrongly told the model that `auto` auto-approves checkpoint-protected edits and that unsafe ops are asked "even under `accept-all`", was rewritten to match the real `_check_permission` behavior (`auto` prompts for all edits; `accept-all` does not prompt; only the hard denylist blocks unconditionally). README, the 7 i18n READMEs, `docs/guides/features.md` and `docs/guides/reference.md` all list the five modes. See docs/guides/reference.md. -
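
The mode rules in this entry can be sketched as one decision function. This is a minimal illustration, not the project's `_check_permission`: the name `check_permission`, the two sample denylist regexes, and the `BASH_ALLOW` set are hypothetical stand-ins. Treating reads as always allowed is an assumption, the denylist is checked here rather than at execution time for brevity, and `plan` mode and the fifth mode are left out.

```python
import re

EDIT_TOOLS = {"Write", "Edit", "NotebookEdit"}

# Hypothetical stand-ins: the real denylist and allow-list live in the project.
HARD_DENY = [re.compile(r"\brm\s+-rf\s+/(\s|$)"), re.compile(r"\bmkfs\b")]
BASH_ALLOW = {"ls", "git status", "git diff"}


def check_permission(mode: str, tool: str, command: str = "") -> str:
    """Return 'allow', 'ask', or 'deny' for a single tool call."""
    # The hard denylist blocks unconditionally, whatever the mode.
    if tool == "Bash" and any(p.search(command) for p in HARD_DENY):
        return "deny"
    # accept-all never prompts.
    if mode == "accept-all":
        return "allow"
    # File edits: auto-run under accept-edits, prompted under auto.
    if tool in EDIT_TOOLS:
        return "allow" if mode == "accept-edits" else "ask"
    # Bash keeps the auto rules in both remaining modes.
    if tool == "Bash":
        return "allow" if command.strip() in BASH_ALLOW else "ask"
    # Assumption: read-only tools need no prompt.
    return "allow"


for mode in ("auto", "accept-edits", "accept-all"):
    print(mode, check_permission(mode, "Edit"),
          check_permission(mode, "Bash", "git push --force"))
```

The only difference between `auto` and `accept-edits` in this sketch is the edit-tool branch, which mirrors the entry's description of the change as a single branch.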
June 28, 2026: Memory staleness is now anchored to verification, not file mtime, and the agent is told how to keep memories fresh. Two parts. (1) The bug (PR #150). `MemorySearch` rewrites a memory file to bump `last_used_at`, which advanced its mtime; both the retrieval recency score and the `⚠ stale` warning were derived from that mtime, so a single read of a stale, never-re-verified memory reset its recency to ~1.0 and suppressed its "verify against current code" warning. This is the "stale-but-confident" failure the design warns against, and it is worst for the most-retrieved (most-likely-acted-on) memories. The fix adds a `last_verified` frontmatter field (defaults to `created`); staleness/recency now come from `verified_epoch` (`last_verified` → `created` → mtime fallback for legacy files), never raw mtime. `touch_last_used` preserves mtime and never writes `last_verified`, so a read can't look like a write. A new `MemoryVerify` tool / `mark_verified()` is the only thing that refreshes the clock, called after the agent re-checks the claim against the environment. It is integral to the fix, since removing the broken implicit refresh (read = fresh) requires providing the correct explicit one. 7 regression tests encode the bug (`tests/test_memory_staleness.py`); existing memory tests stay green. (2) Follow-up: activating the dormant half. Merging #150 alone left the explicit-refresh path unused: the memory system prompt told the model to verify a memory before acting on it but never mentioned the `MemoryVerify` tool, so no memory ever got re-verified and still-correct old memories would keep the stale flag and decay in ranking forever. The prompt now explicitly instructs: after confirming a claim still holds, call `MemoryVerify` (the only thing that clears the flag / restores ranking); if it no longer holds, `MemorySave` (overwrite) or `MemoryDelete` instead. And the always-injected memory manifest, which still sorted and aged by mtime, is now anchored to `verified_epoch` too, matching how `MemorySearch` ranks (legacy files without date fields fall back to mtime, so they're unchanged). See docs/guides/features.md · docs/guides/reference.md. -
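
The fallback chain `last_verified` → `created` → mtime can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the frontmatter is assumed to be an already-parsed dict of ISO-8601 date strings, naive timestamps are assumed to be UTC, and the helper `_to_epoch` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def _to_epoch(value: Optional[str]) -> Optional[float]:
    """Parse an ISO-8601 string to epoch seconds; None if missing or invalid."""
    if not value:
        return None
    try:
        dt = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.timestamp()


def verified_epoch(frontmatter: dict, mtime: float) -> float:
    """Pick the staleness anchor: last_verified, then created, then mtime."""
    for key in ("last_verified", "created"):
        epoch = _to_epoch(frontmatter.get(key))
        if epoch is not None:
            return epoch
    return mtime  # legacy files without date fields


legacy = verified_epoch({}, mtime=1_700_000_000.0)
fresh = verified_epoch({"created": "2026-01-01", "last_verified": "2026-06-01"}, 0.0)
print(legacy, fresh)
```

Because the file's mtime is consulted only when both date fields are absent, a read that rewrites the file cannot move the anchor for any memory that carries frontmatter dates.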
June 23, 2026 (v3.5.83): Documentation slimmed to its essentials, a native desktop app brought into the repo, and the version-string format unified. Three threads of housekeeping. (1) README / docs trim. The top-level README had grown a verbose multi-paragraph News block and several reference dumps that duplicated the guides. Each News item is now one sentence + a [Details](docs/news.md) link (the full write-ups stay here); the 59-model Atlas Cloud list moved to `docs/guides/usage.md` (Option D), leaving only the 3-line example in the README; and the FAQ dropped to its three highest-value entries (MCP, Ollama tool calls, macOS PATH) with the rest pointed at `docs/guides/faq.md`. Net README ~551 → ~516 lines with no content lost: everything trimmed already lived in `docs/`. (2) Native desktop app (`desktop/`). A thin Electron shell that launches `cheetahclaws --web --no-auth` as a localhost sidecar, parses its `Chat UI: http://…/chat` ready line, and points a `BrowserWindow` at it, so the production web UI becomes a native window with nothing reimplemented. It couples to the rest only through that CLI contract (verified: `npm run smoke` launches the real server against this repo and confirms `/chat` / `/health` all serve), and `scripts/build-app.sh` can freeze the server with PyInstaller into a self-contained `.dmg` / `.exe` / `.AppImage` needing neither Node nor Python on the user's machine. Surfaced from three entry points (top-of-README callout, Web UI section, Documentation table). Remaining for a shippable installer: code signing / notarization. (3) Version-string format unified. Historical release notes mixed `v3.05.x` and `v3.5.x`; all ~166 occurrences across the README, the 7 i18n translations, `docs/news.md`, the guides, the demo/cast generators, and the recorded `.cast` / `.svg` banners are now the canonical `v3.5.x`. Version bumped `3.5.82` → `3.5.83` in `pyproject.toml`. Full suite green (2449 passed, 3 skipped); the desktop sidecar smoke test passes against this repo's web server. Not a breaking change: no runtime behavior changed. -
June 16, 2026: All internal modules move into a single `cheetahclaws` package. Previously the importable modules lived flat at the top level (`config.py`, `daemon/`, `kernel/`, `mcp_client/`, `providers.py`, …). That works when you run from the repo dir but breaks once CheetahClaws is installed and launched from its entry point: a generic top-level name like `config` or `daemon` gets shadowed by whatever else is on `sys.path` (another project's `config/` directory, the PyPI `python-daemon` package) and `cheetahclaws` dies at startup with `ImportError: cannot import name … from 'config' (unknown location)`. (An earlier pass that merely dropped a `cc_` prefix from four of these modules re-introduced exactly this collision, which the prefix had originally been added to prevent, so this change supersedes it.) The fix is the standard one: own a single namespace. All 21 single-file modules and 20 sub-packages now live under `cheetahclaws/` and are imported as `cheetahclaws.<name>`; the entry script `cheetahclaws.py` became `cheetahclaws/cli.py`, with a deliberately light `cheetahclaws/__init__.py` (defines `VERSION`, lazily proxies CLI entry symbols via PEP 562 `__getattr__` so importing a submodule never drags in the heavy CLI) and a `cheetahclaws/__main__.py` for `python -m cheetahclaws`. Imports were rewritten across all 448 `.py` files (1269 `from NAME` + 126 `import NAME` + 41 dotted `import NAME.sub` statements, 118 string `patch` / `mock` / `import_module` targets, subprocess `-m` argv paths, the modular plugin f-string loaders, the voice/video back-compat shims, and embedded driver-script strings), all prefixed with `cheetahclaws.`, using whole-word matching so RPC method names, filenames, and unrelated tokens were left alone. `pyproject.toml` now ships a single `cheetahclaws*` package (no `py-modules`) with entry point `cheetahclaws.cli:main`; `agent_templates/` moved into the package so it ships as data.
Triage of the move surfaced and fixed seven regression classes: kernel/daemon subprocess `-m` argv paths, the `test_packaging` import contract, the voice shim's submodule registration, the daemon e2e launcher, tests that patched the package object instead of the `cli` module, tests with hardcoded repo-root data paths, and a `sys.modules` stub-restore leak between `test_research` and `test_setup_wizard`. Breaking only for code that imports CheetahClaws internals directly: `import kernel` → `from cheetahclaws import kernel`, `from mcp_client.client import get_mcp_manager` → `from cheetahclaws.mcp_client.client import get_mcp_manager`; the `cheetahclaws` CLI, `python -m cheetahclaws`, the Web UI, and all bridges are unaffected. Verified end-to-end: `python -m cheetahclaws --version` and `from cheetahclaws import config` both work from outside the repo (the original crash), a built wheel contains `cheetahclaws/*` with all data files (web, prompts, agent_templates) and zero bare top-level modules, and the full suite is 2449 passed, 3 skipped, 0 failed. -
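
The lazy-proxy idea behind the light `__init__.py` can be demonstrated with a PEP 562 module-level `__getattr__`. The sketch below is not the package's actual file: it builds a throwaway module with `types.ModuleType` so the example runs standalone, and `make_lazy_package`, `demo_pkg`, and the `json.dumps` target are hypothetical choices for illustration.

```python
import importlib
import types


def make_lazy_package(name, lazy_symbols):
    """Build a module whose listed attributes are imported on first access.

    lazy_symbols maps attribute name -> (module_name, attribute_in_module).
    """
    pkg = types.ModuleType(name)
    pkg.VERSION = "0.0.0"  # cheap metadata stays eagerly available

    def __getattr__(attr):
        # PEP 562: consulted only when normal lookup on the module fails.
        if attr not in lazy_symbols:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_name, symbol = lazy_symbols[attr]
        value = getattr(importlib.import_module(module_name), symbol)
        setattr(pkg, attr, value)  # cache so the hook is not hit again
        return value

    pkg.__getattr__ = __getattr__
    return pkg


pkg = make_lazy_package("demo_pkg", {"dumps": ("json", "dumps")})
print(pkg.VERSION)           # no heavy import has happened yet
print(pkg.dumps({"a": 1}))   # first access triggers the import
```

In a real package the same function body would sit at module level in `__init__.py`; the effect is that importing a submodule pays only for what it touches.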
June 6, 2026 (v3.5.82): macOS install reliably puts `cheetahclaws` on PATH, and local Ollama models that emit tool calls as text now actually execute them. Two fixes reported in issue #131. (1) Install / PATH on macOS. On macOS the installer creates a dedicated venv (`~/.cheetahclaws-venv`) and `source`s it, so the post-install verification `if command -v cheetahclaws` succeeded inside the script's own activated shell: it printed "cheetahclaws is on PATH" and short-circuited past the entire rc-file block, including the `touch ~/.zshrc` that was supposed to create the file. Result: `~/.zshrc` was never created/updated, and in a fresh terminal (no venv active) the binary was unreachable, so users had to hunt for the install location by hand. The verification step no longer trusts the venv-polluted `command -v`: it confirms the binary at the expected `BIN_DIR`, then (for venv installs) symlinks only the `cheetahclaws` entry point into `~/.local/bin` (pipx-style, so the venv's `python` / `pip` never get prepended to PATH and can't shadow the user's own), creates the right rc file if missing (`~/.zshrc` for zsh, `~/.bash_profile` for bash on macOS, `config.fish` for fish), and appends the exposure dir to PATH there. The fish branch now also writes fish (`set -gx PATH …`) syntax instead of `export`, and the reload hint points bash-on-macOS at `.bash_profile` (`scripts/install.sh`). (2) Ollama tool calls (the "model just keeps talking" bug). The Ollama streaming path (`stream_ollama`) only read tool calls from Ollama's structured `message.tool_calls` field, whereas the OpenAI-compatible cloud path (`stream_openai_compat`) also recovers tool calls a model emits as text via `_find_native_tool_marker` + `_extract_native_tool_calls`.
Many local models (Qwen-coder, Gemma, Mistral) emit calls as `<tool_call>{…}</tool_call>` / `<|tool_call|>…` / `[TOOL_CALLS][…]` inside `content`; on the Ollama path that markup was streamed straight to the screen as chat and never executed, so the agent loop saw no tool calls and ended the turn, which is exactly the reported "tool-calling-style chat that never runs." `stream_ollama` now mirrors the cloud path: when a native marker appears in the streamed content it buffers from that point (so the user never sees raw markup), and at end-of-stream parses the buffer into real tool calls (falling back to surfacing the buffered text if parsing fails, so nothing is silently swallowed). Note: Ollama's native `/api/chat` does not accept a `tool_choice` parameter, so the fix is the text-format recovery, not a request-param change. Existing provider + cache-token suites stay green. See docs/guides/usage.md · docs/guides/faq.md. -
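
The text-format recovery can be sketched for the `<tool_call>{…}</tool_call>` variant. This is a simplified illustration, not `_extract_native_tool_calls`: the function name `extract_tool_calls` is hypothetical, only one of the three marker formats is handled, and the JSON payload shape (`name` / `arguments`) is an assumption.

```python
import json
import re

_MARKER = "<tool_call>"
_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(content: str):
    """Split model output into (visible_text, tool_calls).

    Everything from the first marker onward is treated as buffered markup.
    If it cannot be parsed, the full text is returned so nothing is swallowed.
    """
    start = content.find(_MARKER)
    if start == -1:
        return content, []
    visible, buffered = content[:start], content[start:]
    calls = []
    for match in _TOOL_CALL_RE.finditer(buffered):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            return content, []  # parse failed: surface the raw text instead
    if not calls:
        return content, []
    return visible, calls


text = 'Let me check.\n<tool_call>{"name": "Bash", "arguments": {"command": "ls"}}</tool_call>'
visible, calls = extract_tool_calls(text)
print(repr(visible), calls)
```

In a streaming setting the same split point is where buffering would begin, so the user sees the prose before the marker but never the markup itself.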
June 5, 2026 (v3.5.82): User-controllable token / cost budgets: set a spend cap; on hit the session auto-saves and you can resume or raise it. The quota engine (`quota.py`: per-session + per-day token/cost counters, enforced before each model call) already existed but had no friendly surface: you had to know four config keys (`session_token_budget` / `session_cost_budget` / `daily_token_budget` / `daily_cost_budget`) and there was no way to see how close you were, no warning before the wall, and the hard stop printed a bare `[Quota exceeded]`. This adds the UX layer on top of the unchanged engine: a `/budget` command. With no args it shows usage vs every budget as colored bars + percentages; `/budget $5` sets a session cost cap (the `$` means USD), `/budget 200k` a session token cap (parses `200k` / `1.5m` / `200000`), `/budget daily $20` / `/budget daily 2m` the daily caps, and `/budget clear` removes all. A `--budget $5` / `--budget 200k` startup flag sets the session cap at launch. Proximity warnings fire at the end of any turn that crosses ≥80% (yellow) / ≥95% (red) of a cap, so the wall never arrives by surprise. On hit the agent now yields a `QuotaPause` event (instead of a plain text line): the REPL auto-saves the session (`session_latest.json` + daily backup, the same path `/resume` reads) and prints a friendly next-steps block: raise the same cap or remove it (`/budget clear`) then resend, or restart later and `/resume`. So a long task that runs out of budget is never lost: you analyze, adjust, and continue. Tight enforcement (no surprise overshoot): the check projects the next request's input (`compaction.estimate_tokens`) and stops before the call if it would cross the cap, and clamps that call's `max_tokens` to the remaining headroom (`quota.output_room`), so a single tool-heavy turn can't blow 40k→49k past the budget the way a pure "already-spent ≥ limit" check let it. One budget per scope: setting a cap replaces the other unit for that scope (`/budget $5` after `/budget 200k` switches the session cap to cost rather than stacking), so a leftover token cap can't silently keep blocking after you switch to a `$` cap.
Unit-matched hint: `QuotaExceeded` / `QuotaPause` carry which cap broke (`key` / `scope` / `unit` / `limit`), so the "raise it" suggestion is in the right unit (a token cap shows `/budget 40k`, a daily cost cap shows `/budget daily $40`) instead of a generic `$` amount that wouldn't lift a token cap. New helpers `quota.parse_budget` / `fmt_amount` / `usage_vs_limits` / `warnings` / `output_room`; command in `commands/core.py:cmd_budget`; `QuotaPause` in `agent.py`; REPL handling + `--budget` in `cheetahclaws.py`; 42-case `tests/test_budget.py` (isolated quota dir, incl. a regression that the hint matches the breached unit and that switching units clears the stale cap). The daemon's conservative `serve`-mode defaults (200k tok / $2 per session, 2M / $20 per day) are unchanged: interactive stays unlimited by default, the server stays guard-railed. See docs/guides/features.md · docs/guides/reference.md. -
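
The budget syntax and the proximity thresholds described above can be sketched like this. It is an illustration only: the real `quota.parse_budget` may return a different shape, and both the `(unit, amount)` tuple and the helper `warning_level` are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

_SCALE = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_budget(text: str) -> Tuple[str, float]:
    """Parse '$5', '200k', '1.5m' or '200000' into (unit, amount).

    unit is 'cost' (USD) for a leading '$', otherwise 'tokens'.
    """
    s = text.strip().lower()
    if s.startswith("$"):
        return ("cost", float(s[1:]))
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", s)
    if m is None:
        raise ValueError(f"unrecognised budget: {text!r}")
    return ("tokens", float(m.group(1)) * _SCALE[m.group(2)])


def warning_level(used: float, limit: float) -> Optional[str]:
    """Return 'red' at >=95% of the cap, 'yellow' at >=80%, else None."""
    ratio = used / limit
    if ratio >= 0.95:
        return "red"
    if ratio >= 0.80:
        return "yellow"
    return None


for raw in ("$5", "200k", "1.5m", "200000"):
    print(raw, parse_budget(raw))
print(warning_level(170_000, 200_000))
```

Returning the unit alongside the amount is what makes the "one budget per scope" and unit-matched hint behaviours straightforward: the caller always knows whether a cap is in dollars or tokens.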
June 5, 2026 (v3.5.82): Adaptive Markdown streaming: live output that stays correct on every device. In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it did run it could leave duplicate or stale frames on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with wide CJK / emoji text whose display width a naive line-count gets wrong. The renderer now selects a streaming tier per device in `ui.render.auto_stream_mode(config)`. `live` is full in-place redraw, used only on terminals known to handle cursor-up (local TTYs, and modern emulators even over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via `TERM_PROGRAM` / `TERM` / `WT_SESSION` / `KITTY_WINDOW_ID` / `ALACRITTY_WINDOW_ID` / `WEZTERM_PANE`). `commit` is append-only progressive Markdown, the safe default for unknown-SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed permanently and the cursor is never moved, making a duplicate frame structurally impossible regardless of terminal, latency, or character width. `plain` is raw tokens, used only when `rich` is unavailable. The append-only floor is duplication-free by construction; `live` is progressive enhancement on top. Override with `/config stream_mode=live|commit|plain` (legacy boolean `/config rich_live=true|false` still works → `live` / `commit`). Implemented in `ui/render.py` (`set_stream_mode` / `auto_stream_mode` / `_safe_commit_point` / `_commit_stream` / `_commit_flush`), wired in at REPL start in `cheetahclaws.py`, with a 26-case test suite in `tests/test_stream_modes.py` (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits zero cursor sequences even on a TTY with CJK text).
Two related UX items shipped alongside: `/context` is now a visual grid, a Claude-Code-style 20×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to `#` / `.` on non-UTF-8 terminals (`commands/core.py:cmd_context`); and `deepseek-v4-flash` is registered at its 1M context window in `providers._MODEL_CONTEXT_LIMITS` (overriding the 128K deepseek provider default, which still applies to `deepseek-chat` / `deepseek-v4-pro`), so the prompt `%`, `/context`, and the compaction trigger all reflect the true 1M window. See docs/guides/features.md · docs/guides/reference.md. -
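
The block-boundary rule for `commit` mode (split on blank lines, never inside an open code fence) can be sketched as below. This is a simplified stand-in for `_safe_commit_point`, whose real signature is not shown in this entry: the function here is hypothetical, only handles backtick fences, and assumes a fence toggles on any line starting with three backticks.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe


def safe_commit_point(buffer: str) -> int:
    """Return the index up to which `buffer` holds only completed blocks.

    A block is complete at a blank line that is not inside an open code
    fence. Text after that index is still in progress and must stay buffered.
    """
    in_fence = False
    commit = 0
    pos = 0
    for line in buffer.splitlines(keepends=True):
        pos += len(line)
        if not line.endswith("\n"):
            break  # partial last line: still streaming
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
        elif stripped == "" and not in_fence:
            commit = pos
    return commit


stream = "para one\n\n" + FENCE + "py\nx = 1\n\ny = 2\n"
cut = safe_commit_point(stream)
print(repr(stream[:cut]))  # only the first paragraph is safe to print
```

Since committed text is only ever appended and the cursor never moves back, the worst case under this rule is delayed output while a fence is open, never a duplicated frame.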
June 4, 2026 (v3.5.81): Claude-Code-style quiet output: hide tool execution, show one summary line per turn. Long analysis turns used to scroll the terminal with a `⚙ Bash(...)` line and a `✓ → N lines (… chars)` line for every tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line `python3 << 'PYEOF'` heredoc). A new quiet mode (on by default) suppresses the per-tool lines: the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (`Read 2 files, ran 3 shell commands`), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (`Run: python3 << 'PYEOF' … (+59 行)`) instead of printing the whole script. `/verbose` overrides quiet (full per-tool lines + inputs + token counts); toggle with `/quiet`, or launch with `--show-tools` (alias `--no-quiet`). The startup banner gains an `Output: quiet` / `Output: full` line so the active mode is visible at a glance. Live status line: the spinner now shows elapsed time plus a running output-token estimate (`Thinking… (7s · ↓ 435 tokens)`), char-based since providers only report real usage at the end, and each quiet turn closes with a real-usage footer `✻ Worked for 7.2s · ↑ 1.2k · ↓ 435` built from the true `TurnDone` counts. Implemented in `ui/render.py` (turn-level tool accumulator + `turn_summary_line()`, spinner token meter, `print_turn_stats()`), wired through the REPL event loop in `cheetahclaws.py`, with the `/quiet` toggle in `commands/config_cmd.py`. See docs/guides/features.md. -
June 4, 2026: Context-window override: the prompt % and compaction now follow a settable context length. The prompt's context-usage `%` (and the compaction trigger) derive from the model's context window, which previously could only be a hardcoded provider default; `max_tokens` (the OUTPUT cap) doesn't change it, so `/config max_tokens=…` left the `%` unchanged (a common point of confusion). New per-session key `context_window` (`/config context_window=<N>`, `0` = model default) overrides it, kept deliberately distinct from `max_tokens`. A single parser (`providers.context_window_override`) feeds the prompt `%`, `/context`, the compaction trigger, and the per-call output-token cap, so all four stay consistent; it is bidirectional: a smaller value forces earlier compaction, a larger value corrects a stale default. The value is read live each prompt, so switching model or `context_window` updates the `%` with no restart. `/config` warns when the value exceeds the model's real window (which would disable compaction and let the API reject oversized prompts). No-op when unset, so existing behavior is unchanged. See docs/guides/reference.md. -
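
The "`0` = model default" rule and its effect on the prompt `%` can be sketched as follows. The names `effective_context_window` and `usage_percent` are hypothetical, and treating unparsable values as unset is an assumption; the real `providers.context_window_override` may behave differently.

```python
def effective_context_window(config: dict, model_default: int) -> int:
    """Return the override if it is a positive integer, else the model default."""
    try:
        override = int(config.get("context_window") or 0)
    except (TypeError, ValueError):
        override = 0  # assumption: unparsable values behave like "unset"
    return override if override > 0 else model_default


def usage_percent(used_tokens: int, config: dict, model_default: int) -> float:
    """Context usage as a percentage of the effective window."""
    return 100.0 * used_tokens / effective_context_window(config, model_default)


print(usage_percent(16_000, {}, 128_000))                          # model default
print(usage_percent(16_000, {"context_window": 32_000}, 128_000))  # smaller override
```

Routing every consumer through one function of this kind is what keeps the prompt `%`, `/context`, and the compaction trigger in agreement.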
June 4, 2026: Rich Live streaming: long responses stay live via a bounded tail window. Large streamed responses that would overflow the terminal's redraw area could leave duplicate or stale frames behind on some emulators (macOS Terminal, etc.), because Rich Live redraws the whole accumulated output in place and the cursor can't reach content that has scrolled into the scrollback. Building on the per-response fallback from PR #133, Rich Live now keeps the live region bounded to the viewport: a short response is shown in full, but once it would overflow, only the last screenful of rendered lines (a tail window) is redrawn, so the Live region can never exceed the terminal and cannot leave stale frames. The complete output is committed once when the response finishes (including on Ctrl-C, since the REPL flushes on interrupt), so the head that scrolled out of the window is never lost. Plain streaming is kept only as a safety net (precise render failed, or the terminal is too small to bound a window). A cheap per-line wrap estimate short-circuits the expensive full `render_lines()` measurement while a response stays well under the limit, so normal responses pay no extra Markdown re-render per chunk. Adds focused tests covering full-frame streaming, the full→tail transition, tail-window commit-on-flush, real `Segments` rendering, and both safety-net fallbacks. See docs/guides/features.md. -
May 31, 2026: QQ bot bridge: `/qq` connects cheetahclaws to QQ groups + C2C private chats (PR #121). Uses the official `qq-botpy` WebSocket + HTTP SDK (`pip install "cheetahclaws[qq]"`). botpy's async client runs on a dedicated asyncio event loop inside a daemon thread, bridged to the synchronous main thread via thread-safe queues. Handles `on_group_at_message_create` (group @-mentions, prefix stripped) and `on_c2c_message_create` (private). Since QQ has no message-edit API, replies stream as new messages every ~2 s (2000-char chunking) instead of updating a placeholder; passive replies reference the original `msg_id` / `event_id` within QQ's 5-minute window, then fall back to active pushes. Per-target FIFO job queues, slash-command passthrough, `!jobs` / `!retry` / `!cancel` remote control, image input, and permission prompts scoped to the originating chat (no cross-chat approvals). A supervisor reconnects with exponential backoff (2 s → 120 s). Secret handling matches the hardening standard below: `$QQ_SECRET` (recommended) > REPL arg (deprecated, warns + scrubs history) > config; env-supplied secrets never touch `~/.cheetahclaws/config.json`. Commands: `/qq <appid>`, `/qq`, `/qq stop|status|logout`. Two follow-up fixes over the original PR: image downloads moved off the event loop into `loop.run_in_executor` (a blocking `urlopen` would freeze the WebSocket heartbeat for up to 30 s), and the secret no longer gets written to disk unconditionally. See docs/guides/bridges.md. -
May 12, 2026 (v3.5.80, `security-hardening` branch): Two-round security hardening sweep covering the CRITICAL + HIGH findings from the in-repo code review. Lands a cluster of fixes that close real attack surfaces opened by the recent rapid feature growth. Zero regressions across the full 2347-test suite.

Bot tokens off `argv` / readline history. `cmd_telegram` and `cmd_slack` now accept a single-arg form (`/telegram <chat_id>` / `/slack <channel_id>`) and read the bot token from `$TELEGRAM_BOT_TOKEN` / `$SLACK_BOT_TOKEN`. Env-supplied tokens never get persisted to `~/.cheetahclaws/config.json`; only tokens that actually came in via the deprecated REPL-arg path are saved on disk. New `bridges.scrub_token_from_history(token)` walks `readline.get_history_item` backwards and removes any in-memory entry that embeds the token the moment we know its value. Bridge supervisors get a `token=` / `channel=` kwarg so the env-sourced token can flow to the worker thread without ever sitting on the config dict: `_slack_start_bridge(config, *, token, channel)`. Telegram already passed the token explicitly to `_tg_supervisor`. WeChat is unaffected (QR-scan token, never in argv).

Web UI CSRF: double-submit cookie. Server mints `ccsrf=<24B>; Path=/; SameSite=Strict; Max-Age=86400` (non-HttpOnly) on every connection that arrives without one. `_handle_connection` gates POST/PUT/PATCH/DELETE on a matching `X-CSRF-Token` request header (rejection: `403 csrf token mismatch`). Exempt: `/api/auth/{bootstrap,register,login,logout,api/auth}`, since they establish the session that later carries the cookie. New `web/static/js/csrf.js` monkey-patches `window.fetch` so every state-changing request automatically echoes the cookie value; loaded as the first script in `chat.html`, the inline terminal script in `_build_html`, and `lab.html`. Test harness (`tests/test_web_api.py:_client`) gains an `httpx` event hook that mirrors the browser behaviour. SameSite=Strict on the JWT cookie remains the first-line defence; CSRF is the second line.

Web terminal session ownership. `_PtySession(owner_uid=...)` records the creator's JWT `sub` at `/api/session` time. `_check_pty_owner(session, cookie)` is consulted at `/api/stream` / `/api/input` / `/api/resize`: any other authenticated user trying to reach a known `sid` gets `403 not session owner`. Password-only mode (no JWT) keeps `owner_uid=None` and skips the check, preserving the shared-secret model. Closes the trivial-sid-hijack hole in multi-user web deployments.

Bash hard-denylist. Eight regexes in `tools/shell.py:_BASH_HARD_DENY` refuse host-destroying patterns regardless of `permission_mode`: `rm -rf /` and its `--recursive` / `--force` variants, `rm -rf /*`, `mkfs.*`, `dd of=/dev/{sd,hd,nvme,vd,mmcblk,xvd}`, `> /dev/{sd,hd,...}`, `chmod -R 777 /`, `chown -R <user> /`, and the classic `:(){ :|:& };:` fork bomb. Hits the Bash tool, the REPL `!cmd` escape, and every bridge's `!cmd` path. Plus NUL-byte + control-char + 64 KB length rejection on every Bash invocation.

Filesystem credential denylist. `tools/security.py:_check_path_allowed` now refuses access to a small denylist by default: SSH private keys (`~/.ssh/id_*`), `~/.aws`, `~/.gnupg`, `~/.kube`, `~/.docker`, `~/.netrc`, `~/.pgpass`, `/etc/shadow`, `/etc/gshadow`, `/etc/sudoers*`, `/root`. Public-by-convention SSH files (`config`, `known_hosts`, `authorized_keys`) remain readable. Set `CHEETAHCLAWS_FS_NO_SANDBOX=1` to bypass when intentionally auditing your own secrets. Independent of `allowed_root`, which still works as the strict-mode toggle for multi-user daemon deployments.

Plugin loader hardening. Two new env switches in `plugin/loader.py`: `CHEETAHCLAWS_DISABLE_PLUGINS=1` (kill switch) and `CHEETAHCLAWS_PLUGIN_ALLOWLIST=a,b,c` (whitelist). EXTERNAL-scope plugins (loaded via `$CHEETAHCLAWS_PLUGIN_PATH`) print a one-time stderr warning on first load so a stolen env-var-set doesn't silently execute. Module path resolution now uses `Path.resolve()` + `relative_to(install_dir)` to confine a malicious manifest's `"tools": ["../../etc/passwd_loader"]` style entry.

MCP env sanitisation. `mcp_client/client.py:_sanitized_mcp_env` strips a fixed set of process-hijack keys (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `PYTHONPATH`, `PYTHONSTARTUP`, `PYTHONHOME`, `PYTHONEXECUTABLE`, `NODE_OPTIONS`, `NODE_PATH`, `BASH_ENV`, `ENV`) from any `env` map an `.mcp.json` config supplies. Dropped keys print a one-line stderr notice. Bypass: `CHEETAHCLAWS_MCP_TRUST_ENV=1`. Closes a real local-priv-esc path on a host with multiple MCP server configs of varying trust.

macOS daemon peer-cred. `daemon/auth.py:get_peer_uid` now branches on `sys.platform`: Linux keeps `SO_PEERCRED`, macOS / *BSD goes through ctypes-loaded `getpeereid(2)`. Closes a long-standing TODO that effectively reduced macOS Unix-socket auth to token-only (a stolen daemon-token implied full RCE without peer-uid validation).

Smaller fixes folded in. Web JWT secret loader rewritten with `O_CREAT | O_EXCL` + 0o600 + post-write mode verification (refuses to read a world-readable secret file; auto-falls-back to in-memory secret if chmod can't be enforced; override with `CHEETAHCLAWS_WEB_SECRET`). Terminal one-time password changed from `secrets.token_urlsafe(6)[:6]` (6 characters, about 36 bits, online-bruteable) to `secrets.token_urlsafe(32)` (32 random bytes, 256 bits). `config.save_config` strips `permission_mode=accept-all` before persisting, so once-confirmed escape hatches no longer outlive the session that set them. `session_store.save_session` wrapped in a module-level `Lock` + explicit `BEGIN IMMEDIATE` / `ROLLBACK` so two threads writing the same `session_id` no longer silently drop one set of changes. `agent_runner.py` `err_msg` initialised before the try block (defends against a `NameError` on first iteration if `_handle_permission_request` returns `"error"`); `quota.QuotaExceeded` matched by `isinstance` instead of class-name string. `compaction.compact_messages` wraps `stream_auxiliary` in try/except + falls back to the original messages instead of crashing the agent loop. `providers._recover_args_from_text` caps the regex scan window to the last 32 KB of accumulated text (was scanning ~100 KB+ on every tool call). `context.get_git_info` + `get_claude_md` get TTL caches (30 s / 10 s, keyed by cwd) so the per-turn `git rev-parse / status / log` and CLAUDE.md re-read stop showing up in profiles. `mcp_client/client.py` reader loops use `dict.pop()` instead of `in` + index so a late response after a timeout doesn't race the request side. `tool_registry._cache_key` adds a `session_id` dimension so a `Read(/etc/...)` cached for one session never leaks to another. `session_store.search_sessions` LIKE-fallback path escapes `%` / `_` / `\` before interpolation.

Frontend XSS audit. Existing `_esc` (textContent → innerHTML) and `_renderMd` (HTML-tag-strip → marked) cover all user/model content paths. One deep-trust hole closed: `web/static/js/settings.js:_renderModels` previously injected server-supplied model names directly into an `onclick="app.selectModel('${full}')"` attribute; it now uses `data-model` + a delegated click handler, so a malicious model registry entry cannot break out of the JS string literal.

Defaults you can flip. `CHEETAHCLAWS_BRIDGE_TERMINAL=0` hard-disables the bridge `!cmd` shell entirely (default `1`, owner-bound by `chat_id` whitelist anyway). `CHEETAHCLAWS_FS_NO_SANDBOX=1` lifts the credential denylist. `CHEETAHCLAWS_DISABLE_PLUGINS=1` / `CHEETAHCLAWS_PLUGIN_ALLOWLIST=…` / `CHEETAHCLAWS_MCP_TRUST_ENV=1` control plugin + MCP behaviour. Full reference in docs/guides/security.md. All 12 CRITICAL + 10 HIGH items from the review now closed (4 of those 22 turned out to be review misjudgements: `_all_errors` init, permission double-answer race, `_broadcast` iter race, and the `QuotaExceeded` classname check was a real fix but the surrounding "shell injection in REPL `!command`" was reclassified as user-typed-input not RCE). Architecture refactor items (`cheetahclaws.py` / `providers.py` God-object split, sentinel state machine) deliberately left for a separate decision: they're shape changes, not bug fixes. -
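
The MCP env sanitisation described in this entry amounts to filtering a fixed key set out of the supplied `env` map. The sketch below uses the thirteen keys listed above; the function name `sanitized_mcp_env`, the `trust_env` parameter (standing in for the `CHEETAHCLAWS_MCP_TRUST_ENV=1` bypass), and the notice wording are assumptions for illustration.

```python
import sys

HIJACK_KEYS = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT",
    "DYLD_INSERT_LIBRARIES", "DYLD_LIBRARY_PATH",
    "PYTHONPATH", "PYTHONSTARTUP", "PYTHONHOME", "PYTHONEXECUTABLE",
    "NODE_OPTIONS", "NODE_PATH", "BASH_ENV", "ENV",
})


def sanitized_mcp_env(env: dict, trust_env: bool = False) -> dict:
    """Return a copy of `env` without process-hijack keys.

    trust_env=True is a stand-in for the documented bypass switch.
    """
    if trust_env:
        return dict(env)
    clean = {}
    for key, value in env.items():
        if key in HIJACK_KEYS:
            # One-line notice so a dropped key is never silent.
            print(f"[mcp] dropped untrusted env key: {key}", file=sys.stderr)
            continue
        clean[key] = value
    return clean


print(sanitized_mcp_env({"API_KEY": "x", "LD_PRELOAD": "/tmp/evil.so"}))
```

A fixed denylist like this leaves ordinary configuration variables untouched while removing the keys that would let a config file inject code into the spawned server process.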
May 12, 2026 (
daemon/f-4-followups-f-6-9branch): Daemon foundation roadmap finished — all nine F-1…F-9 items in RFC 0002 now LANDED. Closes the remaining four scope items end-to-end (≈1500 LoC of code + ≈900 LoC of tests + docs). Drilldown:F-4 #2 — Bridge
notifyforwarding. The subprocess-runner reader loop'snotifyIPC branch used to drop the payload on the floor (F-6/7/8 didn't exist yet). Now it routes throughdaemon.bridge_supervisor.notify(kind, text). The runner can target a specific bridge viamsg["bridge"](e.g."telegram") or omit it for a"*"broadcast.agent_runner_notifyevents on the bus carry{name, run_id, bridge, delivered, text[:500]}so observers can audit deliveries. Empty-text frames are silently dropped (common during agent shutdown).F-4 #3 — Restart policy. New
RestartPolicydataclass:mode(none|on-crash),max_restarts,backoff_base_s,backoff_cap_s,backoff_jitter_s. Frozen + a purenext_delay(restart_count)so the decision matrix is unit-testable.agent.startaccepts the five fields flat (validation rejectscap < basewhich would clamp every attempt down to a useless ceiling). On a crash the reader'sfinallyarms athreading.Timer(delay, _do_restart, ...); the Timer respawns via a swappable spawner hook (_RESTART_SPAWNERfor tests) and carriesrestart_countforward.stop()cancels the Timer before the kill ladder, and the same_unregister(name, expected=handle)identity check protects against a Timer-fired respawn racing past a deliberate stop. Bus events:agent_runner_restart_scheduled,agent_runner_restart,agent_runner_restart_failed,agent_runner_restart_exhausted.F-6 / F-7 / F-8 Phase 1 — Telegram / Slack / WeChat in daemon. Single
daemon/bridge_supervisor.pyowns lifecycle for all three kinds, gated per-bridge by feature flags (CHEETAHCLAWS_ENABLE_F6/7/8, default off, REPL is byte-for-byte unchanged until the operator opts in). The Phase 1 worker invokes today'sbridges/<kind>.py:_<kind>_supervisorunchanged — same HTTP code, same reconnect/backoff, just owned by a daemon thread instead of a REPL one. Outboundbridge.notify(kind, text)dispatches via the per-kind sender (_tg_send/_slack_send/_wx_send); F-4 #2 plugs straight into it. Persistence in the F-2bridgesSQLite table (kind,enabled,config_jsonwith secrets redacted,last_poll_at,last_error);bridge.listmerges live workers with rows from previous daemon runs so disabled bridges remain visible indaemon status. Wire surface:bridge.{start,stop,list,send,status}RPCs indaemon/bridge_methods.py. F-7 depends on F-6 (shared scaffolding); F-8 the same. WeChat keeps a clear-error path for missing token/base_url since the QR-login handshake is still REPL-driven (/wechat login).F-6 Phase 2 — Inbound refactor. When
bridge.start daemon_phase2=Trueis passed, the legacy supervisor is bypassed for a slim daemon-driven loop: (a) outbound subscriber on the event bus, filterssession_outboundevents bysession_id(tg:<chat_id>/sl:<channel>/wc:<user_id>) +target_bridges, callshandle.senderfor delivery; (b) per-kind inbound poller (_phase2_telegram_inbound/_phase2_slack_inbound/_phase2_wechat_inbound) that re-uses today's HTTP helpers but publishessession_inboundon every new phone message instead of callingsession_ctx.run_query. The agent driver — REPL, Web, or a future automation client — subscribes tosession_inbound, runs the agent, callssession.reply(session_id, text, target_bridges?)for outbound chunks. Three new RPCs indaemon/session_methods.py:session.send,session.reply,session.list_recent. Permission requests born inside a bridge-driven turn route only back to the originating bridge via the existingPermissionStoreoriginator stamp (<kind>:<session_id>).F-9 — Cost-guardrail defaults + per-runner quota-pause. Headless
cheetahclaws serve now sets four conservative defaults (session_token_budget=200_000, session_cost_budget=\$2, daily_token_budget=2_000_000, daily_cost_budget=\$20) via _apply_serve_defaults; REPL --in-process keeps None (unlimited) for back-compat. New system.status RPC returns {budgets, runners, bridges} so daemon status prints the live ceilings. agent.resume(budget_overrides, name?) merges overrides into daemon_state.config and (when name is supplied) calls runner_supervisor.resume(name) to deliver a resume IPC frame to a paused runner. The hook itself: a new pre-iter quota.check_quota raises into _on_quota_exceeded; the base impl is a no-op (REPL keeps today's behaviour where agent.run catches internally and yields a quota text), while _PipeAgentRunner overrides it to ship a paused_budget IPC frame, set status, and block on _resume_event.wait(). Supervisor reader publishes quota_warn + flips agent_runs.status='paused_budget'. On resume, runner sends resumed IPC, supervisor publishes agent_runner_resumed + flips status back to running. Control loop's stop handler also sets _resume_event so a stop arriving while paused unblocks cleanly. Post-implementation audit fixed 5 real bugs in the new code. (1)
_phase2_wechat_inboundused wrong field names (messages/fromUserName/msgId/syncKeyinstead ofmsgs/from_user_id/message_id/get_updates_bufperbridges/wechat.py:411). (2)_phase2_slack_inboundinitialized cursor toNone, so the first poll would replay the channel's recent backlog — fixed to seed at current wall-clock time (matchesbridges/slack.py:_slack_poll_loop). (3)_phase2_telegram_inboundlong-polled withtimeout=25 s, meaningstop()had to wait up to 25 s for the HTTP call to return before observingstop_event— dropped to 5 s. (4)_unregister(name)was identity-blind; a Timer-fired_do_restartracing withstop()could see its freshly-spawned successor handle silently popped (orphaning the subprocess). Added an optionalexpected=handleidentity check applied at every terminal stop site (runner_supervisor + bridge_supervisor have the symmetric fix). (5)_safe_cfgonly matchedtoken/secretkeys; sincebridge.startmergesdaemon_state.configinto the bridge config, provider API keys (anthropic_api_key, etc.) andpassword/auth_*fields could bleed through to bridges SQLite rows and SSE events — extended to(token, secret, api_key, apikey, password, passwd, auth). Two new regression tests pin both.Full repo suite (three independent runs): 2347 passing, 3 skipped (env-gated live LiteLLM tests), 0 failed, ~3:32 each. ~90 new daemon-specific tests across
test_daemon_runner_{restart_policy,notify_routing,quota_pause}.py, test_daemon_{bridge_supervisor,bridge_methods,bridge_phase2,session_methods,f9_budgets}.py. RFC 0002 + docs/architecture.md §Daemon updated to reflect that all of F-1 → F-9 have landed. Details: RFC 0002. -
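The restart policy described in the entry above (a frozen dataclass with a pure next_delay) can be sketched roughly as follows. The field names come from the entry; the default values, the doubling backoff formula, and the `jitter` argument are assumptions made for illustration, not the shipped implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartPolicy:
    # Field names follow the changelog entry; the defaults are hypothetical.
    mode: str = "none"            # "none" or "on-crash"
    max_restarts: int = 3
    backoff_base_s: float = 1.0
    backoff_cap_s: float = 60.0
    backoff_jitter_s: float = 0.0

    def __post_init__(self):
        if self.mode not in ("none", "on-crash"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        # The entry says validation rejects cap < base.
        if self.backoff_cap_s < self.backoff_base_s:
            raise ValueError("backoff_cap_s must be >= backoff_base_s")

    def next_delay(self, restart_count, jitter=0.0):
        """Return seconds to wait before the next restart, or None to give up.

        Pure function: the caller passes `jitter` in [0, 1), for example
        random.random(), so tests stay deterministic.
        ASSUMPTION: exponential doubling clamped at backoff_cap_s.
        """
        if self.mode != "on-crash" or restart_count >= self.max_restarts:
            return None
        delay = min(self.backoff_base_s * (2 ** restart_count), self.backoff_cap_s)
        return delay + jitter * self.backoff_jitter_s
```

Keeping the decision in one side-effect-free method means the whole restart matrix (mode, count, cap) can be checked without spawning a subprocess or arming a Timer.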
May 12, 2026 (
fix/litellm-provider-followupbranch):litellm/provider follow-up to PR #119 — make litellm a real optional dep, fix ledger / streaming, and wire it into the CLI / Web UI path. PR #119 (RheagalFire) introducedkernel/runner/llm/litellm_provider.pyso CheetahClaws could route to 100+ LLM providers behind one SDK, but a careful re-review against the merge surfaced four classes of integration gap that the 12 mocked unit tests didn't catch. The follow-up branch (fix/litellm-provider-followup, 2 commits, 9 files, +1093/-229) fixes all of them and lands the docs the original PR was missing. (1) Dependency classification — description said optional, diff put it in core. Pyproject's[project] dependencieshad grown alitellm>=1.60.0,<2.0.0line, andrequirements.txt's core block matched; everypip install cheetahclawswas force-pulling litellm and its transitive chain (tokenizers,tiktoken, pinned pydantic versions). Moved to[project.optional-dependencies]under a newlitellmextra, also added toall;requirements.txtnow only documents the optional install via a comment. Backed up by atest_litellm_is_optional_dependencyregression. (2) Not reachable through either user path.kernel/runner/llm/__main__.py:_select_provideronly knewmock/scripted/anthropic, and the top-levelproviders.PROVIDERSregistry (which the CLI + Web UI consult to resolve--model <X>) had nolitellmentry at all, so end-to-end the new class was reachable only by direct Python import. Added alitellmbranch to_select_provider(readsCC_LLM_API_KEYas an optional explicit override), aPROVIDERS["litellm"]entry withtype: "litellm", and a newstream_litellm()generator inproviders.pymirroringstream_openai_compat's shape — yieldsTextChunkper delta thenAssistantTurnat end. The dispatcher inproviders.stream()branches onprov["type"] == "litellm".bare_model("litellm/openai/gpt-4o")strips only the first/, leavingopenai/gpt-4o— exactly whatlitellm.completion(model=...)expects. 
(3) Streaming silently zeroed the ledger.stream()returnedtokens_input=0,tokens_output=0,tool_calls=(),finish_reason="stop"unconditionally. The kernel runner emitschargeIPC messages from those fields and gates RFC 0022 tool dispatch onresponse.is_tool_use, so every streamed call bypassed quota and lost any tool_use the model emitted. Fix passesstream_options={"include_usage": True}tolitellm.completionand reassembles the chunk list withlitellm.stream_chunk_builder(chunks, messages=...)so the synthesized final response carries real token counts, tool_calls, and finish_reason. Two regression tests pin the contract (test_stream_emits_deltas_and_returns_usage,test_stream_preserves_tool_calls); a third (test_cost_unknown_set_when_chunk_builder_fails) covers the fallback when the builder returns None on very old litellm versions. (4)cost_microhard-coded to 0 — quota free pass. Both__call__andstream()returnedcost_micro=0regardless of model. Switched tolitellm.completion_cost(completion_response=resp, model=model)which uses litellm's per-model price table (covers 100+ providers, kept in sync upstream); convert USD → micro-USD via the same* 1_000_000factorAnthropicProvideruses. Oncompletion_costraising (unknown model) or returningNone, the response carriesmetadata["cost_unknown"]=Trueso the ledger can distinguish a real $0 (Ollama, free NIM tier) from an unpriced call. Exception mapping.try: ... except Exception: raise ProviderUnavailable(...)swallowed every error class into "their fault" — 401s, malformed requests and connection timeouts all looked the same to the runner. New_map_exceptionreadsself._litellm.exceptions.{AuthenticationError, BadRequestError, NotFoundError, UnsupportedParamsError}and re-raises those asProviderInvalidRequest("your fault"); everything else staysProviderUnavailableso the runner may retry. 
Reads exception classes off the already-importedself._litellmmodule (instead offrom litellm import exceptions) so the mapper stays testable without a real SDK installed. Lazy import. Top-levelimport litellmviolated the module-level contract inkernel/runner/llm/__init__.py("imported lazily so the absence of an SDK doesn't break this module's import") — every place that imported the runner's LLM package was implicitly importing litellm. Refactored to an_ensure_litellm()first-use pattern matchingAnthropicProvider._ensure_client, with atest_module_imports_without_litellmthat strongly verifies the property (the local dev env doesn't have litellm installed — the test passes). Self-review caught 5 more bugs before pushing. (a)_parse_tool_callscalledtc.function.nameoutside the try block — a malformedtool_callwithfunction=Nonewould crash the whole response instead of the single bad call; fixed bygetattrchain +continue-on-empty-name. (b)json.loads("null")andjson.loads("[1,2]")returnNone/list, which tripLlmResponse.__post_init__'sisinstance(tc["input"], dict)validator; fixed by coercing non-dict to{}. (c) Same JSON-non-dict bug inproviders.stream_litellm's streaming tool-call assembly; sameisinstanceguard. (d) The streaming fallback (whenstream_chunk_builderreturnsNone) emittedmetadata={}instead of{"cost_unknown": True}, breaking ledger consistency. (e)tests/e2e_litellm_provider.py's fixture'stry/except ImportErrorwas dead code once the import was lazy — would confusingly fail on real assertions rather thanpytest.skipifCC_LITELLM_E2E=1was set on a box without litellm. Replaced with an explicit_ensure_litellm()probe +pytest.skiponProviderUnavailable. 6 new defensive tests pin all five fixes. Tests. 
23 unit tests intests/test_litellm_provider.py(was 12 mocked-only) — covers lazy import, registry wiring (both_select_providerandproviders.PROVIDERS), cost computation withcost_unknownfallback, streaming usage + tool_calls preservation, exception class mapping (AuthenticationError→ProviderInvalidRequest), and 6 defensive tool-call parsing regressions. Newtests/e2e_litellm_provider.pymirrors the 3 live-API tests the PR body claimed but never committed (basic call, streaming, system prompt steering); skipif-gated onCC_LITELLM_E2E=1AND per-provider credentials so CI / dev runs don't accidentally bill. Full non-e2e suite: 2222 / 2222 passing, zero regressions (up from 2154 baseline). Docs. New section indocs/guides/recipes.mdunder Section 1, between the vLLM/custom/walkthrough and Section 2 — covers Bedrock SigV4, Azure deployment routing, Vertex service-account JWTs with concrete env-var setup, plus a 5-row troubleshooting table mirroring the existing vLLM one (litellm not installed,drop_paramsmasking,cost_unknownsemantics, Bedrock 401 region mismatch, Azure 403 staleapi_version). README gains apip install ".[litellm]"line in Optional extras, three Supported Models table rows (Bedrock / Azure / Vertex via litellm), and a dedicated LiteLLM (AWS Bedrock / Azure / Vertex AI) subsection under Closed-Source API Models with concrete invocation examples and an explicit pointer towardcustom/for plain OpenAI-shaped endpoints so users don't pull litellm when they don't need it. i18n READMEs (CN/JP/ES/DE/PT) intentionally left for the maintainer's translation cadence. Branch:fix/litellm-provider-followup(2 commits —abc3357code + tests + recipes,f5f364dREADME), open for review againstmain. -
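The defensive tool-call parsing from self-review items (a) and (b) in the entry above could look something like this. It is a minimal sketch: the function name `parse_tool_calls`, the attribute layout of the incoming objects, and the output dict shape are assumptions for illustration; only the "skip the bad call" and "coerce non-dict to {}" behaviours come from the entry.

```python
import json
from types import SimpleNamespace  # only used to build fake tool calls in the usage note


def parse_tool_calls(raw_calls):
    """Convert SDK-style tool-call objects into plain dicts, tolerating bad input."""
    parsed = []
    for tc in raw_calls or []:
        # getattr chain: a call with function=None must not crash the response.
        fn = getattr(tc, "function", None)
        name = getattr(fn, "name", None)
        if not name:
            continue  # drop only the malformed call, keep the rest
        try:
            args = json.loads(getattr(fn, "arguments", None) or "{}")
        except (TypeError, ValueError):
            args = {}
        # json.loads("null") gives None and json.loads("[1,2]") gives a list;
        # downstream validation expects a dict, so coerce anything else to {}.
        if not isinstance(args, dict):
            args = {}
        parsed.append({"id": getattr(tc, "id", ""), "name": name, "input": args})
    return parsed
```

For example, `parse_tool_calls([SimpleNamespace(id="x", function=None)])` returns an empty list instead of raising.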
May 11, 2026 (
daemon/f-4branch): F-4 skeleton —agent_runnerbecomes a supervised subprocess (RFC 0002). The fourth piece of the daemon foundation roadmap lands as a feature-flagged skeleton on thedaemon/f-4branch. Today each/agent <template>runner lives in a Python thread inside the REPL / web server process — one rogue runner can OOM-kill or hang the whole thing. F-4 makes each runner its ownpython -m agent_runner --pipesubprocess underdaemonsupervision so a leak, infinite loop, segfault, orkill -9on the runner becomes an observable event (agent_runner_crashon the daemon event bus) instead of a process-wide failure. Components: (1)daemon/runner_supervisor.py(~650 LoC) —start/stop/stop_all/get/list_all, 3-phase stop (IPCstop→ SIGTERM after 2 s → SIGKILL after another 3 s, bounded ≤ 5 s as required by the RFC acceptance criteria), background reader thread per runner pumpingiteration_done/permission_request/notify/logIPC messages, crash classification on EOF, and best-effort writes to F-2'sagent_runs+agent_iterationsSQLite tables (INSERT OR IGNOREmakes iteration re-delivery idempotent;last_iterationUPDATE never regresses). (2)daemon/runner_ipc.py— thin re-export ofkernel.runner.ipc.JsonLineChannelso the kernel-side and daemon-side runners share one wire-format implementation (avoids the duplicate-fix-twice trap). (3)daemon/agent_methods.py— four JSON-RPC methodsagent.start/agent.stop/agent.list/agent.statusregistered alongside the F-3monitor.*family, with full param validation (TypeError→-32602 INVALID_PARAMSviadaemon.rpc). (4)agent_runner.pygains a--pipeentry point:_pipe_mainreads init from stdin, builds a_PipeAgentRunnersubclass that bridgessend_fn→ IPCnotifyand_persist_record→ IPCiteration_done, then drives the existing_run_loopbody so all stagnation-detection / circuit-breaker / dup-summary logic from the threaded path is preserved unchanged. 
(5)start_runner/stop_runner/stop_allnow dispatch onagent_runner_subprocessconfig key orCHEETAHCLAWS_ENABLE_F4=1env var; default off, Windows always thread-mode. Self-review caught and fixed 3 real bugs before pushing: (a) reader-thread race (started before_register+ DB insert) reordered; (b) malformed-message orphan (anulliteration field unwound the reader → finally classified crashed but subprocess kept running) — wrapped per-message dispatch in try/except + hard-kill in finally if proc still alive; (c) pre-handshakelog+exitIPC on template-not-found that supervisor misread as the ready reply, switched to stderr + non-zero exit so the handshake EOF surfaces a clean error. Tests: 27 new (test_daemon_runner_supervisor.py19 +test_daemon_agent_methods.py10 — handshake, graceful stop ≤ 5 s, SIGKILL escalation on hung runner, external SIGKILL crash detection, IPC shim identity, 9 SQLite persistence cases incl. duplicate-delivery idempotency, 2 malformed-input safety-net regressions, RPC param validation for all 4 methods, end-to-end list → status → stop with an inline runner). 104 / 104 passing across F-4 + daemon + kernel + existing agent_runner tests, zero regressions. Still TODO before flipping from "skeleton" to "MERGED": permission routing throughdaemon/permission.py(currently auto-approves), bridgenotifyforwarding (waiting on F-6/7/8), restart policy, e2e test with the realpython -m agent_runneragainst a tiny template. Branch:daemon/f-4. RFC:docs/RFC/0002-daemon-foundation-roadmap.md. -
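The 3-phase stop described in the entry above (IPC stop, then SIGTERM after 2 s, then SIGKILL after another 3 s) can be illustrated with plain `subprocess`. This is a sketch under stated assumptions: the function name `stop_process`, the JSON stop frame, and the returned phase labels are hypothetical and are not taken from daemon/runner_supervisor.py.

```python
import subprocess
import sys


def stop_process(proc, term_after_s=2.0, kill_after_s=3.0):
    """Stop `proc` gracefully, escalating if needed.

    Returns which phase ended the process: "ipc", "sigterm" or "sigkill".
    Worst case is bounded by term_after_s + kill_after_s (5 s by default).
    """
    # Phase 1: ask nicely over the pipe. The frame format is an assumption.
    try:
        proc.stdin.write('{"type": "stop"}\n')
        proc.stdin.flush()
    except (BrokenPipeError, OSError, ValueError):
        pass  # pipe already closed: fall through to waiting and escalation
    try:
        proc.wait(timeout=term_after_s)
        return "ipc"
    except subprocess.TimeoutExpired:
        pass

    # Phase 2: SIGTERM (TerminateProcess on Windows).
    proc.terminate()
    try:
        proc.wait(timeout=kill_after_s)
        return "sigterm"
    except subprocess.TimeoutExpired:
        pass

    # Phase 3: SIGKILL cannot be ignored, so this wait returns.
    proc.kill()
    proc.wait()
    return "sigkill"
```

A caller would create the child with `subprocess.Popen([sys.executable, ...], stdin=subprocess.PIPE, text=True)` so the stop frame can be written as a string.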
May 10, 2026 (v3.5.79): Web Chat UI session organization + headless-bridges slash handler + stale-session reaper crash fix. Three threads of work merged into a single release. Bridges / headless deploys (#84 follow-up): Telegram / Slack / WeChat
/help,/monitor,/model,/statusproduced zero response in Docker /--webdeploys because_start_headless_bridges()only wiredrun_queryandagent_stateon the sharedsession_ctx— neverhandle_slash. The bridge poll loops gate onif slash_cb:and fell through tocontinuebefore the📩 Telegram:log line, so the failure was invisible indocker compose logs -f. Fix: extracted the slash handler (originally inlined inrepl()) into a module-level factory_make_bridge_slash_handler(state, config, run_query); both REPL and headless paths now use it (single source of truth, no future drift between modes). Stale-session reaper crash:web/api.py:reap_stale_chat_sessions()calledremove_chat_session(sid)without theuser_idthe function now requires for ownership-check parity — every reaper tick raisedTypeError, killing the daemon thread, so staleChatSessionobjects accumulated forever in the in-memory cache. Fix: capture(sid, user_id)pairs from the cachedChatSessionobjects under_chat_lock, then apply outside the lock. Web UI session organization: five-feature bundle layered on top — folders + drag-drop + Move-to context menu, ChatGPT-style active-folder context (click a folder name →+ Newand direct-typing both drop new sessions into that folder, with aChat · in <Folder>topbar breadcrumb), batch select with Select-all-respecting-search-filter, batch delete + combined-Markdown export (chats-N-sessions.md), and a 4-px draggable sidebar divider with localStorage persistence. Backend adds afolderstable,chat_sessions.folder_idnullable FK, in-placePRAGMA table_info+ALTER TABLEmigration ininit_db(), and 5 new HTTP endpoints (GET/POST /api/folders,PATCH/DELETE /api/folders/{id},PATCH /api/sessions/{id}/folder). Also rolled in: issue #111 (handle_slash_sync/handle_slash_streamno longer double-broadcast to WS) and--web --model Xpersistence. Tests: +16 new acrosstest_web_api.py(folder CRUD, batch ops, reaper regression) and the newtest_bridge_slash_handler.py(5 cases pinning the headless handler contract). 
Full suite: 2154 / 2154 passing, zero regressions. User-side guide: docs/guides/web-ui.md. -
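The in-place PRAGMA table_info + ALTER TABLE migration mentioned in the entry above follows a common SQLite pattern, sketched here. The table and column names `folders`, `chat_sessions` and `folder_id` come from the entry; the function name `ensure_folder_schema` and the other column definitions are assumptions for illustration.

```python
import sqlite3


def ensure_folder_schema(conn):
    """Idempotently add folder support to an existing database."""
    # Hypothetical minimal folders table; the real schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS folders ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chat_sessions)")}
    if "folder_id" not in columns:
        # Nullable FK: existing rows get NULL, meaning "not in any folder".
        conn.execute(
            "ALTER TABLE chat_sessions "
            "ADD COLUMN folder_id INTEGER REFERENCES folders(id)"
        )
    conn.commit()
```

Checking the column list first makes the migration safe to run on every startup, since SQLite's ADD COLUMN raises an error if the column already exists.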
May 10, 2026: Web Chat UI fixes — slash commands no longer reply twice;
--web --model Xactually applies the model. Two related issues that surfaced when wiring a self-hosted vLLM endpoint into the Chat UI. (1) Issue #111 — slash commands duplicated in Chat UI but not in terminal.web/api.py:handle_slash_syncwas both returning events inline in the HTTP response and broadcasting the same events to the WS subscribers of the same client;chat.jsthen iterateddata.eventsAND fired_handleEventfromws.onmessage, rendering every reply twice. Same bug inhandle_slash_streamfor SSE-streamed long commands (/brainstorm,/worker,/agent,/plan). Both helpers now deliver events through a single channel — HTTP/SSE only — so_handleEventruns exactly once per event. Background-thread events (sentinel flows, agent runs) are unaffected: by the time the worker thread emits,_broadcastis already restored to the live WS broadcaster infinally. (2)--web --model Xwas silently ignored. The CLI override branch only ran in the interactive-REPL path; theif args.web:branch loaded config straight from disk and started the server, sopython cheetahclaws.py --web --model custom/qwen2.5-72bwould happily boot but every request handler reloaded~/.cheetahclaws/config.jsonwith the previous model name (e.g.gemma-4-31B-it), producing a confusing404: model does not existagainst the new endpoint. Fix:cheetahclaws.pynow persistsargs.modelto config before callingstart_web_server, matching the documented behavior;provider:model→provider/modelnormalization is identical to the REPL path. User-side guide:docs/guides/web-ui.md(Troubleshooting + Architecture notes updated). -
May 10, 2026: Small-context local models survive large workloads — 4-part fix: ctx cap, auto-fanout, stagnation-stop, output paths under
~/.cheetahclaws/. Repro that motivated the work: running/agent → 1 (Research Assistant)on a 6.6 MB PDF (AutoRedTeamer.pdf— ~70k tokens of extracted text) withcustom/qwen2.5-72b(32k ctx). Old behavior: 400 BadRequest "context length 32768"; the agent_runner kept polling the template every 2 s; the model produced 1500+ identical "task complete" summaries before anything stopped it. New behavior, four cooperating layers: (1) Per-model context-window registry + dynamic max_tokens cap (providers._MODEL_CONTEXT_LIMITS+get_model_context_window+dynamic_cap_max_tokens) — covers Qwen 2.5/3, Llama 3.x, Mistral/Mixtral, Phi, Gemma, DeepSeek local variants;_fetch_custom_model_limitnow backfillsPROVIDERS["custom"]["context_limit"]so compaction sees the live/v1/modelsvalue; per-call shrink based on actual prompt size keepsinput + output + 1024 safety ≤ ctx.compaction.get_context_limitgains an optionalconfigarg so custom-endpoint detection works on the very first turn. (2) Auto-fanout for oversize tool outputs (multi_agent/fanout.py) — when a single tool result (Read on a huge PDF, Grep over a giant tree, WebFetch of a long article) exceeds 0.4 × ctx_window, split into chunks at paragraph boundaries with token-overlap, dispatch parallel sub-LLM map calls (one per chunk, default cap 5 subagents), merge with a single reduce call; substitutes the merged summary in conversation history instead of letting the next API call overflow. Hooked at the tool-result append site inagent.py; transparent UX prints[Auto-fanout: <Tool> returned ~N chars (>threshold) → dispatching K parallel sub-summaries]. Configurable:auto_fanout_enabled/_threshold/_max_subagents/_chunk_overlap_tokens. (3) Stagnation-stop inagent_runner.py— when the model emits the same summary N iterations in a row (default 3, whitespace/case-normalized), stop the loop with a clear notification instead of burning thousands of API calls; configurable viaauto_agent_dup_summary_limit(0 disables). 
(4) Agent output paths under ~/.cheetahclaws/: the /agent wizard now resolves relative output filenames (e.g. research_notes.md) to absolute paths under ~/.cheetahclaws/agents/<name>/output/ instead of CWD; AgentRunner exposes runner.output_dir, eagerly mkdir'd; Summary block + post-start info show the resolved path in green; absolute paths pass through unchanged. Tests: +47 new (fanout 23, ctx cap 18, dup-stop 13, output paths 8). Full suite: 2139 passing, zero regressions. User-side guide: docs/guides/extensions.md. -
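The stagnation-stop from item (3) of the entry above (same summary N iterations in a row, whitespace/case-normalized, 0 disables) can be sketched as a small detector. The class and function names are hypothetical; only the default of 3, the normalization, and the "0 disables" rule come from the entry.

```python
def normalize_summary(text):
    """Collapse whitespace and lowercase so trivial variations compare equal."""
    return " ".join(text.split()).lower()


class DupSummaryDetector:
    """Signals when the same summary was seen `limit` iterations in a row."""

    def __init__(self, limit=3):
        self.limit = limit          # 0 (or negative) disables the check
        self._last = None
        self._streak = 0

    def observe(self, summary):
        """Record one iteration's summary; return True when the loop should stop."""
        if self.limit <= 0:
            return False
        key = normalize_summary(summary)
        if key == self._last:
            self._streak += 1
        else:
            self._last = key
            self._streak = 1
        return self._streak >= self.limit
```

A runner loop would call `observe()` once per iteration and break out with a notification as soon as it returns True.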
May 9, 2026: Read tool auto-redirects on overflow — defense-in-depth for the case where model ignores the template instruction. Re-running the same
/agent + autodan.pdffailure showed two real-world problems with the prior fix: (1) The user was running the pip-installed binary (/home/shangdinggu/anaconda3/bin/cheetahclaws), not the source tree. New tools / templates added to source had no effect. (2) Even if the user reinstalled, qwen2.5-72b would likely still callReadinstead ofSummarizeLargeFile— models default to familiar tools no matter what the template says. The fix moves the routing decision into the Read tool itself. (a) New_maybe_redirect_to_summarizehelper (tools/files.py). WhenReadorReadPDFwould return content too large to safely fit in the next API call, it instead returns a short redirect message like[ReadTooLarge: file is too large — call SummarizeLargeFile with file_path='X' instead] PREVIEW: …. The model sees the redirect, callsSummarizeLargeFile, gets a chunked-and-merged summary back. The raw content never enters the API call. (b) CJK-aware token estimation. CJK content tokenizes at ~1 token per character (vs ~2.8 chars/token for English). New_is_cjk_heavy()heuristic: ≥20% CJK characters → use 1:1 char-to-token estimate. A 24K-char Chinese file is 24K tokens, not 8.6K, and now triggers redirect on a 32K-context model. (c) Conservative ceiling for unreliable provider declarations.custom/<model>provider declares 128K context by default but the underlying model is often 32K (qwen2.5-72b, llama 3 8B, etc.). Newsafe_ctx = min(declared_ctx, 30000)caps the threshold at 30K tokens regardless of provider claims — the redirect now fires on the user's exact ~25K-token PDF case (would NOT have fired with the unconditional 128K ceiling, which is exactly the bug). (d) Wrapped Read registration (tools/__init__.py). New_read_with_overflow_checklambda calls_maybe_redirect_to_summarizeafter_readreturns; for results <8KB it skips (not worth the check). ReadPDF gets the same treatment inline in_read_pdf. 
Why this works even on the old install: as soon as the user updates tools/files.py and tools/__init__.py, the redirect fires regardless of whether SummarizeLargeFile / template changes are present. The redirect's prose tells the model exactly which tool to call and with what args. Tests: 14 new pytest cases (tests/test_read_overflow_redirect.py) covering CJK detection (English / Chinese / Japanese / mixed-minority / empty) and threshold logic (small file → no redirect; user's exact failure case → redirect with right pointer; CJK at lower char count triggers vs same chars in English; conservative ceiling protects against overconfident provider; preview included for context). Plus 2 integration tests via execute_tool("Read", ...) confirming the wrapper applies the redirect end-to-end. 2077 targeted regression tests pass (2063 prior + 14 new), zero regressions across the whole repo. -
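The CJK-aware estimate and the conservative context ceiling from items (b) and (c) of the entry above amount to a few lines of arithmetic. In this sketch the 20% ratio, the 2.8 chars/token figure, and the 30000 ceiling come from the entry; the function names and the exact Unicode ranges counted as CJK are assumptions.

```python
def _is_cjk_char(ch):
    # ASSUMPTION: CJK Unified Ideographs, Hiragana/Katakana, and Hangul syllables.
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF
        or 0x3040 <= cp <= 0x30FF
        or 0xAC00 <= cp <= 0xD7AF
    )


def is_cjk_heavy(text, min_ratio=0.20):
    """True when at least `min_ratio` of the characters are CJK."""
    if not text:
        return False
    cjk = sum(1 for ch in text if _is_cjk_char(ch))
    return cjk / len(text) >= min_ratio


def estimate_tokens(text):
    """Rough token estimate: 1 token per char for CJK-heavy text, else ~2.8 chars/token."""
    if is_cjk_heavy(text):
        return len(text)
    return int(len(text) / 2.8)


def safe_ctx(declared_ctx, ceiling=30_000):
    """Cap an optimistic provider-declared context window at a conservative ceiling."""
    return min(declared_ctx, ceiling)
```

With these numbers a 24,000-character Chinese file is estimated at 24,000 tokens, while the same length of English text comes out near 8,571.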
May 9, 2026: Multi-agent map-reduce
SummarizeLargeFiletool — solves the "file too big for model context" problem at the source. Re-running the same/agent + autodan.pdffailure case showed the SAFETY_BUFFER bumps were still band-aids — even with 2500-token buffer the prompt re-tokenization sometimes ate ~1K, leaving no margin. The real fix: when a file is too big for the model's context, chunk it and run multiple sub-LLM agents in parallel then merge. This makes file size irrelevant. (a) NewSummarizeLargeFile(file_path, focus="")tool (tools/files.py). Reads any-size file (PDF / txt / md / code), estimates tokens, and: if it fits in(model_ctx - 8.5K_reserved)tokens → single-shot summary; otherwise → splits into N chunks (number adaptive to file size: 200KB on 32K-context model → ~4 chunks; 200KB on 200K-context → 2 chunks), summarizes each chunk in parallel viaThreadPoolExecutor(up to 8 workers), then a reduce step merges all chunk summaries into one unified output. Per-chunk failures are logged inline as[chunk N: error]markers so one flaky source doesn't sink the whole job. Returns the final summary as the tool result. Registered withread_only=True, concurrent_safe=True. (b)/summarize <path> [focus]slash command (commands/advanced.py:cmd_summarize). Thin wrapper around the same helper for direct user invocation — handy for quickly summarizing a paper or large code file without spinning up a full/agentflow. (c)research_assistant.mdtemplate updated. Step 2 of "each iteration" now tells the agent to preferSummarizeLargeFileoverReadfor academic papers (handles chunking + never overflows context regardless of length). Falls back toReadfor tiny (< 5KB) files. (d) Quick band-aid:SAFETY_BUFFER1000 → 2500 in_try_reduce_output_cap_from_error. Even with the new tool, output-cap auto-reduction is still useful for the rare case whereReadis called on a moderately big file. The 2500-token (~7.6% of 32K) buffer now absorbs the +1K vLLM decoder-priming variance we observed in the wild. 
Tests: 18 new pytest cases (tests/test_summarize_large_file.py) covering token estimator parametrized cases, chunk planner adaptiveness (small file → 1 chunk; size scales monotonically; larger context → fewer chunks; chunks have overlap; chunks cover all content), file reader dispatch (text / missing / directory rejected), full pipeline (small → single-shot, big → map-reduce with N≥3 map calls + 1 reduce), and tool registration + schema check. 2063 targeted regression tests pass (2045 prior + 18 new), zero regressions. Golden prompt fixture regenerated for the new /summarize command in the help index. -
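The map-reduce flow described in the entry above (overlapping chunks, parallel map via ThreadPoolExecutor, inline error markers, one reduce call) can be sketched as follows. The `summarize` callable stands in for a sub-LLM call, and the function names, character-based chunk sizes, and defaults are assumptions; the real tool plans chunks from token estimates.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(text, chunk_chars, overlap_chars=200):
    """Split text into chunks of at most chunk_chars that overlap by overlap_chars."""
    if chunk_chars <= overlap_chars:
        raise ValueError("chunk_chars must exceed overlap_chars")
    step = chunk_chars - overlap_chars
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break
        start += step
    return chunks


def summarize_large_text(text, summarize, chunk_chars=4000,
                         overlap_chars=200, max_workers=8):
    """Single-shot when the text fits in one chunk, otherwise map-reduce."""
    if len(text) <= chunk_chars:
        return summarize(text)
    chunks = split_chunks(text, chunk_chars, overlap_chars)

    def map_one(indexed):
        n, chunk = indexed
        try:
            return summarize(chunk)
        except Exception:
            # One failed chunk becomes an inline marker instead of sinking the job.
            return f"[chunk {n}: error]"

    workers = min(max_workers, len(chunks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_one, enumerate(chunks, start=1)))
    # Reduce step: merge all chunk summaries with one final call.
    return summarize("\n\n".join(partials))
```

`pool.map` preserves input order, so the reduce step sees the partial summaries in document order even though the map calls finish in any order.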
May 9, 2026: Two follow-up fixes after re-running the same
/agent failure case. The previous patch wasn't enough. Running the user's exact scenario again still showed: 1st call prompt 24577 + cap 8192 = 32769 fail → my auto-reduction fired → 2nd call prompt 24778 + cap 7991 = 32769 fail again. The prompt grew by 201 tokens between attempts (provider re-tokenized differently on retry), consuming the entire 200-token safety buffer and landing one token over the 32768 limit again. In addition, the agent_runner's consecutive-failure detector kept resetting because agent.py alternates between [Failed ...] and [Circuit breaker ...] markers, so the signature-matched counter went 1 → 1 → 1 → 1 forever. (a) Bumped SAFETY_BUFFER 200 → 1000 in _try_reduce_output_cap_from_error. ~3% headroom on a 32K window absorbs provider-side tokenization variance. User's case: new safe cap = 32768 - 24577 - 1000 = 7191, which actually fits even after the prompt grows. (b) agent_runner now counts ANY failure, not just signature-matched. New parallel counter consecutive_any_failures increments on ANY [Failed] / [Circuit breaker] marker regardless of signature; trips at 4 consecutive iterations. The [Failed → Circuit breaker → Failed → ...] alternation now stops the agent at iteration 4 instead of looping forever. Updated stop-message clarifies whether the trip was "same identical failures" or "consecutive mixed failures". 8 existing tests updated for the new buffer; 2045 targeted regression tests pass. -
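The two failure counters from the entry above (signature-matched, plus the new any-failure counter that trips at 4) can be sketched like this. The class name, the marker regex, and the return values are assumptions for illustration; the limits of 3 and 4 and the 80-character signature cap come from the changelog entries.

```python
import re

# ASSUMPTION: markers look like "[Failed ...]" or "[Circuit breaker ...]".
_FAILURE_MARKER = re.compile(r"\[(?:Failed|Circuit breaker)[^\]]*\]")


class FailureTracker:
    def __init__(self, same_limit=3, any_limit=4):
        self.same_limit = same_limit
        self.any_limit = any_limit
        self.consecutive_same = 0
        self.consecutive_any_failures = 0
        self._last_signature = None

    def observe(self, output):
        """Return a stop reason string, or None to keep iterating."""
        match = _FAILURE_MARKER.search(output)
        if match is None:
            # A clean iteration resets both streaks.
            self.consecutive_same = 0
            self.consecutive_any_failures = 0
            self._last_signature = None
            return None
        signature = match.group(0)[:80]
        self.consecutive_any_failures += 1
        if signature == self._last_signature:
            self.consecutive_same += 1
        else:
            self._last_signature = signature
            self.consecutive_same = 1
        if self.consecutive_same >= self.same_limit:
            return "same identical failures"
        if self.consecutive_any_failures >= self.any_limit:
            return "consecutive mixed failures"
        return None
```

With alternating markers the signature streak never exceeds 1, so only the second counter can end the loop, which is the case the follow-up fix targets.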
May 9, 2026: Three fixes for the context-overflow + circuit-breaker doom loop. User report:
/ssj 15 → Research Assistantpointed at a large PDF, modelqwen2.5-72b(32K context), output cap 8192, prompt 24577 input tokens → total 32769 → 1 token over the limit. Every API call returned the same BadRequestError. The retry loop hit the same error 5 times in 60s → circuit breaker opened (120s cooldown). After cooldown the agent runner retried with the SAME config → re-opened the breaker → cycle continued forever, generating hundreds ofcircuit_open_skiplog lines. Three coordinated fixes break the loop. (a)agent.pyauto-reduces output cap on context overflow. New_try_reduce_output_cap_from_errorparses the explicit token counts from the error message (max=32768, requested=8192, prompt=24577) and computes a safe new cap =model_max - prompt_tokens - 200_buffer. In the user's case:32768 - 24577 - 200 = 7991, which fits. The retry uses the new cap WITHOUT consuming the attempt budget; bounded to ONE auto-reduction per turn so a true overflow (prompt itself too big to fit any reasonable output) eventually surfaces. Tolerant regex matches both OpenAI-style and Anthropic-style overflow messages. Falls through to existing_force_compactpath if numbers can't be parsed or the safe cap < 256. (b)agent_runner.pystops after N consecutive identical failures. Track each iteration's failure signature (the[Failed ...]or[Circuit breaker ...]marker text from agent.py's output, capped at 80 chars). When 3 in a row match, stop the agent with a clear notify message naming the underlying error. Prevents the doom loop where a fundamentally broken request (context too big for compaction to fix, missing API key, unauthorized model) keeps re-running every 2s for hours. (c)agent_runner.pyhonors circuit-breaker cooldown. When iteration text contains[Circuit breaker OPEN ... Cooldown: Xs], parse Xs and wait that long (capped at 5 min) instead of the configured 2s interval before next iteration. Avoids 60+ wasted iterations per single 120s cooldown. 
Tests: 8 new pytest cases (tests/test_context_overflow_recovery.py) — parser reproduces user's exact failure → 7991 cap, no-op when current cap already fits, give-up when safe cap < 256, OpenAI vs Anthropic phrasing tolerance, regex match for circuit-breaker cooldown extraction, regex match for [Failed / [Circuit breaker markers in real outputs. 2045 targeted regression tests pass (2037 prior + 8 new), zero regressions. -
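The cap arithmetic from fix (a) and the cooldown parsing from fix (c) in the entry above reduce to two small helpers. This sketch takes the token counts as arguments instead of parsing a provider error message, since the exact message formats are not reproduced here; the function names and the cooldown regex are assumptions.

```python
import re

MIN_USEFUL_CAP = 256  # below this the entry falls back to compaction


def safe_output_cap(model_max, prompt_tokens, current_cap, buffer=200):
    """Return a cap that fits the context window, or None if none is usable."""
    if prompt_tokens + current_cap <= model_max:
        return current_cap  # already fits: no-op
    safe = model_max - prompt_tokens - buffer
    return safe if safe >= MIN_USEFUL_CAP else None


# ASSUMPTION: the marker looks like "[Circuit breaker OPEN ... Cooldown: 120s]".
_COOLDOWN = re.compile(r"\[Circuit breaker OPEN.*?Cooldown:\s*(\d+(?:\.\d+)?)s")


def next_wait_seconds(iteration_text, default_interval=2.0, max_wait=300.0):
    """Honor an advertised cooldown (capped at 5 min), else use the normal interval."""
    match = _COOLDOWN.search(iteration_text)
    if match is None:
        return default_interval
    return min(float(match.group(1)), max_wait)
```

For the reported case, `safe_output_cap(32768, 24577, 8192)` gives 7991, matching the 32768 - 24577 - 200 computation in the entry.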
May 9, 2026: /brainstorm v2: programmatic backstops + ranked synthesis +
--bgbackground mode. Three coordinated additions that make brainstorm output usable even when the lead model is weak (qwen2.5 etc.) and let users keep working while the debate runs. (a) Programmatic action-plan filter (commands/advanced.py). Two new helpers_extract_ban_keywords(opening)and_filter_action_plan(synthesis_md, ban_keywords). After_lead_synthesisreturns, the action plan is regex-scanned (case-insensitive substring) against a built-in default ban list —consult an advisor,diversify your portfolio,monitor regularly,考虑,咨询,定期监控,多元化,咨询财务顾问,分散投资,关注市场动态and dozens more, English + Chinese — PLUS topic-specific bans extracted from quoted strings ("..."/「...」) in the lead's own opening. Matched items are dropped with a_(programmatic self-check removed N action(s))_note appended. Deterministic — runs regardless of whether the lead model actually executed its prompt-side SELF-CHECK instruction. The user-reported failure case where qwen2.5 banned "consult an advisor" in the opening but still wrote "明天与财务顾问讨论" as Action Plan item #10 is now caught at the code level. (b) Ranked synthesis enforcement. The_lead_synthesisprompt's## Consensussection is renamed to## Ranked Consensuswith a mandatory**Ranked by: <metric>**header (metric extracted from the user's topic — "highest expected return" / "best refactor impact" / etc.) and items must be numbered with a→ Why this rank: <one sentence>line. Programmatic backstop_consensus_is_rankedregex-checks for ≥2 numbered items in the section; if missing, ONE fallback LLM call asks the lead to rank. If the fallback also fails to produce a ranking, the original ships unchanged (no crash). (c) Background mode--bg(or--background). New flag spawns a daemon thread, returns the REPL immediately. Stage progress (Lead opening,Round 2/3 (cross-examination),Synthesis) prints from the thread and interleaves with the user's typing — acceptable trade-off for a freed REPL. 
New `/brainstorm status` subcommand shows all in-flight bg brainstorms with their current stage + elapsed time + output path. Implementation uses recursion: when `--bg` is set, the thread re-enters `cmd_brainstorm` with `_bg_recursion=True` markers in config that bypass the interactive prompts (which would block on stdin) and suppress the TODO-generation sentinel (no REPL is listening for it). Module-level `_BG_BRAINSTORMS` dict is mutex-locked so `/brainstorm status` reads a clean snapshot. Finished brainstorms older than 1h are pruned from `status` to keep the list useful; running ones are never pruned regardless of age. Tests: 27 new pytest cases (`tests/test_brainstorm_v2_advanced.py`): ban-keyword extraction (defaults + opening-quoted), action-plan filter (English + Chinese + no-section + all-clean), ranking detector (proper / unranked-bullets / no-section / single-item), `_ensure_consensus_is_ranked` (no-op when ranked + LLM call when not + keep-original on LLM failure), `--bg` flag parsing (7 cases including `--background` alias + flag-position-tolerance + `--bgmode` not matching), bg registry (register/set_stage/complete/snapshot + sort + 1h-prune-finished + keep-running-regardless). 2037 targeted regression tests pass (2010 prior + 27 new), zero regressions across the whole repo. Doc: `docs/guides/brainstorm.md` adds `--bg` row to the flag table + new "Programmatic backstops on the synthesis" section + tip "use --bg for long debates so you can keep working". -
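The programmatic action-plan filter from (a) can be pictured with the minimal sketch below. It is an illustration under assumptions, not the code in `commands/advanced.py`: the name `filter_action_plan`, the three-entry ban list, and the rule that only list-item lines are scanned are all hypothetical.

```python
import re

# Hypothetical, shortened ban list; the real default list is described as much longer.
DEFAULT_BANS = ("consult an advisor", "monitor regularly", "分散投资")

# A list item is a line starting with "1." / "1)" / "-" / "*" (assumed plan layout).
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def filter_action_plan(plan_md, ban_keywords=DEFAULT_BANS):
    """Drop list items that contain a banned phrase (case-insensitive substring)."""
    bans = [b.lower() for b in ban_keywords]
    kept, removed = [], 0
    for line in plan_md.splitlines():
        if _ITEM.match(line) and any(b in line.lower() for b in bans):
            removed += 1  # deterministic: no model call is involved
            continue
        kept.append(line)
    if removed:
        kept.append(f"_(programmatic self-check removed {removed} action(s))_")
    return "\n".join(kept)
```

Because the check is plain substring matching, it runs the same way whether or not the lead model followed its own SELF-CHECK instruction.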
May 9, 2026: /brainstorm
--ground: pre-fetch real /research data so personas debate against facts. Closes the biggest remaining gap in the brainstorm pipeline. Until now/brainstormwas pure-reasoning (no_tools=Trueon every persona) — fine for design / refactor / strategy questions, but useless for data-hungry topics like stocks / current events / recent news where personas would confidently invent prices and tickers from training memory. New--ground(or--ground=Nfor top-N cap, clamped to[3, 50], default 15) runsresearch.aggregator.research()on the topic BEFORE the debate starts, formats the top results as a compact### GROUNDING DATAmarkdown block, and inlines that into the snapshot every persona / lead opening / lead synthesis sees. Persona round-1 instructions gain "you MUST cite specific results by[N]when your claim relates to one — do not invent figures the data doesn't show." Lead opening detects the grounding block and anchors the agenda to it ("forbid any claim that contradicts the grounding data without citing it"). Lead synthesis takes a newgrounding=kwarg and the prompt requires every consensus claim to trace to either a[N]result OR a specific persona claim — un-traceable claims must be DROPPED. Failure-tolerant: any exception from the research aggregator (network, missing API keys, all sources 429) is caught silently —_fetch_groundingreturns""and the brainstorm continues un-grounded with a logged warning. Cost: 10-30s for the fetch, but cached for 24h via the existing/researchSQLite cache so back-to-back runs on the same topic are basically free. Composes cleanly with--rounds,--lead,--models. SSJ interactive flow gains a newGround in /research data first? [y/N]prompt right after Rounds; defaultNso existing usage is unchanged. 
Tests: 18 new pytest cases (`tests/test_brainstorm_grounding.py`): 8 flag-parse cases including bound-clamping + four-flag composition, brief-formatting shape + sort + char-budget + empty-results, three fetch-graceful-degradation paths (raises / empty brief / happy path), backward-compat for `_lead_synthesis(grounding=)`. 2010 targeted regression tests pass (1992 prior + 18 new), zero regressions across the whole repo. Doc: `docs/guides/brainstorm.md` "Data-hungry topics" section rewritten with examples + tip "always pass --ground for any topic touching the real world". -
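The `--ground` / `--ground=N` parsing with its `[3, 50]` clamp and default of 15 might be sketched as follows. The function name `parse_ground_flag` and the fallback to the default on a non-numeric value are assumptions made for illustration; the entry above does not say how the real parser treats bad input.

```python
def parse_ground_flag(tokens, default=15, lo=3, hi=50):
    """Return (top_n or None, remaining tokens). None means grounding is off."""
    top_n, rest = None, []
    for tok in tokens:
        if tok == "--ground":
            top_n = default
        elif tok.startswith("--ground="):
            try:
                # Clamp the requested cap into [lo, hi].
                top_n = max(lo, min(hi, int(tok.split("=", 1)[1])))
            except ValueError:
                top_n = default  # assumption: non-numeric falls back to the default
        else:
            rest.append(tok)  # topic words and other flags pass through untouched
    return top_n, rest
```

Leaving every other token in place is what lets the flag compose with `--rounds`, `--lead`, and `--models` in any position.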
May 9, 2026: /brainstorm output-quality guards — fix 5 real bugs surfaced from a live transcript. Reviewing
brainstorm_outputs/brainstorm_20260509_000935.mdexposed five concrete failures the structural changes alone didn't catch. (a) All persona letters wereP—letter, name = get_identity(persona_name[0].upper())and persona dict keys arep1/p2/…, so every Agent ended up labeledP("Agent P quoting Agent P attacking Agent P"). Letters now come from a stablepersona_identitymap keyed by index →A, B, C, D, E…(capped at Z). (b) Same persona's NAME re-rolled every round becauseget_identitywas called fresh and Faker is random — round 1's "Riley Torres" became round 2's "Alex Lopez".persona_identityis sealed once before the rounds loop. (c) Round 2+ challenges were verbatim copy-paste — qwen2.5 saw the first persona's CHALLENGE block in history and cloned it (8 of 10 round-2/3 challenges in the failing transcript were >95% identical). New_extract_challenge_blocks+_jaccard_similarity+_is_redundant_challenge(threshold 0.7) guards: when a round-2+ persona's CHALLENGE is too similar to a prior one, the lead force-regenerates ONCE with explicit "pick a different target / different angle" nudge; if still redundant, the contribution is kept but tagged_[lead note: contribution flagged as redundant]_so the synthesizer can ignore it. (d) Lead synthesis self-contradicted itself — listed "consult an advisor" inWhat Was Fillerthen included "明天与财务顾问讨论" as Action Plan item #10._lead_synthesisnow takes the lead's ownopeningtext as context and the prompt explicitly forces a SELF-CHECK before writing the action plan: "if any action matches a banned escape hatch, REWRITE or DELETE." (e) Weak lead models silently produce flat output — qwen2.5-72b leading qwen2.5-72b is the same model on both sides with no real moderation. New_is_weak_lead_modelfamily check (qwen / qwq / gemma / phi-3 / mistral-7b / llama-3.2 / kimi-7b / minimax-text / abab / etc.); when triggered, prints a one-line warning suggesting--lead claude-opus-4-7or the free--lead nim/deepseek-ai/deepseek-r1. 
The check never silently overrides the user's choice; it only informs. Plus a new `docs/guides/brainstorm.md` "When NOT to use /brainstorm" section: the panel runs with `no_tools=True` so it can't pull live data, which makes it a bad fit for stocks / current events / repo-specific code and a good fit for architecture decisions / refactor strategy / risk assessment / API design. Tests: 28 new pytest cases (extraction + Jaccard + redundancy + weak-lead + synthesis-with-opening). 297 targeted regression tests pass. -
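The redundancy guard in (c) rests on token-set Jaccard similarity with a 0.7 threshold. A minimal sketch, assuming word-level tokens and a `>=` comparison (neither detail is stated above, and the function names here are hypothetical stand-ins for `_jaccard_similarity` / `_is_redundant_challenge`):

```python
import re


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| over lowercase word tokens."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_redundant_challenge(candidate, prior_challenges, threshold=0.7):
    """True when the candidate is too close to any earlier CHALLENGE block."""
    return any(jaccard_similarity(candidate, p) >= threshold for p in prior_challenges)
```

A set-based measure ignores word order, so a challenge that merely reshuffles a copied block still scores as a near-duplicate.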
May 9, 2026: /brainstorm round 2+ becomes adversarial cross-examination. Previous round-2+ prompt asked personas to "engage with what others said" but that was too soft — weak models defaulted to "agree-and-extend" or just continued their own line, producing N rounds of polite parallel monologues instead of a real debate. Three coordinated changes flip round 2+ into mandatory adversarial mode. (a) Persona round-2+ prompt rewrite (
commands/advanced.py:call_persona). Each persona MUST: quote a specific claim from another agent verbatim (by letter), attack a specific weakness (data wrong / mechanism doesn't produce outcome / confounder ignored / claim un-falsifiable / contradicts stronger claim), AND propose a falsifiable counter-claim with a specific number/date/named entity. Structured format### [CHALLENGE → Agent X]so weak models can follow. Politeness ("great point", "I agree, and would add", restating without attacking) is explicitly FORBIDDEN. Synthesis is the lead's job, not the persona's. (b) Round-aware lead probe (_lead_probe). Round 1 keeps the existing concrete-vs-vague check. Round 2+ uses a different probe that fires on DODGES — a polite agreement, a synthesis, or a defense-only reply that doesn't quote and attack another agent earns a probe demanding "Agent X said '...'. Attack it or accept it — your call, but commit. Quote and refute, don't dodge." (c) Lead opening warns about cross-examination upfront. Opening prompt now ends with explicit rule: "in any round after the first, each expert MUST quote a specific claim from another expert and either attack with a counter-claim OR explicitly accept it. Polite agreement counts as a dodge." UI label changes too —── Round 2/3 (adversarial cross-examination — agents must attack each other's claims) ──. Tests: 3 new round-aware probe cases (round-2 polite-agreement gets probed; round-2 real challenge passes; round-1 still uses old vague check — captured so a future round-2 change can't regress round 1). 269 targeted regression tests pass. -
May 8, 2026: /ssj brainstorm: interactive Rounds prompt. Tiny UX follow-up to the multi-round /brainstorm landing —
`/ssj` → 1 (Brainstorm) now asks `Rounds [1=monologues, 2=critique (default), 3-6=more debate] >` right after the existing "How many agents?" prompt, so SSJ users can dial in debate depth without remembering the `--rounds N` CLI flag. Behaviour: when the user invokes `/brainstorm --rounds 3 …` directly via the slash-command line, the explicit value wins and the prompt is skipped (no double-asking). Telegram / web bridge sessions still skip the prompt entirely (no interactive input channel) and use the documented default of 2 rounds. -
May 8, 2026: /brainstorm: real multi-round debate + tighter post-Write contract. Two follow-up fixes after the lead-moderator landing. (a) Multi-round debate (
commands/advanced.py). Previous flow ran every persona exactly once — even with the lead moderator, that's three monologues stapled together, not a debate. New--rounds Nflag (default2, capped to[1, 6]) wraps the persona iteration in an outer rounds loop. Round 1 is initial positions (existing prompt). Round 2+ uses a different system prompt that explicitly forbids repeating: "Read the prior debate. Pick 1-2 specific claims from OTHER agents that you disagree with, can sharpen, or that change your view. Quote and engage. Do NOT re-list your round-1 ideas." Lead probes still fire after each persona in each non-final round. The synthesis prompt's transcript is rebuilt frombrainstorm_historydirectly so adding new header rows can't mis-slice it again. Composes with--lead <model>and--models a,b,c:/brainstorm --rounds 3 --lead claude-opus-4-7 --models gpt-5,nim/deepseek-ai/deepseek-r1 redesign auth. (b) Tighter TODO prompt (cheetahclaws.py). The previous "do not echo / do not Read" prompt didn't stop qwen2.5 from Write → echo content as text → Bashlsto verify (with truncated path due to vLLM streaming) → echo content again. New prompt is numbered STRICT RULES: call Write EXACTLY ONCE; do NOT call Read; do NOT call Bash to verify; do NOT echo file content after Write; after Write succeeds, your turn ENDS. Both REPL and Telegram handlers updated. Tests: 9 new pytest cases (--roundsparser including bound-clamping + non-numeric rejection + three-flag composition). 266 targeted regression tests pass. The Bash-args truncation symptom (ls /srv/.../cheetahclacut mid-path) is a vLLM hermes-parser streaming bug at the model server, not fixable on the client side; the tighter prompt avoids the Bash call entirely. -
May 8, 2026: Three fixes for /monitor + /research stability — multi-word topics + aggregator deadlock + REPL Ctrl+C. Two distinct bugs reported on a
/ssj→ 17 (Trend Track) flow with the topic "Agent OS Benchmark". (a) Topic truncated to first word (commands/monitor_cmd.py:_parse_subscribe_args). The previous parser didargs.split()and treated the FIRST whitespace token as the topic, dropping the rest. So/subscribe research:7d:Agent OS Benchmark dailybecame topic=research:7d:Agent+ the rest was either silently dropped or mis-classified as flags. The new rule: walk left-to-right, peel off--flagtokens into channels, then if the LAST remaining token is in_VALID_SCHEDULESit's the schedule — everything before joined by single spaces is the topic. Correctly handlesai_research,ai_research weekly,custom:quantum computing weekly,research:7d:Agent OS Benchmark daily,research:7d:Agent OS Benchmark(default schedule), and edge cases. 12 new pytest cases (tests/test_subscribe_parser.py). (b) Aggregator deadlocked on slow source then killed REPL on Ctrl+C (research/aggregator.py:190). Thewith concurrent.futures.ThreadPoolExecutor(...)context manager callsshutdown(wait=True)on__exit__, which BLOCKS waiting for any in-flight worker to finish. Whenas_completed(timeout=...)fires its TimeoutError because one source is hung on a stuck socket, control unwinds into the__exit__and joins the hung thread. Then the user Ctrl+Cs to escape, the KeyboardInterrupt fires during the join, and Python's atexit hook_python_exitALSO joins the same threads — double-blocking, then atexit kills the process and the user is dumped to bash. Fix: switch to manualtry/finallywithshutdown(wait=False, cancel_futures=True)(Python 3.9+) so partial results return immediately; the hung worker keeps running as a daemon thread and dies silently with the process. Both_cf.TimeoutErrorandKeyboardInterruptpaths now mark unfinished sources with a status entry ("timeout (aggregator deadline exceeded)"or"interrupted by user") instead of dropping them silently. (c) REPL: Ctrl+C during a slow slash command killed the process (cheetahclaws.py:1368). 
The REPL did `result = handle_slash(user_input, state, config)` with NO try/except, so a KeyboardInterrupt during `/monitor run`, `/research`, `/trading backtest`, etc. unwound the call stack all the way to `main()` → `sys.exit()` → atexit. Fix: wrap the REPL slash dispatch in `try / except KeyboardInterrupt → print '(command interrupted)' → continue` so Ctrl+C cancels the command and returns to the prompt. Also wrapped the SSJ inner re-dispatches at lines 1420/1430 (`__ssj_passthrough__` and `__ssj_cmd__`) so Ctrl+C from inside a slow SSJ-launched command bounces back to the SSJ menu instead of killing the REPL. 257 targeted regression tests pass. -
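The aggregator fix in (b) replaces the `with ThreadPoolExecutor(...)` block with a manual `try/finally` and a non-blocking shutdown. The sketch below shows that pattern in isolation; `gather_with_deadline` and the shape of `sources` (a dict of zero-argument callables) are hypothetical, and the slow source in the usage example is simulated with a short sleep.

```python
import concurrent.futures as cf
import time


def gather_with_deadline(sources, deadline_s):
    """Run each source in a thread; return (results, status) once the deadline hits."""
    results, status = {}, {}
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn): name for name, fn in sources.items()}
    try:
        for fut in cf.as_completed(futures, timeout=deadline_s):
            name = futures[fut]
            try:
                results[name] = fut.result()
                status[name] = "ok"
            except Exception as exc:  # one failing source must not sink the rest
                status[name] = f"error: {exc}"
    except cf.TimeoutError:
        # Mark unfinished sources instead of dropping them silently.
        for name in futures.values():
            status.setdefault(name, "timeout (aggregator deadline exceeded)")
    finally:
        # wait=False returns immediately; cancel_futures needs Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)
    return results, status


if __name__ == "__main__":
    demo = {"fast": lambda: 1, "slow": lambda: time.sleep(0.5) or 2}
    print(gather_with_deadline(demo, deadline_s=0.1))
```

The point of the design is that the caller gets partial results as soon as the deadline passes, rather than blocking in `__exit__` on whichever worker is stuck.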
May 8, 2026: /brainstorm gets a real lead moderator + read-only tool dedup. Two coordinated changes that turn /brainstorm from "round-robin echo chamber that produces filler advice" into "moderated debate with a structured master plan", and stop weak models from re-Reading the same file twice. (a) Lead moderator (
commands/advanced.py). Three new in-process stages (no main-agent invocation, no tool calls — the whole pipeline lives insidecmd_brainstorm): (i) Opening — lead frames the agenda, names the concrete artifact this debate must produce (e.g. "specific tickers with thesis, not 'consider semiconductors'"), and lists 2-3 cheap escape hatches that will be REJECTED ("consult an advisor", "diversify", "monitor regularly"). The opening becomes the persona system-prompt's "DEBATE ANCHOR" so every persona writes against the same bar. (ii) Probe — after each persona speaks, lead reads their contribution and either repliesNO_PROBE(concrete enough) or asks one ≤25-word follow-up that demands a specific commitment; the persona then gets one more swing answering the probe. (iii) Synthesis — lead produces the final master plan with four named sections (Consensus / Dissents / Concrete Action Plan / What Was Filler), with the consensus matrix tagging each claim with the agent letters that backed it. New--lead <model>flag lets you point lead at a stronger model than the default (/brainstorm --lead claude-opus-4-7 --models gpt-5,deepseek-r1 redesign auth). Composes cleanly with the existing--models a,b,cflag. (b) Eliminates the duplicate-Read bug. The previous flow returned a sentinel that asked the main agent to Read the brainstorm file and synthesize — qwen2.5 + vLLM cheerfully Read it twice and echoed the entire 4 KB master plan as text twice (also writing a different much shorter content via Write — a separate tool-call truncation issue). The new sentinel inlines the lead's master plan directly in the TODO-generation prompt, so the main agent only writes the TODO file. No Read, no rewrite. The old_save_synthesisstep is now a no-op (everything is written insidecmd_brainstorm). (c) Read-only tool dedup (agent.py). 
Defense-in-depth even outside brainstorm: when the model fires Read/Glob/Grep/WebFetch/WebSearch with identical args twice within a single `run()`, the 2nd call is short-circuited: `execute_tool` is skipped (saves time), `ToolStart`/`ToolEnd` UI yields are suppressed (no `⚙ Read(...)` printed twice), a brief `[deduped Read: already in context]` text marker is yielded so the user still knows what happened, and a synthetic `[deduped]` reminder is appended as the tool_result so the model sees "you already called this; use the content already in your context". This both nudges the model AND keeps the OpenAI/Anthropic tool_calls ↔ tool_response pairing valid. Write/Edit/Bash are explicitly NOT deduped (those can be intentional rewrites). Tests: 19 new pytest cases (8 lead helpers + 4 dedup integration via fake provider stream + 7 flag-parse). 245 targeted regression tests pass. -
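The read-only dedup in (c) amounts to remembering `(tool, args)` pairs for the duration of one run. A minimal sketch under assumptions: the class `ReadOnlyDedup`, the helper `dispatch`, and the exact reminder wording are hypothetical, and the UI-event suppression described above is left out.

```python
import json

READ_ONLY_TOOLS = {"Read", "Glob", "Grep", "WebFetch", "WebSearch"}


class ReadOnlyDedup:
    """Tracks read-only tool calls already made during a single run."""

    def __init__(self):
        self._seen = set()

    def should_skip(self, tool_name, args):
        if tool_name not in READ_ONLY_TOOLS:
            return False  # Write/Edit/Bash may be intentional repeats
        # sort_keys makes the key independent of argument order.
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


def dispatch(dedup, tool_name, args, execute_tool):
    """Return a synthetic tool result for duplicates, else run the tool."""
    if dedup.should_skip(tool_name, args):
        return (f"[deduped] you already called {tool_name} with these arguments; "
                "use the content already in your context")
    return execute_tool(tool_name, args)
```

Returning a synthetic result rather than nothing is what keeps every tool call paired with a tool response in the provider message format.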
May 8, 2026: /ssj brainstorm hot-fixes — absolute path in synthesis prompt + tool dispatch hardened against empty args. Two bugs surfaced when a user ran
/ssj→ 1 (Brainstorm) oncustom/qwen2.5-72b. (a) commands/advanced.py:244 — synthesis prompt leaked a relative path. The brainstorm synthesizer was injectingout_file(aPathresolved relative to cwd) into the model's prompt asbrainstorm_outputs/brainstorm_<ts>.md. The model — obeying the system prompt's "always use absolute paths" rule — invented an absolute prefix and guessed wrong (in this case…/PR/cheetahclaws/brainstorm_outputs/…, a stale sibling source tree it had never been told existed). Read failed, the synthesis ran on no actual evidence. Fix:out_file.resolve()before formatting + an explicit "use this path verbatim, do NOT prepend any directory" line. (b) tools/init.py:459-471 — permission-prompt description usedinputs['file_path']notinputs.get(...). When a weak model fired a tool_call with empty arguments (qwen2.5 + vLLM hermes-parser is a documented offender — see "Be agentic on every model" entry above), the wrapper raisedKeyError: 'file_path'before the registered ToolDef's friendly"Error: missing required parameter 'file_path'"lambda ever ran. The user sawError executing Write: KeyError: 'file_path'and the model couldn't self-correct. Fix:.get(..., '<missing path>')for Write/Edit/NotebookEdit description,.get('command', '') or ''for Bash, so the inner ToolDef's friendly error always reaches the model. Bash's_is_safe_bashalready tolerates empty input. Tests: 9 new pytest cases (tests/test_tool_dispatch_robustness.py) — empty args on Write/Edit/Read/Bash/NotebookEdit must return a friendly string and never leakKeyErrorto the agent loop. 226 targeted regression tests pass. -
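The fix in (b) comes down to building the permission-prompt description with `.get(...)` so that empty arguments never raise before the tool's own validation runs. A minimal sketch, assuming a hypothetical helper named `describe_tool_call` (the real wrapper in the tools package is not shown in this entry):

```python
def describe_tool_call(tool_name, inputs):
    """Build a human-readable description without assuming any key exists."""
    if tool_name in ("Write", "Edit", "NotebookEdit"):
        # inputs['file_path'] would raise KeyError on empty args; .get does not.
        return f"{tool_name}({inputs.get('file_path', '<missing path>')})"
    if tool_name == "Bash":
        # 'or ""' also covers an explicit None value.
        return f"Bash({inputs.get('command', '') or ''})"
    return f"{tool_name}(...)"
```

With the description step made total, the friendly "missing required parameter" message from the registered tool can reach the model and let it self-correct.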
May 8, 2026: NVIDIA NIM free-tier provider + 429 cascade fallback + multi-model
/brainstorm. Three small, focused additions — borrowed selectively from sibling forks (Falcon for NIM, Dulus for the multi-model debate idea) — that lower the barrier to entry for users without paid API keys and tighten epistemic diversity in brainstorming. (a) NIM provider (providers.py). Newnimentry registered againsthttps://integrate.api.nvidia.com/v1(build.nvidia.com — free signup, no payment info), curated 10-model chain (deepseek-r1, deepseek-v3.1, llama-3.3-70b, llama-3.1-405b, nemotron-70b, mixtral-8x22b, qwen2.5-72b, qwen2.5-coder-32b, phi-3-medium, gemma-2-27b). All listed inCOSTSas $0 so the UI doesn't show "unknown" for free-tier usage. Invocation:cheetahclaws --model nim/<vendor>/<model>— the double-prefix preserves NIM's upstream<vendor>/<name>form throughdetect_provider+bare_model. (b) 429 cascade fallback (agent.py). When a NIM model returns rate-limit (ErrorCategory.RATE_LIMIT), the agent loop callsnim_next_model()to pick the next model in the curated chain and retries — without consuming a regular retry slot. Capped at_NIM_FALLBACK_LIMIT = 3swaps per turn so a fully-throttled tier can't busy-loop; after the cap, falls through to the standard exponential-backoff retry path. Disabled by settingnim_auto_fallback=Falsein config. Other providers (anthropic / openai / etc.) are not affected — the swap is gated bydetect_provider() == "nim". (c) Multi-model/brainstorm(commands/advanced.py). New--models a,b,cflag distributes models round-robin across personas (/brainstorm --models claude-opus-4-7,gpt-5,nim/deepseek-ai/deepseek-r1 redesign auth) so a 5-persona session alternates 1, 2, 3, 1, 2 instead of running every persona on the same model. Single-model brainstorm is an echo chamber — different model families have different training data and blind spots, so multi-model debate buys real epistemic diversity. 
Each persona's section in the output Markdown is tagged with the model that produced it (`## 🏗️ Architect _(via gpt-5)_`) so the synthesizer can weight by source. Borrowed in spirit from Dulus's `RoundtableAgent`; the existing /brainstorm flow is unchanged when `--models` is omitted. Tests: 21 new pytest cases (`tests/test_nim_provider.py` 12 + `tests/test_brainstorm_models_flag.py` 9) covering provider registration, chain cycling (cycle-through + wraparound + unknown-model head fallback), 429 swap-then-succeed, fallback-cap-then-fallthrough, fallback-disabled honor, non-NIM no-leak, flag parsing across `--models a,b,c` / `--models=a,b,c` / flag-at-end / provider-prefixed IDs / single model. 217 targeted regression tests pass, zero regressions. Skipped by design: ia-web-parser's `WebToolParser`. Cheetahclaws' existing `_extract_native_tool_calls` already covers 4 marker formats (Gemma official + asymmetric, Hermes, Mistral) plus channel-tagged form and args recovery, so the streaming-vs-buffered UX delta wasn't worth the duplication. -
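The 429 cascade in (b) can be sketched as a capped walk along the model chain. Everything named here is illustrative: the three-entry chain is shortened, `RateLimited` stands in for the agent's rate-limit error category, and `next_model` / `call_with_cascade` are hypothetical stand-ins for `nim_next_model()` and the agent-loop logic.

```python
NIM_CHAIN = ["deepseek-r1", "deepseek-v3.1", "llama-3.3-70b"]  # shortened, illustrative
NIM_FALLBACK_LIMIT = 3  # matches the cap of 3 swaps per turn described above


class RateLimited(Exception):
    """Stand-in for a provider 429 response."""


def next_model(current, chain=NIM_CHAIN):
    """Next model in the chain, wrapping around; unknown models start at the head."""
    if current not in chain:
        return chain[0]
    return chain[(chain.index(current) + 1) % len(chain)]


def call_with_cascade(model, call, limit=NIM_FALLBACK_LIMIT):
    """Try `call(model)`; on rate limit, swap models up to `limit` times."""
    swaps = 0
    while True:
        try:
            return model, call(model)
        except RateLimited:
            if swaps >= limit:
                raise  # caller falls through to its normal backoff retry
            model = next_model(model)
            swaps += 1
```

The cap is what prevents a fully throttled tier from turning the wraparound chain into a busy loop.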
May 8, 2026 (earlier): "Be agentic on every model" pass — explore-first prompt + qwen overlay + runtime auto-nudge. A user reported
cheetahclaws --model custom/qwen2.5-72breplying "please tell me which file you mean" when handed a directory path, instead of justls-ing it. Three coordinated defenses, layered so any one of them is enough to fix the failure mode on any model: (a)prompts/base/default.md— new "Investigate Before Asking" section + softened Stop Conditions. Every model now gets explicit "default to action over conversation" framing: a directory is not "missing information", it's an invitation to enumerate;AskUserQuestionis reserved for genuine post-exploration ambiguity (intent that nols/Glob/Read could disambiguate), never as a substitute for a tool call. (b)prompts/overlays/qwen.md— new family overlay (10 lines, cites the Qwen function-calling guide). Qwen / QwQ chat-tuned models hedge by default ("could you specify…"); the overlay overrides that with "treat every concrete noun the user names — path, filename, URL, function, command, error string — as an instruction to investigate it with a tool, not echo it back as a question." Registered in_OVERLAY_RULESfor allqwen/qwqmodel IDs regardless of runtime (DashScope / Ollama / vLLM / OpenRouter all match). (c)agent.pyruntime auto-nudge — model-agnostic safety net. New_looks_like_investigation()heuristic detects absolute-path tokens in the user message (URL-stripped to avoid false positives onhttps://host/path); if the heuristic fires AND the model's first reply is text-only with zero tool calls, the loop injects a one-shot[system reminder] use your tools, don't ask for what was givenmessage into history and continues. Bounded to one nudge perrun()invocation so it can never cause a loop — second text-only reply always falls through to break. The nudge fires on conversion to the OpenAI/Anthropic format as a normal user-role message and is invisible in the rendered UI (yielded events drive the display, notstate.messages). 
Tests: 13 new pytest cases (`tests/test_agent_nudge.py`): heuristic positives/negatives across English + Chinese + URL-only + relative-path + bare greeting; loop integration via fake provider stream verifying nudge fires, doesn't fire without path, fires at most once. 89 prompt + 196 targeted regression tests pass, zero regressions. Docs updated: `prompts/README.md` overlay table + Known Gaps, `docs/architecture.md` overlays tree + agent-loop step (h), `docs/contributor_guide.md` overlay enumeration. The three layers compose: strong models (Claude/Gemini) read the new default rule but already behaved this way; mid-tier models (GPT/DeepSeek/Kimi) get a clearer prompt-level instruction; weak models (qwen2.5/QwQ) get prompt + overlay + runtime nudge stacked. Even on a model that ignores the prompt entirely, the runtime nudge gives one free retry before the user has to intervene. -
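The path heuristic behind the runtime nudge in (c) could look roughly like this. It is a guess at the shape of `_looks_like_investigation()`, not its actual source: the two regexes and the function name without the leading underscore are assumptions, and the pattern will also fire on slash-command-like tokens such as `/help`.

```python
import re

# URLs are removed first so "https://host/path" cannot look like a filesystem path.
_URL = re.compile(r"https?://\S+")

# "/" or "~/" followed by a path segment, not preceded by an ASCII word character,
# slash, dot, or colon (which would indicate a relative path or a URL remnant).
_ABS_PATH = re.compile(r"(?<![A-Za-z0-9_/.:])(?:~/|/)[\w.\-]+")


def looks_like_investigation(message):
    """True when the user message contains an absolute-path-like token."""
    stripped = _URL.sub(" ", message)
    return bool(_ABS_PATH.search(stripped))
```

The lookbehind only excludes ASCII neighbours, so a path written directly after Chinese text with no space is still detected.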
May 8, 2026 (earlier): Agent-OS layer (`kernel/`) reaches v1.0: 27 RFCs shipped, 1771 tests passing, zero regressions on the legacy REPL/bridges path. What started as a daemon foundation (RFC 0001/0002) is now a single-node agent operating system: AgentProcess + EventLog (0003), Capability model (0005), per-agent ResourceLedger with `first_breach` signal (0006), priority Scheduler with admission filter (0007), RLIMIT + bubblewrap Sandbox (0008), Mailbox + topic pub/sub (0009), AgentRegistry (0010), AgentFS unified VFS (0011), Observability + Prometheus exposition (0012), and a frozen 58-method JSON-RPC contract with CI drift guard (0013). On top of that substrate: F-4 Subprocess agent runner (0016), WorkerLoop scheduler↔supervisor glue (0017), Bridge mirror that wires Telegram/WeChat/Slack into `kernel.mbox` without touching `bridges/` (0018), LLM runner MVP (0019), DialogueOrchestrator for multi-turn (0020), Tool Dispatch + Permission Routing (0021), LLM Tool Calling Integration (0022), defense-in-depth tools (Exec: argv-only, RLIMITed, env scrubbed, 0023; Glob+List, 0024; Fetch: SSRF + DNS-rebind + redirect-leak defended, 0025), four streaming layers (IPC chunks 0026, LLM token streaming 0027, Exec line streaming 0028, Fetch body streaming 0029), and three new built-in inspectors (Diff 0030, AST 0031, Git 0032). All kernel code lives in `kernel/` and is gated behind `--enable-kernel`; the default CheetahClaws CLI / REPL / bridges / web UI are byte-for-byte unchanged. Operators introspect via `cheetahclaws kernel summary | info | agents | proc <pid> | events | queue | registry | methods | prometheus`. Kernel SQLite schema is forward-only (v1 → v7). RFC 0014 multi-tenant + RFC 0015 cluster remain explicitly parked. Full overview: `docs/agent-os.md`. Each design note lives in `docs/RFC/`. -
May 8, 2026: F-2/F-3 follow-ups + CI unblock (
feature/fix-f2). Two-commit branch on top of #101's daemon foundation (F-2 SQLite persistence + F-3 monitor in daemon). (a) CI unblock (fix(ci)). Main has been red since9c01237d(the trading-agent #99 merge) —tests/test_packaging.py::test_required_module_imports[modular.trading.ml](the regression test added for issue #97) caught thatmodular/trading/ml/features.pyandmodular/trading/portfolio.pyimport numpy at module top while numpy is in the[trading]extra, not core deps. Sopip install .(no extras) shipped a wheel whereimport modular.trading.mlblew up. PR #100 and #101 both inherited the red. Fix: deadimport numpy as npremoved fromfeatures.py;stacker.pydefers numpy to insidetrain()andpredict_proba()past the early-return paths so the diagnostic-only callers (train(too_few_rows),predict_proba(missing_model)) still work without the heavy stack;portfolio.pygates the numpy import behindtry/exceptso module import succeeds and runtime callers raise on first use as before.test_trading_advanced.pyandtest_trading_discovery.pygetpytest.mark.skipifmarkers on tests that genuinely need numpy / scipy / sklearn / pandas at runtime — skip cleanly on lean CI installs, run as before on full installs. Verified in a clean venv with only[web,autosuggest](the exact CI install): 1075 passed, 11 skipped; with[all]extras: 1086 passed, no regressions. (b) F-2/F-3 follow-ups (fix(daemon)). Five issues found during the #101 review that the merged code didn't address: (i)daemon/cli.py:cmd_servestartedmonitor.scheduler.start(...)before the listener bound — order matters because if a due subscription fires before the daemon is reachable, an LLM/network error in fetch/summarize/deliver surfaces in the log before the user sees the listening line, and external clients can't yet act on the resultingmonitor_reportSSE event; moved past the bind + discovery write. 
(ii) `monitor/scheduler.py` had no defense against the daemon coming up after REPL `/monitor start` fired: both schedulers would race on `last_run_at` and double-fire subscriptions; added a `_foreign_daemon_running()` step-aside check at every loop tick (REPL-side instances bow out when a daemon registers ownership), with an `owned_by_daemon=True` flag the daemon passes to opt out of the check on its own scheduler. (iii) `EventBus.publish` was `synchronous=FULL` (SQLite default) → every event was an `fsync` per commit, ~305 μs each; for streaming agent output (`text_chunk` events at dozens/sec) that's a real disk-IO concern. `daemon/schema.py` now sets `PRAGMA synchronous=NORMAL` on init + every thread-local connection, which is safe under WAL (only the most recent transactions can be lost on hard kernel crash, which for a 24h-pruned event log is fine); the microbenchmark drops to 39 μs/publish (~8×). (iv) The PR description said the JSON files were "kept readable for one release as fallback", but no fallback read path actually exists: the `jobs.py` and `monitor/store.py` migration is fundamentally one-way once the `schema_meta` marker is set. Updated docstrings + `docs/architecture.md` to make the one-way semantics explicit and tell users how to redo a migration if needed. (v) `docs/RFC/0002-daemon-foundation-roadmap.md` F-2/F-3 marked OPEN → MERGED #101 + follow-ups (#fix-f2), with a new "Follow-ups" subsection under each. Branch: `feature/fix-f2`. -
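The durability change in (iii) uses two standard SQLite pragmas. The sketch below shows them on a fresh connection; `open_event_db` is a hypothetical helper name, not the function in `daemon/schema.py`.

```python
import os
import sqlite3
import tempfile


def open_event_db(path):
    """Open a connection in WAL mode with synchronous=NORMAL."""
    conn = sqlite3.connect(path)
    # journal_mode=WAL is stored in the database file and persists across connections.
    conn.execute("PRAGMA journal_mode=WAL")
    # synchronous is a per-connection setting, so each new connection must set it again.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_event_db(os.path.join(tmp, "events.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        print(conn.execute("PRAGMA synchronous").fetchone()[0])
        conn.close()
```

The per-connection scope of `synchronous` is why the entry above applies the pragma on init and again on every thread-local connection.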
May 8, 2026: Two production fixes — Gemma 4 native tool-call interceptor + issue #97 (
pip install .shipping a broken wheel). Two unrelated bugs that both blocked end users on the v3.1 release. (a) Gemma 4 native tool-call interceptor (providers.py). When users run cheetahclaws againstgemma-4-31B-itvia vLLM, the model emits its native<|tool_call>call:NAME{json}<tool_call|>format instead of the Hermes/JSON envelope vLLM's--tool-call-parser hermesexpects. vLLM doesn't recognise the format → leaves it indelta.content→ cheetahclaws yields it asTextChunk→ terminal shows raw<|tool_call>call:Research{topic:<\|"\|>...<\|"\|>}<tool_call\|>garbage instead of a coherent answer. The interceptor instream_openai_compatnow watches the streamed text for any of four native tool-call openers (Gemma official<|tool_call|>, Gemma 4 asymmetric<|tool_call>, Hermes<tool_call>, Mistral[TOOL_CALLS]); on detection it (i) yields the pre-marker text as a cleanTextChunk, (ii) stops yielding text and switches into buffer mode, (iii) at end-of-stream tries three parser branches against the buffer (Gemma'scall:NAME{json}, JSON envelope withname/arguments, Mistral's array form) and adds successful matches totool_calls. Also normalises Gemma's<|"|>→"quote escaping. If no parser matches, falls back to yielding the buffered raw text so users see something rather than a silent stall. Tests: 16 new pytest cases (tests/test_native_tool_intercept.py) covering marker detection (4 variants), 3 parser branches, robustness (empty buffer / unparseable garbage / multi-call buffer), and end-to-end streaming via mocked OpenAI client (verifies pre-marker text yielded as TextChunk +<|tool_call>tokens NOT in any TextChunk + tool_call appears in AssistantTurn). (b) Issue #97 —pip install .produces a broken wheel (pyproject.toml, deletedmemory.py,tests/test_packaging.py). Reported by @albertcheng on Windows + Python 3.13:cheetahclaws.execrashed at startup withModuleNotFoundError: No module named 'prompts'. 
Root cause: a name collision in `pyproject.toml`: `memory` was listed in BOTH `py-modules` (referring to an 11-line backward-compat shim `memory.py` that re-exports from the `memory/` package) AND `packages` (the real `memory/` directory). Python's import system always prefers the package directory over a same-named .py file, so the shim was dead code; setuptools ≥ 75 on Windows treats this dual-registration as a hard error and silently drops unrelated packages from the wheel build, which is how `prompts/` went missing. Fix: deleted the dead `memory.py` shim, removed `memory` from `py-modules`, and replaced the manual `packages = [...]` list with `[tool.setuptools.packages.find]` + wildcard `include` patterns so future sub-packages auto-discover. This also caught a separate latent bug: the four sub-packages added in the v3.1 trading discovery layer (`modular.trading.alt_data`, `modular.trading.broker`, `modular.trading.discover`, `modular.trading.ml`) were missing from the manual `packages = [...]` list and would have been excluded from production wheels even after a successful build. Tests: 29 new pytest cases (`tests/test_packaging.py`): config sanity (no module/package name collision allowed; `memory.py` shim must not be re-introduced; pyproject.toml must use `find` not manual list), discovery walk (every top-level dir with `__init__.py` is reachable from `find`'s include patterns or explicitly excluded), and the exact issue #97 failure reproduction (parametrised import test for 24 modules including `prompts`, `prompts.select`, all four new `modular.trading.*` sub-packages, and the `cheetahclaws` entry point; fails the build if any can't be imported). Verified locally: rebuilt wheel after fix contains all 31 packages including `prompts/` and the four new sub-packages. 1005 passing (960 baseline + 16 native-tool-intercept + 29 packaging = 1005), zero regressions. CONTRIBUTING.md updated with explicit packaging discipline notes: never put a name in both `py-modules` and `packages`, sub-packages auto-discover via `find`, only top-level packages need a new `include` pattern. -
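The marker-detection step of the Gemma interceptor in (a) can be illustrated on already-accumulated text. This is a simplified sketch: `split_at_tool_marker` is a hypothetical name, and it ignores the harder streaming case where an opener is split across two chunks.

```python
# The four native tool-call openers named in the entry above.
OPENERS = ("<|tool_call|>", "<|tool_call>", "<tool_call>", "[TOOL_CALLS]")


def split_at_tool_marker(text):
    """Return (clean_text, buffered); buffered starts at the earliest opener or is ''."""
    hits = [i for i in (text.find(m) for m in OPENERS) if i != -1]
    if not hits:
        return text, ""  # no marker: everything is ordinary text
    cut = min(hits)
    # Text before the marker is safe to show; the rest is held back for parsing.
    return text[:cut], text[cut:]
```

Splitting at the earliest opener is what lets the pre-marker text be shown as clean output while the raw tool-call tokens stay out of the terminal.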
May 8, 2026 (later):
/tradingv3.1 — automatic candidate discovery + composite ranking + anomaly detector + market monitor with bridge alerts. Closes the biggest gap in v3: previously you had to feed the agent symbols (/trading analyze NVDA); now it actively scans a universe and finds candidates for you. Four orthogonal discovery scanners ship: (a)insider_cluster— SEC EDGAR Form 4 cluster detector, flags tickers with ≥3 officer / 10%-holder filings in 30 days, surfaces SEC URLs so user can verify direction; (b)earnings_beat— yfinance earnings_dates surprise extractor, requires ≥10% beat AND post-print continuation (filters out the pop-and-fade pattern); (c)momentum_quality— factor intersection over the newfactors.py(momentum = 6m return + 50d>200d trend confirmation; quality = ROE − 0.3·D/E + 2·op-margin; both min-max normalised + composite-scored); (d)sector_rotation— ranks SPDR Select sector ETFs by 1m+3m return, surfaces top holdings of the leaders. The orchestrator (discover/orchestrator.py) merges per-symbol hits across all four sources with weighted aggregation (insider 1.0, earnings 0.9, mom-qual 0.7, sector 0.5) AND a +0.5 confluence bonus when ≥2 sources flag the same ticker. New CLI:/trading discover [insider|earnings|momentum-quality|sector|all] [--universe sp100|sectors] [--add-watchlist N]— the--add-watchlistflag auto-promotes the top N hits to your watchlist for downstream/trading scan//trading analyze. New/trading rankcomposite-ranks candidates by 0.5×factor + 0.3×discovery + ±0.1 calibration-tilt; output is a triage table for "which names deserve a real/trading analyze". New/trading factors [SYMS]shows raw momentum/quality/low-vol scores with a 24h disk cache at~/.cheetahclaws/trading/factors_cache.json(S&P 100 takes ~1-2 min to scan, parallel ThreadPoolExecutor with 4 workers). 
New/trading anomaly [SYMS]runs three independent checks per ticker: volume spike (today vs 90d median ratio ≥ 2×), price gap (open vs prior close ≥ 3%), volatility regime z-score (5d realised vol vs 90d distribution ≥ 2σ). New/trading monitor scanruns one full monitoring cycle — anomaly detection + stop-loss/take-profit hits on open paper trades + earnings within 3 days for any held position + new SEC Form 4 filings since last scan (delta detection persisted in~/.cheetahclaws/trading/monitor_state.db);--notify [telegram] [slack] [wechat]dispatches structured alerts (severity-tagged: critical/warning/info) through cheetahclaws's existing bridge layer. Honest framing on "real-time" in the docs: yfinance is 15-20min delayed for free tier, so polling more often than every 5-10 min is wasted effort; three scheduling options documented (manual, external cron,/monitorintegration). Newuniverse.pyships hardcoded S&P 100 (~7-8% drift/year, refresh quarterly) + 11 SPDR Select sector ETFs + curated top-10 holdings per sector ETF for sector_rotation. The discovery layer also fixes a real gap in the system prompt: the LLM didn't know what/trading discoveretc. existed, so when users asked "can you find me good stocks" it confabulated; the dynamic_render_commands_blockfrom earlier session now picks up the new subcommands automatically. Tests: 21 new pytest cases intest_trading_discovery.pycovering universe resolution, factor scan + score with stubbed yfinance, insider cluster threshold logic, momentum-quality intersection, sector rotation top-sector picking, orchestrator multi-source merge + bonus, anomaly triple-check (volume/gap/vol-regime), ranker factor+discovery combination, monitor alert rendering + dispatch + end-to-end scan with stubbed market data. 960 passing (939 baseline + 21 new), zero regressions; golden system-prompt fixture regenerated. 
Honest disclaimer in PLUGIN.md and trading.md: discovery reduces search cost, not generates alpha — the named factors (momentum, value, quality) are well-known and largely priced in by quant funds; what users get is a 100-ticker → 15-ticker triage list to spend tokens on, plus structured discipline (anomaly detection, stop monitoring, earnings calendar) that's hard to do by hand. Form 4 transaction direction is NOT yet parsed from XML (we count filings, not buys vs sales); URLs included so user verifies in 5 seconds. Insider direction parsing is on the roadmap but requires reliable XML scraping of SEC archives across version drift. -
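A minimal sketch of the multi-source merge described for the orchestrator, using the weights and the +0.5 confluence bonus quoted above. How `discover/orchestrator.py` actually combines per-source scores is not stated here; summing `weight × score` per symbol is an assumption, and `merge_hits` is a hypothetical name.

```python
from collections import defaultdict

# Weights and bonus as quoted in the entry above.
SOURCE_WEIGHTS = {
    "insider": 1.0,
    "earnings": 0.9,
    "momentum_quality": 0.7,
    "sector": 0.5,
}
CONFLUENCE_BONUS = 0.5


def merge_hits(hits):
    """Merge (symbol, source, score) hits into a ranked list.

    Assumption: each scanner reports a score in [0, 1] and the composite is
    the weighted sum, plus the bonus when two or more distinct sources agree.
    """
    totals = defaultdict(float)
    sources = defaultdict(set)
    for symbol, source, score in hits:
        totals[symbol] += SOURCE_WEIGHTS[source] * score
        sources[symbol].add(source)

    ranked = []
    for symbol, total in totals.items():
        if len(sources[symbol]) >= 2:
            total += CONFLUENCE_BONUS
        ranked.append((symbol, round(total, 3)))
    # Highest composite first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


ranking = merge_hits([
    ("NVDA", "insider", 1.0),
    ("NVDA", "earnings", 1.0),
    ("AAPL", "sector", 1.0),
])
```

With these inputs the ticker flagged by two scanners ends up well ahead of the single-source one, which is the triage effect the bonus is meant to produce.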
May 8, 2026:
/tradingv3 — paper-trade tracker, calibration, managed$Xportfolios, alt-data, MV optimizer, ML stacker, walk-forward, broker abstraction. A two-stage upgrade that turns the trading module from "ask LLM about a stock" into a measurable research substrate. Stage 1 (the discipline layer): every/trading analyzerecommendation is auto-recorded as a paper trade (~/.cheetahclaws/trading/paper_trades.db) — long and short signals account correctly./trading calibrationaggregates closed trades by confidence + signal and reports hit rate + mean return + a t-stat vs zero baseline; if 30+ closed trades show HIGH conviction not outperforming LOW, the agent's confidence label is noise and the diagnosis fires./trading verifyenforces hard risk rules (single-name 5% / sector 25% / total exposure 80% / stop 1-10% / earnings blackout 3 days → cap 2.5%) reading the live paper book — fixes the "LLM forgets its own rules" problem. The analyze prompt now auto-injects macro context (SPY/QQQ trend + VIX regime + 10y headwind, 30-min cached), earnings calendar warnings (🚨 if reporting within 7 days), and the current paper-book exposure so the LLM doesn't double-down on a sector already at 30%./trading walkforwardruns rolling out-of-sample chunks with a STABLE/MIXED/FRAGILE/INCONCLUSIVE verdict, replacing the dishonest aggregate backtest./trading scandoes a coarse heuristic sweep (RSI / 50d / 200d) over the watchlist before spending tokens on a real analyze. 
Stage 2 (the autonomous + alpha-research layer):/trading reviewruns a multi-agent debate on existing positions and emits structuredACTION ID=… DECISION=HOLD|ADD|TRIM|EXIT …rows for each./trading manage start hundred 100creates a virtual$100portfolio backed by a SQLite-cleanly-namespacedPaperBroker;/trading manage step hundredruns one mean-variance rebalance cycle (scipy SLSQP, long-only, single-name + sector caps),/trading manage report hundredprints a markdown PnL report with equity curve — this is the canonical "I give the agent $100, check in a week" workflow./trading optimizeexposes the same MV solver standalone. The alt-data layer auto-injects three sources LLM analysis can actually add value on: SEC EDGAR Form 4 insider transactions (urllib, no API key, free), LLM-scored yfinance news headlines via the auxiliary cheap model (-10..+10per headline aggregated to BULLISH/MIXED/BEARISH), and Google Trends search interest (soft-fails ifpytrendsnot installed). The broker layer has a tinyBrokerBackendprotocol with two backends —PaperBrokerworks out of the box,IBKRBrokeris a stub with full setup docs (pip install ib_insync+ IB Gateway config +connect()); the abstraction means switching from paper to live is one line when the user is ready./trading ml trainbuilds a LightGBM (or sklearnGradientBoostingClassifierfallback) classifier on closed paper trades — features are LLM signal one-hot + confidence ordinal + position size + stop / take profit + sector one-hot, label is "did this trade beat zero"; reports cross-validated AUC and feature importance, persists to~/.cheetahclaws/trading/ml/stacker.pkl. The_CMD_METAregistry is also auto-populated frommodular/-loaded commands now (closed a pre-existing bug where/trading,/video,/voice,/ttswere callable but invisible to/help, tab-completion, and the system-prompt slash-command index — the LLM literally couldn't see its own subcommands). 
Tests: 46 new pytest cases acrosstest_trading_pipeline.pyandtest_trading_advanced.pycovering paper-trader CRUD, long/short PnL math, Phase-5 parser permissiveness, calibration aggregation, verifier 8-branch enforcement, macro/earnings/insider/sentiment/trends soft-fail behavior, MV optimizer constraints, broker buy/sell/avg-cost round-trip, IBKR stub setup-required diagnostic, end-to-end$100→step→status→report lifecycle with mocked quotes, ML feature engineering + train + predict, and the position-review prompt format. 939 passing (893 baseline + 46 new), zero regressions; golden system-prompt fixture regenerated. Also fixed a banner-rendering bug where the welcome box's right border was missing on every middle line (cheetahclaws.pynow computes inner width from plain-text length and pads each row to close with│regardless of model-name length). Honest disclaimer in the docs and PLUGIN.md: this is a research and discipline tool, not a money printer — public-data + LLM analysis does not have predictive edge over quant funds; the value is information aggregation, programmatic risk discipline, and empirical accountability. Run paper for ≥3 months with green calibration + walk-forward before considering an IBKR live account; small accounts (<$1k) have unfavorable fixed-cost economics in real life regardless of strategy. -
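The calibration report (hit rate, mean return, t-stat against a zero baseline) reduces to a few lines of arithmetic. The sketch below assumes a plain one-sample t-statistic over per-trade returns; the function name `calibration_row` and the returned keys are hypothetical, and the real `/trading calibration` grouping by confidence and signal is omitted.

```python
import math
import statistics


def calibration_row(returns):
    """Summarise closed-trade returns for one (confidence, signal) bucket.

    Returns hit rate, mean return, and a one-sample t-statistic against zero.
    The t-stat is None when it is undefined (fewer than 2 trades or no variance).
    """
    n = len(returns)
    if n == 0:
        return {"n": 0, "hit_rate": None, "mean": None, "t_stat": None}
    hit_rate = sum(1 for r in returns if r > 0) / n
    mean = statistics.fmean(returns)
    t_stat = None
    if n >= 2:
        stdev = statistics.stdev(returns)  # sample standard deviation
        if stdev > 0:
            t_stat = mean / (stdev / math.sqrt(n))
    return {"n": n, "hit_rate": hit_rate, "mean": mean, "t_stat": t_stat}


# Hypothetical returns for four closed paper trades.
row = calibration_row([0.02, -0.01, 0.03, 0.04])
```

Comparing such rows between HIGH and LOW conviction buckets is what lets the tool say whether the confidence label carries any signal.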
May 7, 2026:
/themeslash command — 15 console color presets + post-merge UX fixes (PRs #92, follow-up). Adds a curated palette system toui/render.pyand a new/themecommand:/themelists all 15 presets (default,dracula,nord,gruvbox,solarized,tokyo-night,catppuccin,matrix,synthwave,midnight,ocean,monokai,cheetah,mono,none); each row renders aninfo / ok / warn / errswatch in the row's own theme colors so the listing is a real palette preview, not 15 identical lines in the current theme./theme <name>mutates the sharedCANSI dict in-place so every existingclr() / info() / ok() / warn() / err()call site (~25 files) switches palette without touching any call site, and persists the choice viasave_config()so the next launch re-applies it (early incheetahclaws.py:main(), before the first output).- Per-theme color roles. Each palette declares 4 semantic colors —
accent(info / cyan / blue),ok(success / green / diff +),warn(yellow / magenta),err(red / diff -) — plus a Richcodestyle. Picking 4 hexes per theme meansinfo()andok()are always visually distinguishable, andrender_diffkeeps semantic colors (green = add, red = remove) under every theme. The original PR collapsed cyan/green/blue to a single accent color, makinginfo()andok()indistinguishable and turning diff additions into the accent color (purple under dracula, yellow under gruvbox, magenta under synthwave) — the follow-up split them apart. CODE_THEMEis now actually consumed._make_renderable()inui/render.pypassescode_theme=CODE_THEMEtorich.markdown.Markdown, so Rich code-block syntax highlighting tracks the active theme (the original PR setCODE_THEMEbut never plumbed it through — it was dead code).nonetheme is genuinely uncolored (clears every key inC, includingreset, to""soclr()returns plain text).monois genuinely grayscale (4 distinct gray levels for accent/ok/warn/err — the original PR hardcodedC["red"] = "\033[38;5;196m"regardless of theme, breaking both).- Tests: 9 new pytest cases (
`tests/test_theme.py`) covering schema validation, unknown-theme rejection, `info`/`ok` distinguishability across all themes, diff-color distinguishability, `none`-as-plain-text, `CODE_THEME` tracking, `apply_theme` idempotency across state, and the Rich Markdown `code_theme` round-trip. 893 passing, zero regressions on the 884 pre-existing.
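The reason an in-place mutation of the shared `C` dict switches every call site at once can be shown with a toy version. The palette contents and the two-theme table below are illustrative assumptions, not the real presets in `ui/render.py`.

```python
# Shared palette dict. Call sites hold a reference to this object,
# so it must be mutated in place rather than rebound.
C = {"cyan": "\033[36m", "green": "\033[32m", "reset": "\033[0m"}

# Illustrative theme table (hypothetical values, two presets only).
THEMES = {
    "default": {"cyan": "\033[36m", "green": "\033[32m", "reset": "\033[0m"},
    "none": {"cyan": "", "green": "", "reset": ""},
}


def clr(text, color):
    """Wrap text in the current palette's escape codes."""
    return f"{C[color]}{text}{C['reset']}"


def apply_theme(name):
    """Switch palettes by updating C in place; reject unknown theme names."""
    if name not in THEMES:
        raise ValueError(f"unknown theme: {name}")
    C.update(THEMES[name])


palette_seen_by_other_module = C  # simulates `from ui.render import C`
apply_theme("none")
plain = clr("hi", "cyan")
```

Had `apply_theme` done `C = THEMES[name]` instead, modules that imported `C` earlier would keep the old dict, which is why the in-place update matters.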
-
May 7, 2026 (v3.5.78): Research lab Phase A — unattended multi-day research; WeChat smart-reply +
/draft; reliability + UX hardening. Three feature areas + a reliability pass.- Phase A: research lab as a 24/7 workflow (
research/lab/{resume,iterate,backlog,daemon}.py). The orchestrator in v3.5.77 was single-shot; this commit makes it iterative + queueable./lab resume <run_id> [<stage>]— rebuildsLabStatefrom SQLite (reconstruct_state): RQs from therqartifact, survey/outline/results/draft from theirlab_artifactsrows, latest experiment code fromexperiment_code_v<N>, latest sandbox result fromlab_experiments(synthesises aSandboxResultsince the original tempdir is gone),skip_experimentfrom PI's "skip experiment" decision message. Optional<stage>rewinds: artifacts produced at or after the target stage are intentionally dropped from the in-memory state so the orchestrator regenerates them; earlier artifacts are kept. Old artifact versions remain in storage for audit./lab iterate <run_id>— 3-reviewer self-review (defaultlab_iterate_reviewers=3) reads the finalreportartifact, scores 4 dimensions on 1-10 (novelty,rigor,clarity,evidence). Aggregated per-dim mean → overall = mean of 4 dims. The lowest-scoring dim picks which stage to rewind to viaDIMENSION_TO_STAGE:novelty→QUESTIONING,rigor→IMPLEMENTATION,clarity→DRAFTING,evidence→EXPERIMENT. Loops untilscore ≥ target_score(default 7.0),max_iterations(default 5), plateau (|delta| < 0.3for 2 consecutive), or run budget. Every iteration audited in newlab_iterationsSQLite table (target, score, breakdown, delta, revise_stage, status). Score parser is permissive (regex matches\d+(?:\.\d+)?) + clamped to [0,10] so11/10doesn't poison the average./lab backlog add <topic> [--iterate] [--target=N] [--max=N] [--prio=N]+list / remove / clear— newlab_backlogSQLite table (auto-incrementing id, priority desc + added_at asc ordering).claim_next_backlog()is atomic (SELECT...LIMIT 1+UPDATE...status='running'in one txn) so two daemons against the same DB don't double-process. Parser rejects unknown tokens after the flag block (/lab backlog add "..." 
--max=5 startno longer silently appendsstartto the topic)./lab daemon start / stop / status— singleton-protected single-worker loop (run_backlog_worker) that claims items, runsrun_one_lab_session, optionally runsiterate_until_converged, marks done|failed. On startup the daemon callsreset_running_backlog()to unstick rows a previous crashed daemon left inrunning. Stop is cooperative — current stage finishes before exit./lab models— prints the resolved per-role model (PI / questioner / surveyor / designer / engineer / analyst / writer / reviewer × 3 / lay_reader = 11 roles) + which env-var drove the choice + ● for explicit overrides + warning when reviewers span <N model families (same-source rubber-stamping kills the meta-loop signal).- Human-readable output paths.
`~/.cheetahclaws/research_papers/<run_id>/` was opaque; it is replaced with `~/.cheetahclaws/research_papers/<YYYY-MM-DD>_<HH-MM>_<topic-slug>_<run_id_short>/` (e.g. `2026-05-08_14-30_post-transformer-architectures-comparative-survey-2026_b16036de/`). The slug is ASCII alphanumerics plus hyphens, at most 60 characters, cut at a word boundary; CJK-only topics fall back to `untitled` (the run_id suffix still keeps the directory unique). `/lab migrate-paths [--apply]` is idempotent, dry-run by default, never overwrites existing targets, and lists unknown legacy dirs separately.
- WeChat smart-reply panel (
bridges/wechat_smart_reply.py,..._store.py). Inbound from a whitelisted contact triggers the auxiliary cheap model to draft 3 candidate replies → push panel tofilehelper(文件传输助手) with a 2-letter ID like[AA]. User replies with1/2/3to send, freeform text to customise,xto skip,qto list pending panels,AA 1to address a specific panel. Confirmed sends append towx_reply_historyand feed style mimicking on subsequent panels. SQLite at~/.cheetahclaws/wx_smart_reply.db(auto-fallback to in-memory on init failure); contacts JSON at~/.cheetahclaws/wx_contacts.json(mtime-hot-reloaded; missing file = empty store). 6 new config keys (wechat_smart_reply,_whitelist,_groups,_groups_at_only,_timeout_s,wechat_self_uid). Architectural fix: bot owner's own uid is auto-recorded on first non-filehelper, non-group inbound, andis_smart_reply_target()always returns False for that uid — so your own messages reach the agent even if you accidentally put yourself in the whitelist (which the iLink ClawBot architecture makes easy to do). /draft <message>slash command (commands/advanced.py,bridges/draft_cache.py). Semi-automatic reply path for the iLink-ClawBot architecture where the bot is a separate account (so the bot can't see your main-account inbound). Auxiliary model drafts 3 candidates; optionally tone-conditioned via@<uid_or_label>againstwx_contacts.json. When invoked from a bridge channel, candidates are echoed back to the originating WeChat / Telegram / Slack uid + stashed inbridges.draft_cache(per-uid, 10-min TTL, one-shot). The bridge inbound handler (inbridges/wechat.py) checks digit-only replies against the cache before the smart-reply path, so1/2/3after a/draftreturns just the chosen line — no agent invocation, no smart-reply panel triggered.- Reliability + UX hardening.
research/http.py429-aware backoff: separate schedules for 5xx/timeout (0.5/1/2/4s) vs 429 (10/30/60/120s); honoursRetry-Afterheaders (seconds or HTTP-date form), capped at 180s. Default retry budget bumped 2 → 4 (academic APIs hit 429 routinely on busy queries)._parse_retry_after+_backoff_secondshelpers covered by 8 new pytest cases.- Surveyor grounding (
research/lab/orchestrator.py:_stage_survey). Before invoking the surveyor LLM we now runresearch.aggregator.research()ontopic + selected_RQ(academic + tech buckets, top 30 hits, no model-synthesis). Top hits formatted as[N] (source) Title / URL / snippetblocks (≤8KB), passed as context, prompt instructs surveyor to cite from this list rather than memory. Search hits persisted assurvey_search_hitsartifact for reviewer-replay determinism. On any aggregator failure (no Tavily/Brave/etc. key, all sources 429, network down) surveyor logs a diagnostic note ([grounding skipped] aggregator returned 0 results, per-source: arxiv: 429, tavily: KEY_MISSING, ...) and falls back to the original prompt-only path. Reduces fabricated citations significantly on tested topics; verifier still catches the rest. _dedupe_self_repeatin_invoke()trims trailing self-repetition emitted by cheap / quantised models (text == text+textexact-halves match, or first-200-chars recur in back half with ≥80% normalised match). Sanity floor: never trim below 30% of original length. Why this matters: gpt-5-nano on the lab baseline produced PI rationale messages and RQ lists that appeared twice concatenated; without dedup these doubled inputs went into every downstream prompt, eating context and confusing the surveyor / writer._extract_numberedsimilarly dedupes by content (first 80 chars whitespace-collapsed lower-case) so a1..5\n1..5re-emission keeps 5 unique items, not 10.- Verifier hard timeout (
research/lab/verifier.py:verify_citations). Per-citation hard wall-clock cap (default 30s) enforced viaconcurrent.futures.ThreadPoolExecutor+future.result(timeout)so a hung urlopen() is interrupted at the Python level — socket-level timeout alone doesn't fire on slow-loris servers (we observed an 11-minute hang on arxiv in the field). Fresh single-worker pool per citation +shutdown(wait=False)so a hung worker doesn't queue-block subsequent citations (it leaks as a daemon thread, dies with the process). Stage-level cap (default 5 min) — citations not processed when budget exhausts get markedverification_skippedso finalization still produces a report.progress_cb(i, n, status)wired to averifiermessage in the run log so/lab logs <run_id>shows[3/12] verified,[5/12] hard timeoutetc. - REPL ergonomics.
/lab daemon start+/lab startprint the eventualreport.mdpath up front (no more "where did my report go?" friction). Stage transitions stream live to the terminal as the orchestrator runs (↳ /lab daemon ► [run_id] survey)./lab status <run_id>shows both new-format and legacylab_xxx/paths so users can find old reports without manual digging./configparses JSON-style values (lists, dicts, signed numbers, quoted strings) —/config wechat_smart_reply_whitelist=["wxid_..."]is no longer silently saved as a string. Leading whitespace before/is stripped at the REPL loop so« /lab daemon start»(paste with a stray space) hits the slash dispatcher instead of being routed to the agent — saves the user from a confusing failure on local cheap models that hallucinate tool-call syntax as text when asked to "run /lab daemon start".
- Tests: 884 passing in 95 seconds (842 unit/integration + 22 e2e), zero regressions on the prior 669-test baseline. ~80 new pytest cases covering: iteration scoring (parser permissiveness + clamp + dim averaging + weakest-dim routing), state reconstruction (full + rewind, all 9 stages), backlog CRUD + atomic claim + reset-running, daemon singleton semantics, verifier per-citation + stage-level timeouts, slug edge cases (Chinese, max-len, word boundary),
`_dedupe_self_repeat` (exact halves, prefix recurrence, sanity floor, no-op clean text), `_extract_numbered` dedupe, self-uid bypass for smart-reply, draft cache one-shot + TTL, Retry-After parsing (seconds + HTTP-date + None), backlog parser strict mode.
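The `/lab iterate` scoring rules (permissive number parsing, clamping to [0, 10], per-dimension means, weakest dimension choosing the rewind stage) can be sketched as below. The `DIMENSION_TO_STAGE` mapping and the regex come from the entry; `parse_score`, `aggregate` and the review dict shape are hypothetical.

```python
import re
import statistics

# Mapping quoted in the entry above.
DIMENSION_TO_STAGE = {
    "novelty": "QUESTIONING",
    "rigor": "IMPLEMENTATION",
    "clarity": "DRAFTING",
    "evidence": "EXPERIMENT",
}
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_score(raw):
    """Take the first number in the reviewer's text and clamp it to [0, 10]."""
    match = _NUMBER.search(raw)
    if match is None:
        return None
    return min(10.0, max(0.0, float(match.group())))


def aggregate(reviews):
    """Return (per-dimension means, overall score, stage to rewind to).

    Assumes every dimension has at least one parseable score.
    """
    per_dim = {}
    for dim in DIMENSION_TO_STAGE:
        scores = [parse_score(review[dim]) for review in reviews]
        per_dim[dim] = statistics.fmean(s for s in scores if s is not None)
    overall = statistics.fmean(per_dim.values())
    weakest = min(per_dim, key=per_dim.get)
    return per_dim, overall, DIMENSION_TO_STAGE[weakest]


# Hypothetical reviewer outputs, including an out-of-range "11/10".
reviews = [
    {"novelty": "8", "rigor": "6/10", "clarity": "11/10", "evidence": "7.5"},
    {"novelty": "6", "rigor": "4", "clarity": "9", "evidence": "7.5"},
]
per_dim, overall, rewind_stage = aggregate(reviews)
```

The clamp is what keeps a stray `11/10` from inflating the mean; it is counted as 10.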
-
May 7, 2026 (v3.5.77): MCP HTTP/SSE transport + OAuth 2.0 PKCE,
.envloader,ANTHROPIC_ENDPOINTcorporate-proxy override, AskUserQuestion UI polish (#88, #89). Three loosely related improvements landed together:- MCP HTTP / Streamable-HTTP / SSE transport (
mcp_client/client.py).HttpTransportnow handles three response shapes: plain JSON, Streamable-HTTP (POST returns an SSE stream — read firstdata:line), and bidirectional SSE with a session endpoint. The defaultAcceptheader isapplication/json, text/event-streambecause servers like sap-jira return 406 when only one is advertised. On a401from the resource URL, anOAuthSessionis initialised lazily, the access token is injected asAuthorization: Bearer <token>, and thehttpx.Clientis rebuilt under a dedicated_oauth_lockso two concurrent 401-retries can't race on close+create. Server-name sanitization onMCPManagerkeys lets hyphenated names likegithub-toolsresolve correctly through the qualifiedmcp__server__toolpath;add_server,connect_server, andreload_serverall sanitize the lookup key the same way the parser does. - OAuth 2.0 PKCE flow (
mcp_client/oauth.py). Full MCP Authorization spec: RFC 9728 resource-server discovery (tries/.well-known/oauth-protected-resource/<path>then the bare path), RFC 8414 AS metadata, RFC 7591 dynamic client registration when noclient_idis configured, Authorization Code + PKCE (S256) with a local 127.0.0.1 callback HTTP server, refresh-token rotation, and atomic token persistence to~/.cheetahclaws/mcp_oauth.jsonwritten via a.tmpswap thenos.replace, with the file at mode0600and the parent directory at0700. The redirect-URI port is picked once and reused for both registration and the callback (otherwise strict OAuth servers rejectredirect_urimismatch). Scope is sourced from the AS's advertisedscopes_supported— preferringmcpif listed, otherwise the first one, otherwise thescopeparameter is omitted entirely so servers without anmcpscope no longer reject withinvalid_scope.statemismatch anderrorquery params surface as runtime errors; the callback browser tab confirms auth completion. - REPL surface (
commands/advanced.py,cheetahclaws.py). New/mcp add <name> --transport http <url>and--transport sse <url>for one-line HTTP/SSE registration; explicit/mcp listsubcommand; tool descriptions in/mcpoutput now wrap at 72 cols viatextwrap.wrapinstead of a hard 60-char slice./helpadvertises the new subcommands. .env+ANTHROPIC_ENDPOINT(cheetahclaws.py,config.py,providers.py,commands/core.py)._load_env()parses<repo>/.envat the very top ofcheetahclaws.py— before any other import readsos.environ— supporting bothK=VandK="quoted V", ignoring#comments, and usingos.environ.setdefaultso existing shell vars always win.config.pyreadsANTHROPIC_ENDPOINTfromos.environand unconditionally writes it tocfg["anthropic_endpoint"](env var beats persisted JSON), defaulting tohttps://api.anthropic.comwhen neither is set.providers.pypassesbase_url=cfg["anthropic_endpoint"]toanthropic.Anthropic; the/doctorand onboarding probes hitf"{_ant_base}/v1/messages"via the same value. Net effect: a corporate proxy can replaceapi.anthropic.comcleanly across streaming, health checks, and onboarding without touching~/.cheetahclaws/config.json. MCP HTTPheadersvalues now also pass throughos.path.expandvars, so"Authorization": "Bearer $GITHUB_TOKEN"works after the.envloader has populatedos.environ.- AskUserQuestion UI polish (
agent.py,ui/render.py,tools/interaction.py,cheetahclaws.py). AskUserQuestion is now in the auto-approve set alongsideEnterPlanMode/ExitPlanMode— it's an interactive tool by definition, the permission gate was redundant.print_tool_startandprint_tool_endearly-return for AskUserQuestion so the spinner and→ N lines (M chars)summary don't appear;_tool_descadds a short preview of the first question. The question itself is rendered throughclr()with Markdown stripped (**bold**,`code`,*italic*removed in that order so***x***collapses correctly), option indices are cyan, descriptions dim. The REPL prompt now prints a full-width─rule viaos.get_terminal_size()(80-char fallback) before each input, matching Claude Code's visual rhythm.
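The `.env` rules listed above (`K=V` and `K="quoted V"`, `#` comments ignored, existing shell variables win via `setdefault`) fit in a short function. This is a sketch, not the actual `_load_env()`: `apply_env_text` is a hypothetical name, and it takes the file text and a target mapping as arguments so it can be exercised without touching the real environment.

```python
import os


def apply_env_text(text, environ=os.environ):
    """Apply .env-style lines to `environ` without overriding existing keys."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of surrounding double quotes: K="quoted V"
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        environ.setdefault(key, value)  # shell-provided values always win


# Hypothetical contents and a stand-in environment for illustration.
fake_environ = {"ANTHROPIC_ENDPOINT": "https://proxy.example.internal"}
apply_env_text(
    '# corporate proxy\n'
    'ANTHROPIC_ENDPOINT=https://api.anthropic.com\n'
    'GREETING="quoted V"\n'
    'PLAIN=1\n',
    fake_environ,
)
```

Because `setdefault` never overwrites, a value exported in the shell survives even when the `.env` file names the same key.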
-
May 5, 2026: Telegram bridge — file round-trip + clickable permission prompts (fixes #84). Two missing code paths in
bridges/telegram.pyproduced both halves of the issue: (1) the bridge only had_tg_send(text viasendMessage), so--accept-allmade no difference — when the model claimed it had "sent a file" it was just text, and there was no multipartsendDocumenthelper, no inbounddocumenthandler, and no way for the agent to emit a file; (2) permission prompts arrived as text containing[y/N/a(ccept-all)]that looked clickable but weren't, because the poll loop only listened formessageupdates and there was no inline-keyboard rendering path. Patch:_tg_send_document(token, chat_id, file_path, caption=None)—multipart/form-dataupload assembled by hand becauseurllib's JSON-only path can't carry binary bodies. 49 MB ceiling (Telegram's hard limit is 50 MB; the headroom catches encoding overhead). Six explicit failure modes, each surfaces a specific error to the chat: missing file, stat failure, empty file, oversize, network exception, APIok: false(description forwarded verbatim).- Inbound
documenthandler in_tg_poll_loop— downloads viagetFile, sanitizes filename to[A-Za-z0-9._-]_to keep the save path safe, writes to/workspaceif mounted (Docker scenario) ortempfile.gettempdir()otherwise, echoes the saved path back to chat, and submits a path-aware prompt to the agent ("I just uploaded a file at <path>. Please review it."— overridden by caption if present). !sendfile <absolute_path>— explicit user-driven send, runs in a daemon thread so the poll loop doesn't block on uploads. Strips backticks/quotes around the path.- Auto-send on
Write—_bg_runner._on_tool_startrecords the in-flightfile_pathforWritecalls;_on_tool_endmails it (FIFO-paired so parallel writes match correctly). Skipped when the result starts withError:orDenied:. De-duplicated per turn via a_sent_files: set[str]so the agent retrying the same path doesn't double-mail. - Permission UX across every channel (
[approve][reject]is now actually pickable everywhere). Issue #84 also flagged that permission prompts looked like buttons but weren't; fixed in the same patch and extended cross-bridge so the experience is consistent regardless of where the user is.ask_input_interactive(prompt, config, options=[(label, value), …])is the new contract;ask_permission_interactivepasses[("✅ Approve", "y"), ("❌ Reject", "n"), ("✅✅ Accept all", "a")]and every channel renders an interactive picker:- Telegram — real
inline_keyboardbuttons.callback_dataiscc:<prompt_id>:<value>whereprompt_idis a fresh 8-char id;_tg_poll_loop'sallowed_updateswidened to["message", "callback_query"]and the new_handle_callback_query(token, chat_id, cb, session_ctx)performs auth check (chat_id match),answerCallbackQueryto clear the click spinner, prompt-id validation (stale clicks on older prompts silently dropped so two rapid permission prompts cannot bleed into each other),editMessageTextappending✓ Selected: <value>for visible scroll-back, and finally firestg_input_event. Markdown failure on the prompt body falls back to a no-parse_modekeyboard send; total failure falls back to plain text — and the menu block embedded in the prompt body keeps that path usable. - Slack / WeChat — numbered menu rendered into the message body (Slack header
❓ Input Required, WeChat header❓ 需要输入). The message reads[1] ✅ Approve (reply 1 or y)etc.; the user replies with the digit, the canonical letter, or any label word (approve/reject/accept/all). All three reply forms normalize to the canonical value before the caller sees them. - Terminal — same numbered menu printed above the input cursor, same digit / letter / label-word reply normalization.
- Web (chat API) — untouched; the existing browser approval UI handles this.
The cross-bridge wiring lives in three pure helpers in
tools/interaction.py:_format_menu_block(options)(numbered text rendering),_build_value_map(options)(digit + canonical-value + label-word lookup table, first-write-wins on collisions), and_resolve_choice(raw, value_map)(whitespace-trimmed, case-insensitive lookup; pass-through for unknown replies so free-text fallback still works). Backward-compatible: every existingask_input_interactivecaller (and there are many —/checkpoint,/session,/agent,/config,/voice) passes nooptions=and gets exactly the same free-text behavior as before. NewRuntimeContextfields:tg_callback_prompt_id: strandtg_callback_message_id: int.
- Tests — 49 new pytest cases.
tests/test_telegram_bridge.py(27):urllib.request.urlopenand_tg_apimocked;threading.Threadmonkeypatched to a synchronous stub for the auto-send hook; an end-to-end test drivesask_input_interactive(options=…)from a worker thread, simulates a click via_handle_callback_query, and asserts the worker returns the clicked value. Coverage: text splitting + Markdown fallback, multipart body assertions (chat_id, UTF-8 caption, filename, raw bytes), all six_tg_send_documentfailure paths, four_bg_runnerWrite variants, four_tg_send_keyboardpaths, five_handle_callback_querypaths, two end-to-end click variants.tests/test_options_menu.py(22): the three pure helpers (rendering / value map / resolution; including emoji-stripped label tokens, case-insensitivity, whitespace, non-string defensive paths, first-write-wins on collisions), plus per-bridge worker-thread end-to-end for Slack (4 reply forms), WeChat (2 reply forms), terminal (digit / label / canonical / no-options regression). Full suite: 718 passed in 43s, no regressions on the 669 pre-existing tests.
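The three pure helpers behind the cross-bridge menu can be approximated as follows. The behaviours (numbered rendering, digit / canonical value / label-word lookup with first-write-wins, trimmed case-insensitive resolution with pass-through) are taken from the entry; the bodies are a guess at one way to get them, written without the leading underscore to mark them as illustrative.

```python
def format_menu_block(options):
    """Render options as numbered lines, e.g. '[1] ✅ Approve (reply 1 or y)'."""
    return "\n".join(
        f"[{index}] {label} (reply {index} or {value})"
        for index, (label, value) in enumerate(options, start=1)
    )


def build_value_map(options):
    """Map digits, canonical values and label words to canonical values."""
    value_map = {}
    for index, (label, value) in enumerate(options, start=1):
        # Keep only alphabetic label tokens, which drops emoji prefixes.
        words = [token for token in label.lower().split() if token.isalpha()]
        for key in [str(index), value.lower(), *words]:
            value_map.setdefault(key, value)  # first write wins on collisions
    return value_map


def resolve_choice(raw, value_map):
    """Normalise a reply; unknown replies pass through unchanged."""
    return value_map.get(raw.strip().lower(), raw)


options = [("✅ Approve", "y"), ("❌ Reject", "n"), ("✅✅ Accept all", "a")]
menu = format_menu_block(options)
value_map = build_value_map(options)
```

Passing unknown replies through untouched is what keeps the free-text fallback working for callers that supply no options.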
-
May 3, 2026: Research Lab — autonomous multi-agent paper writing with sandboxed experiments + web UI. New
`/lab` slash command (CLI) and `/lab` page (web) drive 9 specialised agents through 9 stages (questioning, literature survey, outline, code drafting, sandboxed Python execution, analysis, paper drafting with reviewer iteration, citation verification, finalisation) until convergence or budget exhaustion. Output is a Markdown report with verified citations + BibTeX bundle + (when the topic admits experiments) the engineer's runnable script and any plots produced. Targets arXiv-grade preprint quality, not top-tier-conference quality; honest about the LLM-substrate ceiling. Branch: `feature/research-lab` (PR pending). - 9 agents, deliberately heterogeneous models. PI / Questioner / Surveyor / Designer / Engineer / Analyst / Writer / Reviewer × 3 / Lay Reader. The reviewer pool defaults to 3 different provider families (Claude / GPT / Gemini, etc.) when API keys are available, to reduce the same-source rubber-stamping that plagues homogeneous multi-agent debate. Per-role model overrides via
lab_role_override. - 9-stage state machine: QUESTIONING → SURVEY → OUTLINE → IMPLEMENTATION → EXPERIMENT → ANALYSIS → DRAFTING → VERIFICATION → FINALIZATION. Each producer-stage is followed by reviewer-author iteration with a 2/3-reviewers-pass quorum (default), max 5 rounds force-advance, and a "0/3 for 3 rounds → redesign" early bail. PI breaks ties.
- Real experiments via subprocess sandbox. `research/lab/sandbox.py` runs the Engineer's Python script with a 180 s timeout, 4-min CPU rlimit, 2 GB AS rlimit, dedicated workspace `cwd`, and `MPLBACKEND=Agg` so matplotlib plots without a display. On non-zero exit the Engineer is fed the stderr and revises (max 3 attempts). The Analyst then parses `RESULT: {...}` JSON lines from stdout into a Results section so the Writer doesn't get to invent numbers. v0 isolation only, not a hostile-code boundary; Docker is Phase 2.5.
- Citation verifier: three APIs, four states. Each citation in the final draft is checked against arXiv → Semantic Scholar → CrossRef in priority order. A citation counts as `verified` when Jaccard title similarity is ≥ 0.55 and last-name set overlap is ≥ 0.5. The four-state outcome (`verified | ambiguous | not_found | verification_skipped`) explicitly distinguishes "we found this isn't real" from "we couldn't reach the network"; the latter never gets recorded as a fabrication signal.
- SQLite persistence at `~/.cheetahclaws/research_lab.db` (separate file from the daemon's `sessions.db` so neither interferes with the other). Six tables: `lab_runs`, `lab_stages`, `lab_messages`, `lab_artifacts`, `lab_budget`, `lab_experiments`. State survives a `cheetahclaws` restart in principle; auto-resume is Phase 2.5.
- Web UI at `/lab`. Single vanilla-JS page (no build step, no React) that talks to `/api/lab/*` JSON endpoints; auto-polls every 5 s while a run is open; renders the final report inline with a mini Markdown renderer; auto dark/light mode. The HTTP layer (`web/lab_api.py`) slots into the existing stdlib HTTP server with one dispatcher branch, with no FastAPI/Flask dep.
- Realistic positioning, stated explicitly. Sakana AI Scientist, Stanford Agent Lab, and similar prior work all hit a ceiling near rejection-line ICLR; this lab inherits that ceiling. The product target is an arXiv-grade preprint, not a top-tier conference paper, and the docs say so up front. The dominant residual failure mode is fabricated citations that pass title-match but carry subtle author/year errors; the verifier catches most, but human review of references is non-optional.
- What's deliberately not in v0 (tracked in `docs/guides/research-lab.md`): multi-tenant isolation, GPU pool, Docker sandbox, LaTeX rendering, reference-manager integration, plagiarism check, real-time SSE updates, billing, `/lab resume`.
- Tests: 54 cases for the lab (storage, convergence, verifier, sandbox, orchestrator-with-stubbed-LLM end-to-end, web routes), full suite 701 passing on `feature/research-lab`. Pricing: a single run typically costs $2-5 (survey-style, no experiments) to $5-15 (with sklearn-scale experiments) using a mix of Claude Sonnet + GPT-4o + Gemini Pro.
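The citation verifier's match rule can be sketched as follows. Only the two thresholds (0.55 and 0.5) come from the entry; the tokenisation, the last-name extraction, and the choice to measure author overlap relative to the cited author set are assumptions made for illustration, not the lab's actual code.

```python
import re

TITLE_THRESHOLD = 0.55   # Jaccard title similarity, from the entry above
AUTHOR_THRESHOLD = 0.5   # last-name set overlap, from the entry above


def _tokens(text: str) -> set:
    """Lowercase alphanumeric word tokens (tokenisation is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def last_names(authors: list) -> set:
    """Final whitespace-separated token of each author name, lowercased."""
    return {name.split()[-1].lower() for name in authors if name.split()}


def is_verified(cited_title, cited_authors, found_title, found_authors) -> bool:
    """True when both the title and the author check clear their thresholds."""
    title_sim = jaccard(_tokens(cited_title), _tokens(found_title))
    cited, found = last_names(cited_authors), last_names(found_authors)
    # Overlap is measured relative to the cited author set (an assumption).
    overlap = len(cited & found) / len(cited) if cited else 0.0
    return title_sim >= TITLE_THRESHOLD and overlap >= AUTHOR_THRESHOLD
```

A draft citation whose title matches a real paper but whose authors are invented fails the second check, which is the fabricated-citation case the entry calls out.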
-
May 2, 2026: Daemon foundation lands; `cheetahclaws serve` is real. F-1 of the 9-PR roadmap merged via PR #80, on top of a re-landed spike (PR #81) that holds the RFC 0001 contract code. End users see no new feature yet: F-1 ships the headless daemon plus its `cheetahclaws daemon {status, stop, logs, rotate-token}` control surface, but no service runs inside it; that's F-2..F-8.
- Recap of how the spike landed. PR #77 (the spike) merged and was then immediately reverted (3183fc6) to avoid pre-empting @mxh1999's foundation design. Once @mxh1999 opened PR #80 explicitly built on top of the spike, the revert was undone via PR #81 (`Re-land daemon spike for #80 (un-revert 3183fc6)`) so the F-1 PR could merge cleanly without a delete-vs-modify conflict. End state on `main`: spike + foundation + verified.
- What's runnable now. `cheetahclaws serve --listen tcp://127.0.0.1:8765 --print-token` boots the daemon; `cheetahclaws daemon status` reports pid / transport / address / uptime / `system.ping` outcome from the discovery file at `~/.cheetahclaws/daemon.json`; `cheetahclaws daemon stop` calls `system.shutdown` over RPC and falls back to SIGTERM / `TerminateProcess`; `cheetahclaws daemon logs [-n N]` tails `~/.cheetahclaws/logs/daemon.log`; `cheetahclaws daemon rotate-token` regenerates the TCP bearer token. The legacy `cheetahclaws spike-daemon ...` from the spike-notes is preserved as a backward-compat alias.
- Behavior change worth flagging. `/healthz`, `/readyz`, `/metrics` are now auth-gated by default per RFC 0001 §3; the spike returned them unauthenticated as a stub. Prometheus / external scrapers opt out via `cheetahclaws serve --unauthenticated-metrics` (off by default; documented as a deliberate weakening with a one-line warning at startup).
- Polish nits surfaced during smoke and fixed in a follow-up. (1) `daemon.json` now optionally records a `token_path` field when `serve --token-path` overrides the default, so `cheetahclaws daemon status / stop / rotate-token` find the token the daemon is actually using instead of failing 401 against the default location. (2) `python -m daemon.cli --help` (and the `cheetahclaws spike-daemon --help` alias) now print a usage banner and exit 0 instead of `unknown subcommand: --help` / exit 2; unknown subcommands also include the banner so users see how to recover. (3) The serve-mode startup prints (`token: …`, `cheetahclaws daemon listening on …`, `audit log: …`) now pass `flush=True` so they appear immediately when stdout is redirected to a file under `&`; previously they sat in Python's 4KB block buffer until the daemon exited. (4) `tests/e2e_daemon_skeleton.py` token-length floor raised from 32 to 40 so an accidental shrink to 16 raw bytes (~22 chars) would break loudly.
- Tests: 669 passing on `main` (637 unit + 22 daemon-only e2e + 10 polish-fix unit tests). `pytest tests/ -q`.
- What's NOT in F-1, intentional. No `agent.run` integration (`session.send` exposes only the demo `echo.ping` from the spike + the contract `system.ping` from the foundation). No bridges in the daemon (Telegram/Slack/WeChat are F-6/7/8). No SQLite event store (in-memory ring from the spike survives until F-2). No cost guardrails (F-9). No subprocess-per-agent runner (F-4). macOS peer-cred still left as `TODO(macos)` in `daemon/auth.py`.
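The "16 raw bytes (~22 chars)" figure in the token-length nit is what unpadded URL-safe base64 produces. The sketch below checks that arithmetic with the standard-library `secrets.token_urlsafe`; whether the daemon actually mints its bearer token this way is an assumption, since the entry does not say.

```python
import math
import secrets


def urlsafe_token_length(raw_bytes: int) -> int:
    """Length of unpadded URL-safe base64 for `raw_bytes` bytes of entropy."""
    # Each base64 character carries 6 bits; padding '=' is stripped.
    return math.ceil(raw_bytes * 8 / 6)


for n in (16, 30, 32):
    token = secrets.token_urlsafe(n)
    assert len(token) == urlsafe_token_length(n)
    print(n, "raw bytes ->", len(token), "chars")
```

Under this encoding, 16 raw bytes give 22 characters, 30 give 40, and 32 give 43, so a 40-character floor corresponds to at least 30 raw bytes.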
-
Apr 30, 2026: `[spike]` daemon foundation reference scaffolding; validates RFC #74 end-to-end.
- What landed. A new `daemon/` package (~1.1k LoC across 9 files, plus 360 lines of pytest) that implements the contract surface defined in `docs/RFC/0001-daemon-design-note.md`: `ThreadedTCPServer` / `ThreadedUnixServer`, JSON-RPC 2.0 dispatcher on `POST /rpc`, SSE on `GET /events?since=<id>`, Linux `SO_PEERCRED` peer-cred + bearer-token auth, audit log, `client_id` mint/persist/resume, originator-only permission answer, 30 min interactive timeout with `permission.refresh_timeout`. Three demo methods (`echo.ping`, `permission.demo`, `permission.answer`) prove the model without dragging `agent.run` integration into the spike scope.
- Why a spike before the foundation PR. RFC #74 was merged with 9 must-fix follow-ups from review (threading model, SSE heartbeat, `client_id` lifecycle, `session.send` semantics, API versioning, event retention, audit-log default-flip, interactive timeout, macOS peer-cred). Rather than re-litigate them in a doc PR, the spike puts every "✓" item from the review matrix into runnable code; mxh1999's foundation PR can then rebuild on the contract or replace the throwaway parts. Coverage matrix in `docs/RFC/0001-spike-notes.md`.
- What it deliberately is not. No `agent.run` wiring, no bridge migration, no SQLite event store, no cost guardrails, no agent-runner subprocess, no metrics. macOS `SO_PEERCRED` is punted with a `TODO(macos)`; the spike runs Linux-only.
- Surprises worth flagging for the foundation PR (full list in spike notes): stdlib `HTTPServer` defaults to `request_queue_size=5`, so long-lived SSE connections cause new TCP connects to wait on a full-second SYN retransmit; bumped to 256. `BaseHTTPRequestHandler` defaults to HTTP/1.0, so `curl --no-buffer` won't print SSE bytes until the connection closes; `EventSource` and `http.client` (used by tests + `daemon.spike_client`) are unaffected. `SO_PEERCRED` ucred struct format is `iII` (signed pid, unsigned uid/gid), not the older docs' `3i`. Originator persistence is whole-file rewrites for the spike; the foundation PR should swap in the SQLite originator schema.
- How to try it. `cheetahclaws spike-daemon serve --listen tcp://127.0.0.1:8765 --print-token`, then `python -m daemon.spike_client --target tcp://127.0.0.1:8765 ping` (or `watch` for SSE tail, or `request` / `answer` for the originator-routing demo). Token can be passed via `$CHEETAHCLAWS_TOKEN` to avoid argparse's `--token <value>` quirk on tokens starting with `-`.
- Tests: 13 cases covering #1, 2, 3, 4, 6, 7, 8, 9 from the review matrix; all green. `pytest tests/test_daemon_spike.py -v`. Branch: `feature/daemon-spike` (draft PR).
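The `iII` versus `3i` point can be illustrated without a live socket. The sketch below packs a simulated `ucred` payload and decodes it both ways; the pid/uid/gid values and the `parse_ucred` helper are made up for illustration and are not the spike's code.

```python
import struct

UCRED_FORMAT = "iII"  # signed pid, unsigned uid, unsigned gid (per the entry above)


def parse_ucred(raw: bytes) -> tuple:
    """Decode a struct ucred payload into (pid, uid, gid)."""
    return struct.unpack(UCRED_FORMAT, raw)


# Simulated payload with a uid/gid above 2**31 - 1, where signedness matters.
raw = struct.pack(UCRED_FORMAT, 4242, 4294967294, 4294967294)

print(parse_ucred(raw))          # (4242, 4294967294, 4294967294)
print(struct.unpack("3i", raw))  # (4242, -2, -2): the "3i" misread
```

On Linux the real payload would come from `sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("iII"))` on a connected Unix-domain socket.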
-
Apr 30, 2026: Docker / home-server support, terminal `AskUserQuestion` deadlock fix, Ollama tool-call payload fix.
- Docker (#73): new `Dockerfile`, `docker-compose.yml`, `.env.example`, `.dockerignore` at the repo root, plus a full walkthrough at `docs/guides/docker.md`. Targets the home-server / DGX-Spark scenario: web UI + Telegram bridge running together in one container, talking to an Ollama instance on the host via `host.docker.internal:11434`, with `./workspace` bind-mounted so files can be shared over Samba to your phone or other PCs. Container runs as a non-root `cheetah` user; UID/GID inherit from the host (`${UID:-1000}:${GID:-1000}`) so files in `./workspace` stay owned by you. `tini` for clean PID-1 signal handling, healthcheck on `/api/config`, `EXPOSE 8080`. README's "Web UI" section gets a Docker subsection; "Documentation" table gets a new row.
- `--web` auto-starts bridges: previously `--web` only spun up the HTTP server and `sys.exit`-ed, skipping the `~/.cheetahclaws/config.json` Telegram / WeChat / Slack auto-start block in the REPL bootstrap. New helper `_start_headless_bridges(config)` creates a shared `AgentState`, wires `session_ctx.run_query` to a minimal headless driver around `agent.run()`, then starts every configured bridge as a daemon thread in the same process. Docker users get browser UI + phone bridge from a single command; non-Docker `--web` users on a remote box get the same. No new flag: same `--web`, just complete.
- AskUserQuestion deadlock fix (#69): the previous queue + `threading.Event` design assumed a separate consumer thread would drain `_pending_questions` and `event.set()` the agent thread, but the consumer (`drain_pending_questions`) ran after `run()` returned. Since `run()` blocked inside `_ask_user_question`'s `event.wait(timeout=300)`, the drain was never reached, the terminal froze for 300 seconds, and Ctrl-C was swallowed by `Event.wait`. Bridges (Telegram / WeChat / Slack / Web) only worked because their listener threads called `event.set()` externally; the terminal had no equivalent. Fix: `_ask_user_question` is now synchronous. It prints the prompt and reads input directly via `ask_input_interactive`, which already routes correctly for terminal and every bridge. Removed `_pending_questions`, `_ask_lock`, and `drain_pending_questions`; removed the post-turn drain in `cheetahclaws.py`. Tests rewritten to mock `builtins.input` over the new sync path.
- Ollama tool-call payload fix (#71): for assistant turns that carry only `tool_calls` and no visible text, `messages_to_openai` emitted `content: null` (`m.get("content") or None`). OpenAI accepts that, but Ollama's OpenAI-compat endpoint rejects it with HTTP 400 `invalid message content type: <nil>`. Switched to `or ""`; an empty string is accepted by every OpenAI-compat backend we target. The same 400 used to fall into `ErrorCategory.UNKNOWN`, which is retryable, so the same broken payload was retried 3× and burned the circuit-breaker budget → 120 s cooldown blocking the entire session even though the request body, not the network, was the problem. New `INVALID_REQUEST` category matches `BadRequest` / `400` / `invalid.?message.?content` / `malformed.?request` and is non-retryable; `urllib.error.HTTPError` with `code=400` maps to it explicitly; and the hint surfaces a pointer to issue #71 plus a `/clear` suggestion when the error string contains `invalid message content type`.
- Tests: 589 passing. Includes rewritten `TestAskUserQuestion` (free-text answer, option selection by number, `0` → freetext fallback) and quick verification that `_start_headless_bridges` is a no-op without bridge config and starts a Telegram thread when `telegram_token` + `telegram_chat_id` are present.
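The two halves of the Ollama fix (content normalisation and non-retryable classification) can be sketched together. The regex alternatives and the `or ""` idiom are quoted from the entry; the two-member enum, the `RETRYABLE` set, the case-insensitive flag, and the function names are illustrative assumptions rather than the project's real definitions.

```python
import re
from enum import Enum


class ErrorCategory(Enum):
    INVALID_REQUEST = "invalid_request"
    UNKNOWN = "unknown"


# Alternatives quoted in the entry above; IGNORECASE is an assumption.
_INVALID_REQUEST = re.compile(
    r"BadRequest|400|invalid.?message.?content|malformed.?request",
    re.IGNORECASE,
)

# Only UNKNOWN is retried in this sketch, so a bad payload fails fast.
RETRYABLE = {ErrorCategory.UNKNOWN}


def classify(error_text: str) -> ErrorCategory:
    """Map an error string to a category; request-shape errors never retry."""
    if _INVALID_REQUEST.search(error_text):
        return ErrorCategory.INVALID_REQUEST
    return ErrorCategory.UNKNOWN


def assistant_content(message: dict) -> str:
    """Normalise missing or None content to "" for tool-call-only turns."""
    return message.get("content") or ""
```

With this shape, a 400 caused by the request body is surfaced once instead of being retried until the circuit breaker trips.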
-
Apr 24, 2026: Multi-model prompt adaptation, with a single shared `default.md` baseline + tiny per-family overlays. Routing by model family, not provider/runtime. DeepSeek v4 thinking-mode protocol.
- Single base + small overlay design. `prompts/base/default.md` is the shared baseline for every model: general prompt-engineering guidance (be concise, parallel tool calls, minimal scope, stop conditions, safe-vs-unsafe action list, etc.) applies to all families. Family-specific quirks live in `prompts/overlays/<family>.md` and are appended only when the model has an authoritative, vendor-documented quirk.
- Three overlays ship today (each cites the vendor guide it's based on):
  - `claude.md`: XML-tag preference for structured output (Anthropic prompt engineering guide).
  - `gemini.md`: explicit "Agentic Mode (Active)" framing + 4-step explore→verify→act→report loop (Gemini 3 prompting guide).
  - `openai-reasoning.md`: only matches o1 / o3 / o4 / `gpt-5-codex`; suppresses "Let me think step by step…" narration since reasoning is internal (OpenAI reasoning best practices).
- Routing by model family, not by provider/runtime. A Qwen-3 model gets the same prompt whether served by Alibaba DashScope, Ollama, vLLM, or OpenRouter. `pick_base_prompt(provider, model_id)` matches on the last path segment of the model id, case-insensitive. Tested by `test_runtime_is_irrelevant_for_family_routing`.
- Overlay-admission policy. Every overlay must (a) cite a vendor prompting guide URL in a top-of-file `<!-- Source: ... -->` comment (enforced by `test_overlay_cites_source`), (b) not duplicate anything already in `default.md`, (c) stay ≤ 20 lines (enforced by `test_overlay_under_line_cap`). The unified `default.md` itself caps at 150 lines (enforced by `test_base_prompt_under_line_cap`).
- DeepSeek v4 thinking-mode protocol. Streams `delta.reasoning_content` as `ThinkingChunk`; round-trips `reasoning_content` through `AssistantTurn → neutral history → messages_to_openai` so v4's spec is satisfied when an assistant turn carries `tool_calls`. `config["thinking"]` is tri-state: `None` (default, server-default ON), `True` (explicit ON), `False` (explicit OFF, injects `extra_body={"thinking":{"type":"disabled"}}`). Bumps DeepSeek context window to 128K and registers `deepseek-v4-pro` / `deepseek-v4-flash`.
- Tests: 73 prompt-related cases, 578 unit tests total, all green. New regression guards: `test_dead_family_base_files_are_gone` (no per-family base files), `test_overlay_cites_source` (every overlay grounded in vendor docs), `test_env_block_separates_platform_from_git_info` (locks the `Platform: Linux- Git branch:` whitespace fix).
- Architecture refactor lineage. Builds on PR #63 (which split `SYSTEM_PROMPT_TEMPLATE` into per-family files) and consolidates back to single base + overlays after benchmarking showed the per-family duplication was net negative for non-flagship models. See `prompts/README.md` for full design rationale.
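Family routing on the last path segment can be sketched as below. The overlay names come from the entry; the regex patterns, the `pick_overlay` name, and the sample model ids are hypothetical and only illustrate the idea that the serving runtime prefix is discarded before matching.

```python
import re
from typing import Optional

# Hypothetical family patterns; only the overlay names come from the entry.
_FAMILY_PATTERNS = [
    ("openai-reasoning", re.compile(r"^(o1|o3|o4)\b|^gpt-5-codex")),
    ("claude", re.compile(r"^claude")),
    ("gemini", re.compile(r"^gemini")),
]


def pick_overlay(model_id: str) -> Optional[str]:
    """Return the overlay family for a model id, or None for base-only."""
    # Keep only the last path segment and lowercase it, so the provider
    # or runtime prefix never influences the choice.
    name = model_id.rsplit("/", 1)[-1].lower()
    for family, pattern in _FAMILY_PATTERNS:
        if pattern.search(name):
            return family
    return None
```

Models with no vendor-documented quirk fall through to `None`, which in this sketch means the shared baseline is used unchanged.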
-
Apr 20, 2026 (v3.5.76): Research pipeline with 20 sources, time-range filter, cross-platform heat table, citations analysis, saved reports, Chinese platforms (B站 · 微博 · 小红书 · 知乎), /monitor trend-tracking, one-click `/ssj` wizard, entity extraction, multi-query expansion, side-by-side compare
- `/research <topic>` fans out to 20 sources in parallel: arXiv · Semantic Scholar · OpenAlex · HuggingFace Papers · alphaXiv · Google Scholar · HackerNews · GitHub · Reddit · StackOverflow · Google News · Polymarket · SEC EDGAR · Tavily · Brave · Twitter/X · 知乎 · B站 · 微博 · 小红书. 13 sources work zero-config; 7 are optional (need keys or cookies).
- Engagement-weighted ranking: each source's native signal (HN points, GitHub stars, Reddit upvotes, citations, HF upvotes, Bilibili plays, Weibo likes, Xiaohongshu likes, Twitter likes, Polymarket USD volume) is log-normalized against a per-source calibration to a shared 0-1 scale, then blended with a 14-day-half-life recency bonus. Cross-source dedup by URL keeps the highest-engagement entry on duplicates.
- Time range filter: `--range 1d|3d|7d|14d|30d|60d|90d|6m|1y|2y|5y|all` (or natural `30days`, `6months`, `2years`) and explicit `--since YYYY-MM-DD --until YYYY-MM-DD`. Each source translates the window to its native filter: arXiv `submittedDate:[...]`, Semantic Scholar `year=LO-HI`, OpenAlex `from_publication_date:...`, HN `numericFilters=created_at_i>...`, GitHub `pushed:>...`, Reddit `t=hour|day|week|month|year|all`, StackOverflow `fromdate=` / `todate=`, Google News `after:` / `before:`, SEC EDGAR `dateRange=custom`, Tavily `start_published_date`, Brave `freshness=pd|pw|pm|py`, Twitter v2 `start_time` / `end_time`, Google Scholar client-side year filter, HuggingFace / Bilibili / Weibo client-side. Polymarket and Zhihu have no date filter API and are documented as exceptions.
- Cross-platform attention table: every brief renders a Markdown table with per-platform result count · top engagement label · median result age · domain. Skipped/failed sources appear too with clear reasons. The LLM synthesis prompt copies this table verbatim and adds 2-3 sentences comparing attention distribution (academic-heavy vs. social-heavy vs. news-heavy).
- Publication trend sparkline + 12-month bar chart: a compact Unicode sparkline (`▁▂▃▄▅▆▇█`) across the last 24 months in the brief header; a full per-month bar chart lower down. Built from ALL dated results across academic/news/social sources, giving a single-glance view of where the buzz has moved.
- Notable-citer analysis (`--citations`): secondary Semantic Scholar calls on top academic results, pulling citing-paper authors and filtering to those with ≥10k total citations (configurable via `--citation-threshold`). Surfaces a table with name · affiliation · total cites · h-index · which papers they cited. Adds 2-10 API calls per run; recommended to pair with `SEMANTIC_SCHOLAR_API_KEY` to escape the anonymous 100-req/5-min limit.
- Entity extraction: offline, zero-LLM pattern-matching that scans every pulled result for frequent named entities across four categories: models (GPT-5, Claude-Opus-5, Llama-4, Gemini-2.5-Pro, GLM-5.1, Qwen-3, DeepSeek-V3, Grok, Mistral, Phi, Yi, Kimi, …), benchmarks (MMLU, MMLU-Pro, GSM8K, MATH, HumanEval, HumanEval+, SWE-bench, LiveCodeBench, MMMU, MathVista, GAIA, AgentBench, WebArena, Arena-Hard, FrontierMath, ARC-AGI, GPQA-Diamond, HLE, C-Eval, CMMLU, RULER, LongBench, …), orgs (OpenAI, Anthropic, Google DeepMind, Meta, xAI, Mistral AI, DeepSeek, Moonshot, Alibaba, Zhipu, Tencent, ByteDance, Hugging Face, NVIDIA, 01.AI, AI2, Mila, Stanford, MIT, Berkeley, CMU, Tsinghua, …), and people (from academic result author fields). Counts dedupe within a single result so one spammy abstract doesn't skew the ranking. Renders as a "Top mentioned entities" section directly beneath the heat table; one glance answers "what's everyone talking about?" without the LLM round-trip.
- Multi-query expansion (`--expand` or `--expand N`): asks the active model to propose 2-6 sibling subqueries (different angles such as theory vs. tooling vs. industry deployment vs. controversy, not paraphrases), then runs each in parallel across all sources with proportionally reduced per-source limits. Results merge into the main pipeline (dedup + rank + synth). Example: `/research --expand "frontier LLM benchmarks"` auto-expands to `LLM evaluation methodology`, `benchmark saturation and contamination`, `capability measurement frontier models`, `human preference benchmarks evaluation`. Coverage jumps several-fold for broad topics.
- Side-by-side compare: `/research compare "topic A" vs "topic B" [vs "topic C"]` runs 2 or 3 independent research queries in parallel and produces a unified comparative brief: verdict at a glance · side-by-side heat tables · shared themes · unique strengths per topic · open questions. Citations use prefixed `[A-N]` / `[B-N]` / `[C-N]` markers so readers can trace every claim back to the right topic's evidence pool. Falls back to a deterministic no-LLM rendering with all three heat tables + entity tables when no model is configured.
- Auto-save to `~/.cheetahclaws/research_reports/`: every `/research` and `/research compare` run writes two files: `<YYYY-MM-DD_HHMMSS>-<slug>.md` (rendered brief) + `.json` sidecar (serialized Brief + notable citers + entities). Opt out with `--no-save`. Explicit export via `--save-as PATH`. New `/reports` command: `list` (50 most recent) · `open <id>` (print) · `path <id>` (print file path) · `delete <id>`.
- Weekly trend tracking via `/monitor`: new topic prefix `research:<query>` (or `research:<range>:<query>`, e.g. `research:30d:RLHF`) dispatches to the full 20-source pipeline each scheduled run. Supports `daily` / `weekly` / `12h` / ... schedules and `--telegram` / `--slack` / console channels. Each invocation: pulls all 20 sources · filters by the subscription's time window · renders the cross-platform heat table + sparkline as the first digest item · writes a full report · pushes to configured channels. Subscribe via `/subscribe research:<topic> weekly` or the `/monitor` wizard's new "Trend tracker" option.
- `/ssj` wizard integration: 3 new menu items for zero-flag operation:
  - `16. 🔍 Research`: asks topic → time range (1-5) → citations y/N → runs `/research` with the right flags
  - `17. 📊 Trend Track`: asks topic → tracking window → frequency → creates the `/subscribe research:<range>:<topic>` subscription
  - `18. 📁 Reports`: opens the `/reports` browser
- Chinese platform sources (4 of them):
  - Bilibili (B站): zero-config search-all endpoint; returns video + article results with 播放/点赞/弹幕/评论 (plays / likes / danmaku / comments) engagement. Sample result line: `[video · 11:55] 彻底搞懂 Transformer · 54,209 播放 · 2,430 赞 · 78 弹幕`.
  - 知乎 Zhihu: v4 search_v3 API, requires `ZHIHU_COOKIE` (browser-extracted `d_c0; z_c0`); returns answers / articles / questions with 赞/评论/关注 (likes / comments / follows) engagement.
  - 微博 Weibo: m.weibo.cn getIndex endpoint, requires `WEIBO_COOKIE` (browser-extracted `SUB; SUBP`); returns posts with 赞/转/评 (likes / reposts / comments) engagement. Parses relative Chinese time forms (`刚刚`, `5分钟前`, `2小时前`, `今天 HH:MM`, `MM-DD`; that is, "just now", "5 minutes ago", "2 hours ago", "today HH:MM", and a bare month-day).
  - 小红书 Xiaohongshu: edith.xiaohongshu.com notes search, requires `XHS_COOKIE` (+ often `XHS_X_S`); returns notes with 赞/评/收藏 (likes / comments / saves) engagement. Note: Xiaohongshu anti-bot is aggressive; cookies may expire hourly. Fallback: use `--sources tavily` with `<query> site:xiaohongshu.com`.
- Architecture:
  - `research/` package: `__init__.py`, `types.py`, `time_range.py`, `http.py`, `cache.py` (24h SQLite at `~/.cheetahclaws/research_cache.db`, keyed on source + query + limit + time range), `classifier.py` (keyword-based topic→domain routing, zero latency, zero LLM), `ranker.py`, `aggregator.py`, `synthesizer.py`, `citations.py`, `entities.py`, `reports.py`, `sources/` (20 modules).
  - `tools/research.py`: exposes the `Research` tool to the agent (14 parameters: topic, domains, sources, limit, time_range, since, until, analyze_citations, citation_threshold, expand, save_as, auto_save, synthesize, use_cache).
  - `commands/research_cmd.py`: `/research` (with compare subcommand) and `/reports`.
  - `monitor/fetchers.py`: `fetch_research()` bridges `/monitor` subscriptions to the research pipeline.
  - `commands/advanced.py`: SSJ menu entries 16/17/18 delegate to the right `/research` / `/subscribe` / `/reports` command line.
- Tests (`tests/test_research.py`): 88 tests across 23 sections covering: types, classifier routing, engagement ranker, cross-source dedup, SQLite cache roundtrip + TTL expiry, each of the 20 sources (happy path + schema-shift resilience + missing-key skip behavior), aggregator parallel fan-out + failure isolation + cache integration, synthesizer LLM path + deterministic no-LLM fallback, heat table + sparkline + trend rendering, citations helper, time-range preset + ISO parsing + per-source native mapping, reports save/load/delete/path, Chinese platform parsing (including Zhihu answer/article/question shapes, Weibo relative-date parser, Xiaohongshu localized count parsing), monitor `research:` prefix dispatch + range-prefix form, entity extraction across all four categories + dedup-within-result guarantee, multi-query expansion producing distinct cache keys, compare mode running 2-3 parallel queries + correct prefixed citation markers.
- Packaging: `pyproject.toml` adds `research` and `research.sources` to the editable packages list so installed binaries can import the new module.
- Version bumped to 3.5.76.
-
Apr 18, 2026 (v3.5.75): External plugin discovery via `CHEETAHCLAWS_PLUGIN_PATH` + safer dependency management; end-to-end prompt-cache token tracking across providers
- `PluginScope.EXTERNAL`: new scope for plugins discovered in-place (never copied to `~/.cheetahclaws/plugins/`). Complements existing `USER` and `PROJECT` scopes. Use case: shared team/company plugin directories mounted at a common path.
- `CHEETAHCLAWS_PLUGIN_PATH` env var: `os.pathsep`-separated (colon on POSIX) list of directories scanned for plugin subdirs. Each immediate subdirectory that has a `plugin.json` or `PLUGIN.md` is surfaced as an external plugin. No new manifest format: it reuses the existing `PluginManifest.from_plugin_dir()` loader. Missing or empty path segments are ignored; hidden directories (`.git`, `.DS_Store`, etc.) are skipped.
- Default disabled: external plugins land in the `/plugin` list as `[external] disabled`. User must run `/plugin enable <name>` once to activate. Enable state persists to `~/.cheetahclaws/plugins.json` under a new `external_enabled: {name: bool}` map, so it survives restarts without the plugin being installed.
- No silent pip install: unlike the original proposal in #49, cheetahclaws never installs plugin dependencies from an import-failure fallback. Dependency installation happens only at explicit user-consent points: `/plugin install` (existing flow), or the first `/plugin enable` of an external plugin that declares `dependencies`. The model cannot trick the runtime into mutating the Python environment.
- Dependency check uses `importlib.metadata.distribution()`: new `_missing_dependencies(deps)` helper keys off the PyPI distribution name, not `find_spec(name)`. This fixes the PyPI-vs-import-name trap that breaks common packages: `Pillow` (imports as `PIL`), `PyYAML` (imports as `yaml`), `opencv-python` (`cv2`), `scikit-learn` (`sklearn`), `beautifulsoup4` (`bs4`). The old `find_spec("pillow")` approach returned `None` for installed Pillow and would loop-install forever.
- Safety guards: `uninstall_plugin` on an `EXTERNAL` entry only drops the enable-state record; it never `shutil.rmtree`s the user's source directory. `update_plugin` refuses external plugins with "update the source directory directly" instead of attempting `git pull`. Malformed `plugin.json` files are logged to stderr and skipped, so one bad manifest can't crash the `/plugin` list.
- Dedupe on name collision: if a plugin name exists in both installed (`USER` / `PROJECT`) and external scopes, the installed entry wins. Within external scopes, the earliest directory in `CHEETAHCLAWS_PLUGIN_PATH` wins (consistent with `$PATH` semantics).
- Tests (`tests/test_plugin_external.py`): 16 tests covering: env var parsing with empty/nonexistent segments, `plugin.json` and `PLUGIN.md` discovery, hidden-directory skip, malformed-JSON resilience, path-order priority, installed-shadows-external dedupe, enable/disable persistence round-trip, PEP 508 requirement parsing (`package[extra]>=1.0` → `package`), and a regression test for the PyPI-vs-import-name bug.
- New public export: `from plugin import PLUGIN_PATH_ENV` gives the env var name for use in tooling/docs.
- Not changed: existing `USER` / `PROJECT` install flow, `plugin.json` / `PLUGIN.md` manifest format, `/plugin` command subcommands. Fully backward compatible: unset `CHEETAHCLAWS_PLUGIN_PATH` and the system behaves exactly as before.
- Fix (tool-history integrity for OpenAI-compatible providers): resolves #57. After long sessions, DeepSeek (and other OpenAI-compatible endpoints) started rejecting requests with `"Messages with role 'tool' must be a response to a preceding message with 'tool_calls'"` (HTTP 400), recoverable only by restarting, which lost all context. Root cause: `compaction.find_split_point()` chose a split index by token count alone, so a split could land between an `assistant(tool_calls)` message and its `tool` response messages, leaving orphaned `tool` entries in the kept half. Three-layer defense:
  - `compaction._respect_tool_pairs(messages, split)`: post-processes the split index. If the last message in the old half is an `assistant` with `tool_calls`, it advances the split forward past all consecutive `tool` responses; it also skips any standalone `tool` message the split would land on. Falls back to returning `0` (skip compaction this turn) if no safe split exists; the threshold will re-trigger next turn.
  - `compaction.sanitize_history(messages)`: single-pass O(n) invariant enforcer. Tracks pending `tool_call_id`s from the most recent `assistant(tool_calls)` in a rolling set; drops any `tool` message whose `tool_call_id` is not in the set (orphan), and strips unanswered `tool_calls` entries from assistant messages when a non-tool message intervenes. If all `tool_calls` on an assistant are stripped, the `tool_calls` key is removed entirely and `content` is normalized to a non-null string (required by the OpenAI schema). Does not mutate input.
  - `agent.run()`: calls `sanitize_history` after every `maybe_compact` and before each `stream()` call. Any divergence (from compaction, crashed tool execution, checkpoint restore, or future code paths) is caught before it reaches the provider; emits a `history_sanitized` warn-log with the number of messages removed so regressions are visible.
  - Why three layers instead of one: the split-point fix prevents the primary source of orphans; the sanitizer is a defense-in-depth net that keeps the invariant regardless of where history corruption originates; the agent-loop wiring ensures the net is actually applied. No user-visible behavior change on well-formed histories; `test_well_formed_history_unchanged` pins this.
  - Tests (`tests/test_compaction.py`): 15 new tests across three classes (`TestFindSplitPoint.test_split_never_splits_tool_pair`, `TestRespectToolPairs` × 4, `TestSanitizeHistory` × 7) covering split-boundary edge cases (split at every ratio from 0.2 to 0.5, multi-tool-call blocks, standalone orphan tool at split), sanitizer correctness (well-formed history unchanged, orphan drop, partial and full unanswered-tool_calls stripping, unanswered at end of list, wrong `tool_call_id` drop), and an input-immutability guarantee.
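The sanitizer's invariant (every `tool` message answers a pending `tool_calls` entry, and every kept `tool_calls` entry is answered) can be sketched in a single pass. This is an independent illustration written from the description above, not the project's `compaction.sanitize_history`; the message shape (dicts with `role`, `tool_calls[].id`, `tool_call_id`) follows the OpenAI chat format.

```python
import copy


def sanitize_history(messages: list) -> list:
    """Return a cleaned copy of `messages`; the input is never mutated."""
    out = []
    pending = set()   # tool_call ids still awaiting a tool reply
    owner = -1        # index in `out` of the assistant that issued them

    def close_pending() -> None:
        """Strip tool_calls that never got an answer from their assistant turn."""
        nonlocal pending
        if pending and owner >= 0:
            msg = out[owner]
            kept = [c for c in msg["tool_calls"] if c["id"] not in pending]
            if kept:
                msg["tool_calls"] = kept
            else:
                # No answered calls left: drop the key, force non-null content.
                del msg["tool_calls"]
                msg["content"] = msg.get("content") or ""
        pending = set()

    for original in messages:
        msg = copy.deepcopy(original)
        if msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
                out.append(msg)
            # Otherwise it is an orphan tool message and is dropped.
            continue
        close_pending()  # a non-tool message intervenes
        out.append(msg)
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            pending = {c["id"] for c in msg["tool_calls"]}
            owner = len(out) - 1
    close_pending()      # unanswered calls at the end of the list
    return out
```

Running this before every provider call, as the entry describes for `agent.run()`, means a corrupted history is repaired locally instead of being rejected with an HTTP 400.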
-
End-to-end prompt-cache token tracking (closes #43) — cache hit/miss counters now flow from provider →
`AgentState` → checkpoint snapshots across every supported provider family. Two new default-0 fields `cache_read_tokens` / `cache_write_tokens` on `AssistantTurn`; `AgentState.total_cache_read_tokens` / `total_cache_write_tokens` accumulate via `getattr(..., 0)` so providers that never set the fields still work. Extraction centralized into two helpers in `providers.py`: `_anthropic_cache_tokens(usage)` reads `cache_read_input_tokens` + `cache_creation_input_tokens`; `_openai_cached_read_tokens(usage)` walks `prompt_tokens_details.cached_tokens`. Both coerce missing / `None` to `0` — older SDKs, non-cached calls, Bedrock-over-litellm wrappers all fall through instead of raising `AttributeError`. Provider coverage:

| Family | Cache read | Cache write | Mechanism |
| --- | --- | --- | --- |
| Anthropic (`stream_anthropic`) | ✓ | ✓ | Both fields on `final.usage` when prompt-caching beta is active |
| OpenAI-schema (`stream_openai_compat`: OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, Groq, xAI, any compatible endpoint) | ✓ | 0 (by design) | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
| Ollama (`stream_ollama`) | 0 | 0 | No prompt-caching in Ollama today |
| Any future / custom provider | 0 (default) | 0 (default) | `getattr(event, "cache_read_tokens", 0)` no-op fallback |

Persistence: `checkpoint/store.make_snapshot` writes `token_snapshot["cache_read"]` / `["cache_write"]`; `/checkpoint <id>` (and `/rewind`) restores them alongside input/output totals so counters stay in lock-step with whatever snapshot the user rewound to. Structured logging: `api_call_done` records now include `cache_read_tokens` / `cache_write_tokens` alongside `in_tokens` / `out_tokens`. Note: not yet surfaced in `/cost` or `/status` output — the tracking layer landed first, a follow-up will expose it in the user-facing commands.
- Tests (`tests/test_cache_tokens.py`) — 14 tests across 5 layers: `AssistantTurn` field defaults + explicit values; `AgentState` accumulation across increments; real `make_snapshot` on `tmp_path` with all four token fields; Anthropic + OpenAI extraction helpers against synthetic usage objects (populated / missing / None); end-to-end `agent.run` with a scripted stream — single-turn propagation and multi-turn accumulation; plus a `test_rewind_restores_cache_tokens_from_snapshot` regression test that asserts the round-trip. `tests/e2e_checkpoint.py` updated to keep the scripted rewind path in sync with production code.
- Version bumped to 3.5.75.
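
The "coerce missing / None to 0" behaviour of the two extraction helpers can be sketched roughly as follows. This is an illustrative re-implementation, not the code in `providers.py`: the function bodies and the `_as_int` helper are assumptions, and `SimpleNamespace` stands in for a real SDK usage object.

```python
from types import SimpleNamespace


def _as_int(value):
    """Coerce a missing or None counter to 0 (hypothetical helper)."""
    return int(value) if value is not None else 0


def anthropic_cache_tokens(usage):
    """Return (cache_read, cache_write) from an Anthropic-style usage object."""
    read = _as_int(getattr(usage, "cache_read_input_tokens", None))
    write = _as_int(getattr(usage, "cache_creation_input_tokens", None))
    return read, write


def openai_cached_read_tokens(usage):
    """Walk usage.prompt_tokens_details.cached_tokens, tolerating gaps."""
    details = getattr(usage, "prompt_tokens_details", None)
    # getattr on None simply returns the default, so no AttributeError here.
    return _as_int(getattr(details, "cached_tokens", None))


usage = SimpleNamespace(cache_read_input_tokens=120, cache_creation_input_tokens=None)
print(anthropic_cache_tokens(usage))
print(openai_cached_read_tokens(SimpleNamespace()))
```

Using `getattr` with a default at every hop means an older SDK object that lacks the attribute entirely takes the same path as one that sets it to `None`.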
-
Apr 16, 2026 (v3.5.74): Web UI production hardening — persistence, multi-user auth, ops endpoints, JS module split, pytest suite
- SQLite persistence (
web/db.py,web/models.py) — SQLAlchemy-backed store with 4 tables:users,chat_sessions,messages,api_credentials. Sessions + message history now survive server restarts (previously in-memory only, lost on restart). DB file at~/.cheetahclaws/web.db(0600). Config keyCHEETAHCLAWS_WEB_DBoverrides the path. - Multi-user auth (
web/auth.py) — replaced single generated password with full accounts: bcrypt password hashing (passlib) + stateless JWT cookies (PyJWT, HS256, 7-day TTL). JWT signing secret persisted to~/.cheetahclaws/web_secret(0600) so logins survive restarts. New endpoints:POST /api/auth/register(first user becomes admin),POST /api/auth/login,POST /api/auth/logout,GET /api/auth/whoami,GET /api/auth/bootstrap(first-run routing). LegacyPOST /api/authkept for the terminal password page. - Session CRUD — new
PATCH /api/sessions/{id}to rename,DELETE /api/sessions/{id}to remove,GET /api/sessions/{id}/exportto download conversation as Markdown. Auto-titling from first user message. Cross-user isolation enforced even on in-memory cache hits (one session hit patched after smoke test revealed the leak). - Structured JSON logging (
web/logging_setup.py) —logging+ custom JSON formatter emits one record per line to stderr, e.g.{"ts":..., "level":"info", "logger":"web.server", "msg":"req", "method":"POST", "path":"/api/auth/login", "status":200, "dur_ms":259, "user_id":1}. Every HTTP response auto-logs method/path/status/dur_ms/user_id/peer. Level controlled byCHEETAHCLAWS_LOG_LEVELenv (default INFO). - Ops endpoints —
`GET /health` returns `{ok, db, uptime_s}` (503 if DB unreachable); `GET /metrics` returns Prometheus v0.0.4 text with `cheetahclaws_{uptime_seconds, requests_total, requests_4xx, requests_5xx, auth_logins_total, auth_logins_failed, auth_registrations_total, users_total, ws_connections_total}`. Unauthenticated so Prometheus/k8s probes can hit them.
- JS module split (
web/static/js/) — monolithic 1813-linechat.html→ 552 lines of HTML + 9 vanilla JS modules (chat.jscore class,util.js,auth.js,sidebar.js,tools.js,approval.js,settings.js,welcome.js,init.js) loaded via plain<script src>tags. Prototype-mixin pattern (Object.assign(ChatApp.prototype, {...})) keepsapp.foo()call sites unchanged. No bundler, no build step. - ETag + conditional caching — JS/CSS/HTML served with
Cache-Control: no-cache, must-revalidate+ weak ETag (mtime-size). Browser gets 304 when unchanged, fresh content after any edit. Binary assets keep 24h cache. Path traversal blocked by resolved-pathis_relative_tocheck. - pytest suite (
tests/test_web_api.py) — 21 end-to-end HTTP tests using httpx: bootstrap/register/login/whoami/logout, sessions CRUD + export + markdown, cross-user isolation, persistence after cache clear,/health,/metricscounter deltas, CORS preflight, auth gating of every endpoint. Spins the real server in a thread on a random port, DB truncated between tests. Runs in ~5s.pytest tests/test_web_api.py. - Sidebar UX — chat sessions now show title + relative time ("just now", "12m ago", "3d ago") + message count + busy dot. Search box filters by title/id on the client. Right-click (or long-press) gives a context menu: Rename / Export Markdown / Delete. Footer shows current username + Sign out link.
- Register-or-login on first visit — chat UI now calls
/api/auth/bootstrapon load; if no user exists it shows a "Create your first account" form (first registration becomes admin), otherwise the "Sign in" form. Username + password instead of a single server-generated password. - Theme: light default + system auto —
:rootnow carries the light palette;@media (prefers-color-scheme: dark)swaps in the dark palette when the user hasn't explicitly chosen a theme. Toggle button cycles system → light → dark → system, icon reflects the effective theme, title tooltip spells out the current mode. Inline pre-paint script in<head>setsdata-themebefore first paint to avoid FOUC. - Auto port selection —
`cheetahclaws --web` (no `--port`) now tries 8080 first; on `EADDRINUSE` it binds `:0` and lets the kernel pick a free port, banner reports the real URL. Explicit `--port N` binds exactly N or fails loudly (user intent preserved). `--port` argparse default changed from `8080` → `None` as a sentinel.
- Favicon + MIME polish —
web/static/favicon.{png,ico}cropped fromdocs/media/logos/logo-5.png(leaping cheetah, transparent background, multi-size ICO 16/32/48). Served from root as/favicon.icofor browser defaults. MIME table extended with.ico(image/vnd.microsoft.icon),.svg,.jpg,.woff,.woff2. - Welcome dashboard rebalanced — old 5-card "Bridges & Media" row (ragged in 2×2 grid) split into two 4-card sections: Bridges (Telegram · WeChat · Slack · Monitor) and Multi-Modal Media (Voice Input · Vision · Copy Output · Export).
/cwdadded to Development Tools. Tagline changed to "Personal AI Assistant · Support Any Model · Autonomous 24/7". - Bridges commands in Chat UI —
/telegram,/wechat(+/weixinalias),/slack,/voicenow registered inweb/api.py's slash registry (previously only the terminal REPL had them), so clicking the dashboard cards actually runs the command. - New extras —
pip install 'cheetahclaws[web]'installssqlalchemy>=2.0,passlib[bcrypt]>=1.7.4,PyJWT>=2.8.0. CLI-only installs remain dependency-free.[all]extra updated. - Version bumped to 3.5.74.
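
The auto port selection rule in this release (try 8080, fall back to a kernel-chosen port only when no explicit `--port` was given) could look roughly like the sketch below. The function name `bind_web_socket` and its parameters are hypothetical; the real server code may be organised differently.

```python
import errno
import socket


def bind_web_socket(port=None, preferred=8080, host="127.0.0.1"):
    """Bind a listening socket following the --port rules described above.

    port=None -> try `preferred`, fall back to a kernel-chosen port.
    port=N    -> bind exactly N or raise OSError.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred if port is None else port))
    except OSError as exc:
        # An explicit port, or any error other than "address in use", is fatal.
        if port is not None or exc.errno != errno.EADDRINUSE:
            sock.close()
            raise
        sock.bind((host, 0))  # port 0 lets the kernel pick a free port
    sock.listen()
    return sock, sock.getsockname()[1]
```

Treating `None` as "the user did not choose" is what allows the fallback to stay silent for the default case while an explicit `--port N` still fails loudly.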
-
Apr 16, 2026 (v3.5.73): Web UI — browser-based Chat UI + structured event API
- Web Chat UI (
web/chat.html) —cheetahclaws --webnow serves a rich browser-based chat interface at/chatalongside the existing PTY terminal at/. Features: real-time streaming via Server-Sent Events (SSE), collapsible tool cards with status badges, inline permission approval buttons (Allow/Deny), activity indicator (spinner + state labels for Thinking/Running/Processing), Markdown rendering with XSS sanitization (marked.jsbundled), dark/light theme toggle withlocalStoragepersistence, mobile-responsive layout with sidebar overlay. - Structured event API (
`web/api.py`) — new `ChatSession` class bridges `agent.run()` generator to WebSocket/SSE event streams following the same pattern as the Telegram/Slack/WeChat bridges. Events: `text_chunk`, `thinking_chunk`, `tool_start`, `tool_end`, `permission_request`, `permission_response`, `turn_done`, `command_result`, `interactive_menu`, `input_request`, `status`, `error`. Event buffer with replay for late-joining subscribers.
- 8 new API endpoints —
POST /api/prompt(submit prompt or slash command),WS /api/events(real-time event stream),POST /api/approve(permission response),GET /api/sessions(list sessions),GET /api/sessions/{id}(session details + message history),GET/PATCH /api/config(read/write config),GET /api/models(list all 11 providers and models),POST /api/auth(login, sets HttpOnly cookie). - Settings panel — click ⚙ to open: model selector grouped by 11 providers (Anthropic, OpenAI, Gemini, Ollama, DeepSeek, Qwen, etc.), permission mode dropdown, thinking/verbose toggles, max tokens input, per-provider API key management with status indicators, quick action buttons (Compact/Status/Cost/Context), terminal link for fallback.
- Slash command support in Chat UI — all 45+ commands work. Quick commands (
/status,/help,/model,/context) return results instantly via POST response. Long-running commands (/brainstorm,/worker,/plan,/agent) stream events in real-time via SSE (server keeps HTTP connection open)./ssjrenders a clickable 12-item interactive menu./brainstorm(no args) shows a topic input box before starting. - SSJ sub-commands —
/ssj debate,/ssj commit,/ssj readme,/ssj scan,/ssj propose,/ssj reviewnow run directly as agent queries without showing the interactive menu. The menu only appears for/ssj(no args). - Feature dashboard — welcome page shows 24 feature cards organized in 6 categories (Core, Agent Features, Session & Memory, Multi-Model, Development Tools, Bridges & Media) with 7 clickable quick-command chips.
- Security hardening —
hmac.compare_digest()for timing-safe token comparison, XSS sanitization (HTML tags escaped before Markdown rendering), CORS restricted to request Origin echo (no wildcard), HttpOnly + SameSite=Strict cookies, auth checked before WebSocket upgrade,_BufferedSocketwrapper replaces fragilesock.recvmonkey-patching. - Session management — chat sessions with idle timeout (30 min), background reaper for orphaned sessions, session list in sidebar with message count and busy indicator, click to switch, "+" to create new.
- Web bridge integration —
RuntimeContextextended withweb_input_event,web_input_value,in_web_turnfields.tools/interaction.pyroutes permission prompts to web bridge viathreading.Eventsynchronization.commands/advanced.pydetects web turns and skips interactive prompts (uses defaults like Telegram bridge). - Thread-safe stdout streaming —
_ThreadLocalStdoutinterceptsprint()only from the target command thread, broadcasts astext_chunkevents. Other threads unaffected. pyproject.tomlpackaging —webpackage added topackageslist,*.js,*.css,*.htmladded topackage-data. Static assets (xterm.min.js,marked.min.js,chat.html) correctly included inpip installdistributions.- Docs — new Web UI Guide (304 lines): quick start, full feature list, settings panel, API reference with JSON examples for all 8 endpoints and 12 event types, architecture notes, troubleshooting. README updated with Web UI section, feature table entry, CLI options, and examples.
- Version bumped to 3.5.73.
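
The "event buffer with replay for late-joining subscribers" idea can be illustrated with a small sketch. The `EventBuffer` class below is hypothetical and much simpler than a real `ChatSession`: it keeps a bounded history and hands that history to each new subscriber before live events. It does not guard against a live event racing ahead of the backlog, which a production version would need to handle.

```python
import threading
from collections import deque


class EventBuffer:
    """Bounded event log that replays history to late-joining subscribers."""

    def __init__(self, maxlen=1000):
        self._events = deque(maxlen=maxlen)  # oldest events fall off the front
        self._subscribers = []
        self._lock = threading.Lock()

    def publish(self, event_type, **data):
        event = {"type": event_type, **data}
        with self._lock:
            self._events.append(event)
            targets = list(self._subscribers)
        for callback in targets:  # deliver outside the lock
            callback(event)

    def subscribe(self, callback):
        """Register a callback; buffered events are delivered first."""
        with self._lock:
            backlog = list(self._events)
            self._subscribers.append(callback)
        for event in backlog:
            callback(event)
```

A browser tab that connects mid-turn would call `subscribe` and immediately receive the `tool_start` and `text_chunk` events it missed, then continue with live ones.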
-
Apr 15, 2026 (v3.5.72): Trading agent, error classifier, parallel tools, prompt injection detection, SQLite sessions, tool cache, auxiliary model, safe stdio
- Trading agent module (
modular/trading/) — AI-powered multi-agent trading analysis and backtesting system. 5-phase analysis pipeline: data collection (technical indicators, fundamentals, news) → Bull/Bear researcher debate with BM25 memory → research judge recommendation → risk management panel (aggressive/conservative/neutral 3-way debate) → portfolio manager final decision (BUY/OVERWEIGHT/HOLD/UNDERWEIGHT/SELL). 4 built-in backtest strategies (dual MA, RSI mean reversion, Bollinger breakout, MACD crossover) with equity and crypto engines. 7 AI tools (GetMarketData,GetPrice,GetTechnicalIndicators,GetFundamentals,GetNews,RunBacktest,TradingMemory). 11 pure-Python technical indicators. Data source fallback chains (yfinance → coingecko → akshare). Post-trade reflection mechanism feeds lessons back into BM25 memory. SSJ integration as option 14 with guided sub-menu. Supports US/HK/A-share stocks and 20+ cryptos. Install:pip install "cheetahclaws[trading]". - Error classifier (
error_classifier.py) — centralized API error taxonomy (auth, billing, rate_limit, context_overflow, model_not_found, overloaded, connection, timeout) with per-category recovery hints, retryability, and backoff multipliers. Replaces fragile string matching inagent.pyandcheetahclaws.py. - Parallel tool execution (
agent.py) — when the LLM returns multiple tool calls,concurrent_safe=Truetools (Read, Glob, Grep, WebSearch, etc.) now run in parallel via ThreadPoolExecutor (up to 8 workers). Write tools remain sequential. Permission checks are still serial. - Prompt injection detection (
context.py) — CLAUDE.md files are scanned for 8 threat patterns (e.g., "ignore previous instructions", "system prompt override", credential exfiltration via curl/echo) before injection into the system prompt. Detected files are excluded with a security warning. - SQLite session store + full-text search (
session_store.py) — sessions are now saved to SQLite (WAL mode) alongside JSON files. FTS5 index enables/search <query>to find past conversations by content. Auto-imports legacyhistory.jsonon first search. - Tool result cache (
`tool_registry.py`) — read-only tools cache results by `sha256(name + params)`, LRU eviction at 64 entries. Write tools (Write, Edit, Bash, NotebookEdit) invalidate the cache automatically. Eliminates redundant file reads in agent loops.
- Auxiliary model routing (
auxiliary.py) — side tasks (context compression, summarization) now route to a fast/cheap model (Gemini Flash, GPT-4o-mini, etc.) instead of the primary model. Auto-detects from available API keys. Configurable viaauxiliary_modelin config. - Auto-discovery tool loading (
tools/__init__.py) — extension modules loaded via_EXTENSION_MODULESlist +__import__()loop instead of manual import statements. Adding a new extension is one line. - Safe stdio wrapper (
cheetahclaws.py) —sys.stdout/sys.stderrwrapped with_SafeWriterthat silently handlesBrokenPipeErrorand closed file descriptors. Prevents crashes when terminal disconnects during bridge/daemon operation. - One-line installer (
scripts/install.sh) —curl -fsSL .../install.sh | bashhandles platform detection (Linux/macOS/WSL2/Termux), Python/git/pip checks, clone, install, and PATH setup. First run triggers the setup wizard automatically. - Contributing section in README with quick-start commands for contributors, linking to CONTRIBUTING.md and Plugin Authoring Guide.
- Browser tool (
`tools/browser.py`) — `WebBrowse` renders JavaScript pages with headless Chromium (via playwright). Supports extract, screenshot, and click actions with CSS selectors. Solves dynamic/SPA pages that `WebFetch` can't handle. Optional: `pip install cheetahclaws[browser]`.
- Email tools (
tools/email.py) —ReadEmail(IMAP) reads inbox with search by sender/subject;SendEmail(SMTP) sends emails with threading support. Zero external deps (Python stdlib). Configure with/config email_address=.... - File tools (
tools/files.py) —ReadPDFextracts text from PDFs (pymupdf);ReadImagedoes OCR on images (pytesseract, 99 languages);ReadSpreadsheetreads Excel/CSV/TSV with formatted table output. Optional:pip install cheetahclaws[files]. [all]extra —pip install cheetahclaws[all]installs every optional dependency (voice, vision, autosuggest, browser, files, OCR).- Version bumped to 3.5.72.
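
The tool result cache in this release (key by `sha256(name + params)`, LRU eviction at 64 entries, invalidation on write tools) might be sketched as below. The `ToolResultCache` class and its `call` method are hypothetical, and serialising the params with `json.dumps(..., sort_keys=True)` is an assumption about how a stable key could be built.

```python
import hashlib
import json
from collections import OrderedDict

WRITE_TOOLS = {"Write", "Edit", "Bash", "NotebookEdit"}


class ToolResultCache:
    def __init__(self, max_entries=64):
        self._entries = OrderedDict()  # insertion order doubles as LRU order
        self._max = max_entries

    @staticmethod
    def _key(name, params):
        payload = name + json.dumps(params, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, name, params, run):
        if name in WRITE_TOOLS:
            self._entries.clear()  # any write may change what reads return
            return run(**params)
        key = self._key(name, params)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = run(**params)
        self._entries[key] = result
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

Clearing the whole cache on any write is the bluntest possible invalidation; it trades a few extra reads for never serving stale file contents.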
-
Apr 15, 2026 (v3.5.71): Plugin docs, example template, config namespace fix, typing-time autosuggest
- Plugin authoring guide (
docs/guides/plugin-authoring.md) — full guide for building third-party plugins: tools (TOOL_DEFS), commands (COMMAND_DEFS), skills, MCP servers, manifest format, testing, publishing checklist, and common mistakes. - Example plugin template (
examples/example-plugin/) — copy-and-edit starter with working tools (ExampleSearch,ExampleStatus), command (/examplewith subcommands), skill, andplugin.jsonmanifest. - Fix
confignamespace collision — renamedconfig.pytoconfig.pyto avoid conflict with systemconfignamespace packages.pip install -e .followed bycheetahclawsfrom outside the project directory no longer crashes withImportError. - Typing-time autosuggest (PR #38 by @honghua) — optional
prompt_toolkitintegration for inline ghost suggestions and keyboard-selectable completion menu while typing slash commands. Install withpip install cheetahclaws[autosuggest]. Falls back to readline when not installed. Env varCHEETAH_PT_INPUT=0to opt out. - Python 3.10-3.13 compat fix (PR #38) —
`Path.read_text(newline=)` in `tools/fs.py` replaced with a portable `open()` helper (the `newline=` keyword was only added to `Path.read_text` in Python 3.13).
- Version bumped to 3.5.71.
-
Apr 14, 2026 (v3.5.70): Setup wizard, Ollama UX, context indicator, and session robustness
- Interactive setup wizard (
commands/core.py,cheetahclaws.py) —cheetahclaws --setupor/setuplaunches a guided setup: pick from 6 providers (Ollama, Anthropic, OpenAI, Gemini, DeepSeek, custom), auto-detect env vars, set API key, verify connection. Auto-triggers on first run (noconfig.json). API key missing warning now suggests--setup. - Ollama UX improvements —
/modelnow shows live local Ollama models (via/api/tags) instead of a hardcoded list./model ollamatriggers the interactive model picker. Connection failures and 404 errors now give actionable messages ("Is Ollama running?", "Pull it with: ollama pull ..."). Tool-calling fallback message clarified. - Context usage in prompt — the REPL prompt now shows context window usage as a percentage: dim when <40%, yellow at 40-70%, red at >=70%. Users can see when compaction is approaching without running
`/context`.
- Session save/resume robustness — atomic writes (write-to-temp + rename) prevent corruption on crash. `/load` and `/resume` now catch corrupted JSON with friendly error messages and suggest daily backups. History file corruption no longer blocks auto-save.
- Version from pyproject.toml —
VERSIONis now read dynamically frompyproject.toml(single source of truth), no more hardcoded version drift. Falls back toimportlib.metadatawhen installed as a package. /doctorenhanced — added internet connectivity check andpytedependency check; optional vs required deps now distinguished ([FAIL]for missing required deps).- Fix
mcpnamespace collision — renamed internalmcp/package tomcp_client/to avoid conflict with the officialmcppip package (Anthropic MCP SDK). Previously,pip install .followed bycheetahclawscrashed withImportError: cannot import name 'MCPClient'. - Version bumped to 3.5.70.
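
The write-to-temp + rename pattern used for session saves can be sketched like this. The helper name `atomic_write_json` is hypothetical; the key assumption is that the temporary file lives in the same directory as the target, so that `os.replace` is a same-filesystem rename.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON to `path` so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)     # atomic rename on the same filesystem
    except BaseException:
        # Leave the previous file untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, the old session file is still intact; if it dies after, the new one is complete. There is no window in which the target holds partial JSON.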
-
Apr 14, 2026 (v3.5.69): Actionable error messages, dependency sync, and contributor guide
- Actionable API error messages (
cheetahclaws.py) — the REPL error handler now detects 6 common failure modes (invalid API key, network timeout, Ollama not running, rate limit, model not found, insufficient credits) and prints a specific hint alongside the error instead of a generic message. The proactive watcher background thread no longer dumps raw Python tracebacks to stdout — errors are routed throughlogging_utilsinstead. - Dependency sync (
pyproject.toml,requirements.txt) —pyte>=0.8.0added topyproject.tomlcore dependencies (was only inrequirements.txt, causing import failures afterpip install .).requirements.txtrewritten to mirrorpyproject.tomlas single source of truth, with optional deps (sounddevice,Pillow) clearly marked. CONTRIBUTING.md— new contributor guide covering project structure, architecture (config vs RuntimeContext, tool/plugin/hooks systems), development conventions, and a PR checklist. Addresses recurring PR issues where contributors misunderstood the plugin loader (TOOL_DEFSvsregister_tool()), hooks system (no event-based hooks), and runtime state management.- Version bumped to 3.5.69.
-
Apr 14, 2026 (v3.5.68): CI/CD, config/runtime separation, and module reorganization
- GitHub Actions CI (
.github/workflows/ci.yml) — added automated testing on every push and PR:pytestacross Python 3.10–3.13, plus a package smoke test that installs viapip install .and verifies all modules are importable. No more silent packaging regressions. - Config/runtime separation (
runtime.py) — runtime state (_proactive_thread,_pending_image,_plan_file, bridge turn flags, etc.) moved out of theconfigdict intoRuntimeContextfields. Theconfigdict now holds only serializable user configuration. Addedruntime.get_ctx(config)helper for easy access. Migrated 18 files; 327 tests pass. - Tool module reorganization — 7 top-level
tools_*.pyfiles consolidated into atools/package (tools/security.py,tools/fs.py,tools/shell.py,tools/web.py,tools/notebook.py,tools/diagnostics.py,tools/interaction.py). All existingfrom tools import ...code continues to work unchanged viatools/__init__.py. - Version bumped to 3.5.68.
-
Apr 14, 2026 (v3.5.67): Packaging fix,
/configsafety, and readline completion fix- Fix
`ModuleNotFoundError` on `pip install` / `uv tool install` (`pyproject.toml`) — 16 missing top-level modules (`logging_utils`, `agent_runner`, `tools_fs`, `tools_shell`, etc.) and the `monitor` package were not declared in `pyproject.toml`, causing `No module named 'logging_utils'` and similar crashes after installation (#36). All runtime modules are now correctly packaged.
- `/config` no longer exposes secrets (`commands/config_cmd.py`) — the `/config` display now filters out sensitive keys (`api_key`, `telegram_token`, `wechat_token`, and any key ending in `_key`, `_token`, or `_secret`) as well as internal runtime keys (prefixed with `_`). Previously, `/config` crashed with `TypeError` on non-serializable `threading.Thread` objects and leaked credentials.
- Readline completion condition fix (
cheetahclaws.py) — changed"/" in linetoline.startswith("/")in the completer and display hook, preventing false matches on non-slash input containing/. Completion menu now redisplays the prompt line correctly after showing matches. - Packaging fix,
/configsafety, and readline completion fix — FixedModuleNotFoundErroron install (#36), secrets filtering in/config, readline completion. - Version bumped to 3.5.67.
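
The `/config` filtering rule from this release (named sensitive keys, the `_key` / `_token` / `_secret` suffixes, and `_`-prefixed runtime keys) is simple enough to sketch directly. The function name `displayable_config` and the constant names are hypothetical.

```python
SENSITIVE_KEYS = {"api_key", "telegram_token", "wechat_token"}
SENSITIVE_SUFFIXES = ("_key", "_token", "_secret")


def displayable_config(config):
    """Return only the entries that are safe to print in a /config listing."""
    return {
        key: value
        for key, value in config.items()
        if not key.startswith("_")               # internal runtime state
        and key not in SENSITIVE_KEYS            # explicitly named secrets
        and not key.endswith(SENSITIVE_SUFFIXES) # anything that looks like one
    }
```

Dropping `_`-prefixed keys also removes the non-serializable runtime objects that previously caused the `TypeError`.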
-
Apr 12, 2026 (v3.5.66): Auto max_tokens cap per model + tool robustness fixes
- Automatic
max_tokenscapping (providers.py) — a newresolve_max_tokens()function automatically capsmax_tokensto the model's actual limit before every API call, eliminatingBadRequestError: max_tokens cannot be greater than max_model_lenerrors when using vLLM or other bounded local endpoints. Priority: (1) per-model hard limit from a built-in table of 30+ known models; (2) forcustomprovider,GET /v1/modelsis queried at first call and themax_model_lenfield is used (result cached per base URL); (3) provider-levelcontext_limit // 2as a conservative fallback. The user's configured value is always treated as an upper bound — never increased. KeyError: 'file_path'in agent tool calls (tools.py) — when a model (e.g. Qwen) generates a malformed tool call omitting the requiredfile_pathparameter forRead/Write/Edit, the agent runner now returns a descriptive error string ("Error: missing required parameter 'file_path'") instead of crashing with an unhandledKeyError. The agent can then self-correct on the next iteration.KeyError: 'white'in/agentSSJ wizard (ui/render.py) —"white": "\033[37m"added to the ANSI color tableC; the agent wizard's summary box usedclr(name, 'white')which crashed on startup.- Version bumped to 3.5.66.
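
The three-level priority described for `resolve_max_tokens()` can be sketched as a pure function. The signature below is an assumption made for illustration: the real function in `providers.py` presumably looks up the model table and queries `/v1/models` itself, whereas here both are passed in as arguments.

```python
def resolve_max_tokens(requested, model, model_limits,
                       endpoint_limit=None, context_limit=None):
    """Cap `requested` at the first limit available, in priority order."""
    if model in model_limits:          # (1) built-in per-model hard limit
        limit = model_limits[model]
    elif endpoint_limit is not None:   # (2) max_model_len reported by /v1/models
        limit = endpoint_limit
    elif context_limit is not None:    # (3) conservative provider-level fallback
        limit = context_limit // 2
    else:
        return requested               # nothing known: leave the value alone
    return min(requested, limit)       # the user's value is only ever lowered
```

For example, with a hypothetical table entry `{"example-model": 8192}`, a configured `max_tokens` of 32000 is sent as 8192, while a configured 4000 is sent unchanged.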
-
Apr 12, 2026 (v3.5.65):
/agentSSJ entry, bridge-compatible wizard, and/monitorinteractive wizard fix- SSJ entry 14 — 🤖 Agent (
commands/advanced.py) — The SSJ power menu now has a 14th option that launches the/agentinteractive wizard directly, covering all four autonomous templates (Research Assistant, Auto Bug Fixer, Paper Writer, Auto Coder). - Bridge-compatible agent wizard (
commands/agent_cmd.py) — The wizard's input helper_ask()now routes throughask_input_interactive()so it works correctly over Telegram, Slack, and WeChat bridges (previously used bareinput()which is terminal-only). /monitorinteractive wizard input fix (all three bridges) — When the/monitorwizard sends a menu to a bridge and waits for user input, the next message from the user was incorrectly treated as a new AI query. Each bridge's poll loop now checkssession_ctx.tg/slack/wx_input_eventbefore dispatching to the AI — wizard replies are correctly routed back to the waiting prompt.- Version bumped to 3.5.65.
-
Apr 12, 2026 (v3.5.64):
/monitor— AI subscription system &/agenttask template system for auto research/monitorwizard — typing/monitorwith no arguments launches an interactive setup wizard: live subscription list, numbered menu (add subscription / run now / start-stop scheduler / remove / configure notifications), zero memorization required. Works in terminal and all three bridges.monitor/package —fetchers.py(arxiv RSS + weekend API fallback · Yahoo Finance · CoinGecko · Reuters/BBC/AP RSS · DuckDuckGo),summarizer.py(AI summarization viaproviders.stream()),notifier.py(Telegram / Slack / console delivery),scheduler.py(background daemon,daily/6h/30mschedules),store.py(persistent subscriptions at~/.cheetahclaws/monitor_subscriptions.json)./subscribe <topic> [schedule] [--telegram] [--slack]— subscribe toai_research,stock_TSLA,crypto_BTC,world_news, orcustom:<query>. Schedule defaults todaily; delivery defaults to configured channels./agentwizard —/agentwith no args launches the autonomous agent wizard (Research Assistant / Auto Bug Fixer / Paper Writer / Auto Coder / Custom); walks through template-specific questions, confirms, then starts the loop in a background thread.agent_runner.py— isolatedAgentStateper runner, callsagent.run()per iteration, auto-approves permissions, pushes iteration summaries via active bridge, persists to~/.cheetahclaws/agents/<name>/log.jsonl.- 4 built-in agent templates (
agent_templates/):research_assistant,auto_bug_fixer,paper_writer,auto_coder— Markdown-driven program.md style (inspired by Karpathy's autoresearch). - Job queue & remote control (all three bridges) — persistent job registry (
jobs.py,~/.cheetahclaws/jobs.json); new bridge commands:!jobs/!j(dashboard),!job <id>(detail),!retry <id>(re-run failed job),!cancel [id](stop job); per-bridge queue (FIFO when AI is busy);on_tool_start/on_tool_endhooks wired in all three bridges for live step tracking. - Version bumped to 3.5.64.
-
Apr 12, 2026 (v3.5.63): Phone bridge: PTY permission prompt now responds correctly to digit inputs
- Ink SelectInput fix (
bridges/interactive_session.py) — Claude Code's permission prompts (e.g. "❯ 1. Yes 2. Yes, don't ask again 3. No") are rendered by Ink'sSelectInputwhich only responds to arrow-key + Enter events, not digit key presses. Sending2from the phone previously had no effect (or misrouted to the wrong option) because the raw digit was written verbatim to the PTY. - Automatic arrow-key translation —
send_input()now detects when the pyte screen shows a numbered❯ 1.menu and maps the digit to the correct ANSI escape sequence:1→ Enter (cursor already on item 1),2→↓+ Enter,3→↓↓+ Enter, and so on up to9. The translation fires only when the screen shows the menu pattern and the input is a single digit; all other inputs are forwarded unchanged. - Version bumped to 3.5.63.
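
The digit-to-arrow-key translation can be sketched as follows. This is an illustration only: the function name `translate_menu_digit` is hypothetical, the screen is passed in as a plain string rather than read from pyte, and sending `\r` for Enter is an assumption about what the PTY expects.

```python
import re

DOWN = "\x1b[B"   # ANSI cursor-down escape sequence
ENTER = "\r"
MENU_PATTERN = re.compile(r"❯\s*1\.")  # cursor sitting on item 1 of a numbered menu


def translate_menu_digit(text, screen):
    """Map a single digit to arrow keys when `screen` shows a numbered menu."""
    if len(text) == 1 and text in "123456789" and MENU_PATTERN.search(screen):
        # Item 1 is already selected, so item N needs N-1 down presses.
        return DOWN * (int(text) - 1) + ENTER
    return text  # everything else is forwarded unchanged
```

Requiring both conditions (single digit and visible menu) keeps ordinary numeric input, such as typing `3` at a shell prompt, from being rewritten.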
-
Apr 12, 2026 (v3.5.62):
/agent— autonomous research & coding loop (task template system)/agentwizard — typing/agentwith no arguments launches an interactive setup wizard: numbered menu (Research Assistant / Auto Bug Fixer / Paper Writer / Auto Coder / Custom), template-specific follow-up questions, summary & confirm. Zero memorization required.agent_runner.py— core autonomous loop engine. EachAgentRunnerowns an isolatedAgentState, runsagent.run()per iteration, auto-grants permissions (configurable), reports iteration summaries via bridge (Telegram/Slack/WeChat) or terminal, persists results to~/.cheetahclaws/agents/<name>/log.jsonl.- 4 built-in task templates (
agent_templates/):research_assistant(read papers → notes → related work),auto_bug_fixer(run tests → fix → commit),paper_writer(outline → draft section by section),auto_coder(tasks.md → implement → test → commit). - Custom templates — drop any
.mdfile following the program.md pattern (inspired by Karpathy's autoresearch) and launch with/agent start /path/to/template.md. /agent start <template> [args]— power-user direct launch with--name,--interval,--no-auto-approveflags./agent stop/list/status/templates— full lifecycle management from terminal or phone bridge.- Phone control —
!agent list,!agent stop <name>,!agent status <name>work in all three bridges for remote monitoring while agents run overnight. - Version bumped to 3.5.62.
-
Apr 12, 2026 (v3.5.61): Phone vibe-coding: interactive PTY session robustness improvements
!exitwith accidental space (bridges/telegram.py,slack.py,wechat.py) — exit detection now normalises whitespace so! exit,! quit,! stop(with a space after!) all correctly terminate the PTY session./exitand/quitintercepted — these were previously forwarded verbatim into the running process (e.g. Claude Code), causing confusion. They are now caught by the bridge before routing and cleanly end the session.- Input acknowledgement — every keystroke forwarded to a PTY session immediately echoes back
⌨ <text>so the user knows their input was received, even before the process produces output. !ping/!screen/!refresh— new meta-commands (also tolerating a space:! ping) that force the current pyte screen state to be re-rendered and sent regardless of the deduplication cache.- Dedup reset on input (
interactive_session.py) —send_input()now clears_last_sentafter writing to the PTY, guaranteeing the next output flush is always delivered even if screen content appears unchanged. force_flush()method (interactive_session.py) — public method that resets the dedup cache and immediately re-renders and sends the visible screen; used by!ping.- Version bumped to 3.5.61.
-
Apr 12, 2026 (v3.5.60): Production reliability, maintainability, and product completeness improvements
- Structured logging (
logging_utils.py) — newline-delimited JSON log output witherror/warn/info/debuglevel filtering, thread-safe file or stderr sink, andconfigure_from_config()for zero-boilerplate setup. All API calls, retries, tool events, and bridge lifecycle events now emit structured log events withsession_idcorrelation. - Circuit breaker (
`circuit_breaker.py`) — per-provider three-state machine (CLOSED → OPEN → HALF_OPEN) with rolling failure window (default: 5 failures in 60 s) and exponential cooldown (default: 120 s). `providers.py` wraps every streaming call; `agent.py` catches `CircuitOpenError` and returns a user-visible message without retrying.
- Quota control (
quota.py) — four enforcement limits (session_token_budget,session_cost_budget,daily_token_budget,daily_cost_budget) checked before every API call. Daily accumulation persisted to~/.cheetahclaws/quota/YYYY-MM-DD.json; in-memory counters are thread-safe. All nine new config keys registered inconfig.pyDEFAULTS. - Explicit bootstrap (
bootstrap.py) — startup sequence made visible and testable: ① configure logging → ② import tool registry → ③ start health-check server.cheetahclaws.pycalls_bootstrap(config)once afterload_config(); all steps are idempotent. tools.pysplit — the 1,400-linetools.pydecomposed into seven focused sub-modules:tools_security.py(path safety, bash whitelist),tools_fs.py(read/write/edit/diff/glob),tools_shell.py(bash/grep/process-tree),tools_web.py(webfetch/websearch),tools_notebook.py(NotebookEdit),tools_diagnostics.py(GetDiagnostics),tools_interaction.py(AskUserQuestion, SleepTimer, drain_pending_questions).tools.pyremains as a thin re-export shim — allfrom tools import Xcalls continue to work unchanged.- Session file versioning (
commands/session.py) — saved session files now include"_version": 1._migrate_session()upgrades v0 → v1 on load/resume; future schema changes can add new migration steps without breaking existing saves. - Health-check HTTP server (
health.py) — optional daemon thread started viahealth_check_portconfig key. Three endpoints:GET /healthz(always 200, uptime + active sessions),GET /readyz(503 if any circuit breaker is open),GET /metrics(full JSON: uptime, model, sessions, circuit states, daily token/cost usage). - Bridge auto-reconnect (
bridges/telegram.py,bridges/slack.py,bridges/wechat.py) — each poll loop now returns an exit reason ("stopped"for clean shutdown,"auth_error"for invalid token). A supervisor wrapper (_tg/slack/wx_supervisor) catches unexpected crashes and restarts the poll loop with exponential backoff (2 s → 4 s → … → 120 s). Auth errors stop the bridge immediately without reconnect. All bridges logbridge_crash/bridge_auth_errorevents vialogging_utils. /helpcompleteness — 13 commands that were registered but missing from the help docstring are now shown:/resume,/status,/compact,/init,/export,/copy,/doctor,/checkpoint,/rewind,/plan,/brainstorm,/worker,/image. Product tagline updated to reflect CheetahClaws's current scope.- Version bumped to 3.5.60.
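
The three-state circuit breaker from this release can be sketched in a few lines. This is a simplified illustration, not the contents of `circuit_breaker.py`: the method names are hypothetical, the cooldown is fixed rather than exponential, and the injectable `clock` parameter is an assumption added to make the behaviour easy to test.

```python
import time
from collections import deque


class CircuitOpenError(RuntimeError):
    """Raised when a call is attempted while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=5, window_s=60.0, cooldown_s=120.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.state = "CLOSED"
        self._clock = clock
        self._failures = deque()  # timestamps of recent failures
        self._opened_at = None

    def before_call(self):
        """Raise while OPEN; move to HALF_OPEN once the cooldown has elapsed."""
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("provider circuit is open")
            self.state = "HALF_OPEN"  # let one probe call through

    def record_success(self):
        self._failures.clear()
        self.state = "CLOSED"

    def record_failure(self):
        now = self._clock()
        self._failures.append(now)
        # Drop failures that fell out of the rolling window.
        while now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if self.state == "HALF_OPEN" or len(self._failures) >= self.max_failures:
            self.state = "OPEN"
            self._opened_at = now
```

A caller would invoke `before_call()` ahead of each provider request and then `record_success()` or `record_failure()` depending on the outcome; a failed probe in HALF_OPEN reopens the circuit immediately.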
-
Apr 12, 2026 (v3.5.59): Modular architecture refactoring: monolith → layered packages
- `cheetahclaws.py` split: the 5,100-line monolith has been decomposed into focused packages. `cheetahclaws.py` is now a ~1,300-line REPL entry-point; all bridge, UI, and command logic lives in dedicated modules.
- `ui/render.py`: ANSI color helpers (`clr`, `info`, `ok`, `warn`, `err`) and Rich Live streaming renderer extracted into a standalone package; imported by every module that needs terminal output.
- `bridges/`: Telegram (`telegram.py`), WeChat (`wechat.py`), and Slack (`slack.py`) bridge implementations moved out of `cheetahclaws.py` into their own sub-package.
- `commands/`: REPL slash-command handlers extracted into `session.py` (session load/save/export), `config_cmd.py` (`/config`, `/status`, `/doctor`), `core.py` (`/clear`, `/compact`, `/cost`, `/verbose`, `/thinking`, `/image`, `/model`), `checkpoint_plan.py` (`/checkpoint`, `/rewind`, `/plan`), and `advanced.py` (`/brainstorm`, `/worker`, `/ssj` and related).
- `runtime.py`, `RuntimeContext` singleton: live session references (`run_query`, `handle_slash`, `agent_state`, `tg_send`, `slack_send`, `wx_send`) that were previously injected into the config dict under `_underscore` keys are now a typed `@dataclass` singleton (`runtime.ctx`). One process → one ctx → no key collisions, no dict sprawl. Per-bridge synchronous input events (`tg_input_event/value`, `slack_input_event/value`, `wx_input_event/value`) are also stored here, eliminating the last threading-Event race in config.
- Packaging fixes (`pyproject.toml`): `runtime` added to `py-modules`; `ui`, `bridges`, `commands`, `modular`, `modular.video`, `modular.voice`, `video` added to `packages` so all new layers are included in `pip install .`. `package-data` added for `modular/video/PLUGIN.md` and `modular/voice/PLUGIN.md`.
- pytest config: `asyncio_default_fixture_loop_scope = "function"` added to silence pytest-asyncio deprecation warnings; `python_files` extended to collect `e2e_*.py` alongside `test_*.py` (267 tests now collected by default).
- Version bumped to 3.5.59.
-
Apr 11, 2026 (v3.5.58): Slack bridge via Slack Web API
- Slack bridge (`/slack`) (`cheetahclaws.py`): `/slack <xoxb-token> <channel_id>` connects cheetahclaws to a Slack channel using the Slack Web API (no external packages required, stdlib `urllib` only). Polls `conversations.history` every 2 seconds for new messages; sends responses via `chat.postMessage`. A "⏳ Thinking…" placeholder is posted immediately and then updated in-place with the real reply when the model finishes.
- Slash command passthrough: send `/cost`, `/model gpt-4o`, `/clear`, etc. from Slack and they execute in cheetahclaws; results are sent back to the same channel.
- Interactive menu routing: permission prompts and interactive menus are routed to Slack; your next message is used as the selection input.
- Auth check on start: `auth.test` is called before starting the poll loop; invalid or revoked tokens are caught immediately with a clear error message.
- Auto-start: `slack_token` + `slack_channel` saved to `~/.cheetahclaws/config.json`; bridge starts automatically on every subsequent launch.
- `/slack stop` / `/slack logout` / `/slack status`: full lifecycle control; `/stop` sent from Slack also stops the bridge gracefully.
- WeChat / Slack auto-start banner flags: the startup banner now shows `wechat` and `slack` flags when the respective bridges are configured (previously only `telegram` was shown).
-
Apr 11, 2026 (v3.5.57): WeChat bridge, tmux integration, shell escape, `max_tokens` fix, new OpenAI models
- WeChat bridge (`/wechat`) (`cheetahclaws.py`): `/wechat login` authenticates with WeChat by scanning a QR code (same iLink Bot API used by the official WeixinClawBot / `@tencent-weixin/openclaw-weixin` plugin). After a one-time scan, `token` + `base_url` are saved to `~/.cheetahclaws/config.json` and the bridge auto-starts on every subsequent launch. The bridge runs a long-poll loop (`POST /ilink/bot/getupdates`, 35-second window) in a daemon thread; normal timeouts are handled transparently and do not trigger backoff or reconnect.
- `context_token` echo: the iLink protocol requires each reply to include the sender's latest `context_token`. The bridge caches this per `user_id` in memory and echoes it automatically on every outbound message.
- Typing indicator: a `sendtyping` request is sent every 4 seconds while the model processes, keeping the WeChat chat responsive.
- Slash command passthrough: send `/cost`, `/model gpt-4o`, `/clear`, etc. from WeChat and they execute in cheetahclaws; results are sent back to the same WeChat conversation.
- Session expiry handling: `errcode -14` (session expired) clears saved credentials and prompts re-authentication on the next `/wechat` call.
- Message deduplication: `message_id` / `seq` dedup prevents double-processing on reconnect.
- `/wechat stop` / `/wechat logout` / `/wechat status`: full lifecycle control from the terminal or from WeChat itself (`/stop`).
- Bug fix: `max_tokens` rejected by gpt-5-nano / o4-mini / o3 (`providers.py`). Newer OpenAI models have removed the legacy `max_tokens` parameter and require `max_completion_tokens` instead. Any request using `max_tokens` with these models was returning a 400 error and exhausting all retries. The OpenAI provider now unconditionally sends `max_completion_tokens`; all other OpenAI-compatible providers (Ollama, vLLM, Gemini, Kimi, …) continue to use `max_tokens`, which their servers expect.
- New models listed: `gpt-5`, `gpt-5-nano`, `gpt-5-mini`, `o3`, `o4-mini` added to the known OpenAI model list so they appear in `/model` suggestions and get the correct token-cap from the provider config.
- Native tmux integration (`tmux_tools.py`): 11 tmux tools for the AI agent: `TmuxListSessions`, `TmuxNewSession`, `TmuxSplitWindow`, `TmuxSendKeys`, `TmuxCapture`, `TmuxListPanes`, `TmuxSelectPane`, `TmuxKillPane`, `TmuxNewWindow`, `TmuxListWindows`, `TmuxResizePane`. Auto-detected at startup: tools register only when `tmux` (Linux/macOS) or `psmux` (Windows) is found; zero impact if absent. The AI can now run long-lived commands in visible panes that outlive the Bash tool's timeout, read output on demand with `TmuxCapture`, and build autonomous monitoring loops. System prompt is automatically extended with tmux usage guidance when the binary is present.
- Shell escape (`cheetahclaws.py`): type `!` followed by any shell command (`!git status`, `!ls -la`, `!python --version`) to execute it directly without AI involvement. Output prints inline; control returns to the prompt immediately.
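The parameter split behind the `max_tokens` fix comes down to a small per-provider switch. The sketch below is only an illustration: the helper name `token_limit_kwargs` and the provider string `"openai"` are assumptions, and only the two parameter names come from the release note.

```python
def token_limit_kwargs(provider: str, limit: int) -> dict:
    """Return the output-token cap under the key the target server expects.

    Hypothetical helper: the OpenAI provider always gets
    max_completion_tokens, every other OpenAI-compatible provider keeps
    the legacy max_tokens key.
    """
    if provider == "openai":
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


# Example: merge the cap into an otherwise provider-neutral request body.
payload = {"model": "o4-mini", "messages": [], **token_limit_kwargs("openai", 4096)}
```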
-
Apr 10, 2026 (v3.5.56): Retry mechanism, improved token estimator, plan-context fix after force compaction
- Retry with exponential backoff (`agent.py`): the provider stream loop now retries up to 3 times on any API error instead of crashing the session. Context-too-long errors trigger an immediate force compaction and retry; overloaded/rate-limit errors use longer backoff (4 s, 8 s, 16 s); all other errors use standard backoff (2 s, 4 s, 8 s). After exhausting retries a graceful inline message is shown, so the session is never killed.
- Improved token estimator (`compaction.py`): `estimate_tokens` now uses `chars / 2.8` (was `chars / 3.5`) to better account for code-heavy content, adds 4 tokens per message for framing overhead, and applies a 10 % safety buffer. The old divisor underestimated real token counts, causing compaction to skip when it should have triggered and leading to context-overflow crashes.
- Force-compact safety net (`cheetahclaws.py`): `run_query` now catches any uncaught error and shows a friendly message instead of crashing the REPL. Context-too-long errors are handled first with a force compaction + retry.
- Bug fix: plan context preserved after force compaction (`agent.py`). `_force_compact` now restores the plan file context into `state.messages` after calling `compact_messages`, matching the behavior of `maybe_compact`. Previously, force compaction in plan mode silently dropped the plan file content from context.
- Bug fix: removed dead context-error handler (`cheetahclaws.py`). The `is_context_err` block inside `run_query`'s outer `except` was unreachable because context-too-long exceptions are already caught and handled inside `agent.py`'s retry loop. The dead code has been removed.
- Remote Ollama support (`providers.py`): the Ollama provider base URL can now be overridden via the `OLLAMA_BASE_URL` environment variable or the `ollama_base_url` config key, replacing the hardcoded `localhost:11434` default. This enables connecting to a remote Ollama instance (e.g. inside Docker or on another machine) without switching to the generic OpenAI-compatible provider.
- Readline resilience in containerised environments (`cheetahclaws.py`): `setup_readline` now catches `PermissionError` and `OSError` when loading history from a read-only or bind-mounted home directory. The `atexit` write-history callback is also wrapped in a try/except so shutdown errors are swallowed silently instead of printing noisy tracebacks.
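The token estimate described above can be sketched as below. The three constants come from the release note; the message shape (dicts with a `content` key), the order in which the buffer is applied, and rounding up are assumptions made for illustration, so the real `estimate_tokens` may differ in those details.

```python
import math

CHARS_PER_TOKEN = 2.8          # divisor from the note (previously 3.5)
FRAMING_TOKENS_PER_MESSAGE = 4 # per-message framing overhead
SAFETY_BUFFER = 1.10           # 10 % safety margin


def estimate_tokens(messages: list) -> int:
    """Rough token estimate for a list of chat messages (sketch only)."""
    raw = 0.0
    for msg in messages:
        content = msg.get("content", "")
        # Non-string content (e.g. tool payloads) is stringified here,
        # which is an assumption of this sketch.
        text = content if isinstance(content, str) else str(content)
        raw += len(text) / CHARS_PER_TOKEN + FRAMING_TOKENS_PER_MESSAGE
    return math.ceil(raw * SAFETY_BUFFER)
```

For a single 28-character message this gives 28 / 2.8 + 4 = 14 tokens before the buffer and 16 after rounding up, whereas the old divisor alone would have yielded 28 / 3.5 = 8.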
-
Apr 08, 2026 (v3.5.55): Modular ecosystem, TTS Content Factory, CJK voice auto-detect, readline ANSI fix
- Modular ecosystem (`modular/`): new plug-and-play module folder. Each submodule (`modular/video/`, `modular/voice/`) is self-contained with its own `cmd.py` exporting a `COMMAND_DEFS` dict. The registry auto-discovers all modules at startup; missing modules degrade gracefully without affecting the rest of the system. Existing `video/` and `voice/` imports continue to work via backward-compat shims.
- TTS Content Factory (`/tts`): new command for AI-powered text-to-speech generation. Interactive wizard: choose a voice style (narrator, newsreader, storyteller, ASMR, motivational, documentary, children, podcast, meditation, custom), duration, TTS engine (Gemini → ElevenLabs → Edge, best available), and individual voice. In AI mode the active model writes the script; in custom-text mode you paste your own. Output: `.mp3` audio + `_script.txt` companion file. Also accessible as option 12 in `/ssj`.
- CJK auto-voice detection: Edge TTS with an English voice silently skips Chinese/Japanese/Korean characters (only reads the Latin parts). The TTS backend now detects CJK-heavy text and auto-switches to `zh-CN-XiaoxiaoNeural` when a non-CJK voice is selected, ensuring the full text is synthesized.
- Edge TTS long-text chunking: Edge TTS silently truncates text beyond ~3 000 chars. The pipeline now splits text into ≤ 2 000-char chunks at sentence boundaries, synthesizes each chunk independently, and concatenates with ffmpeg, so audio now always covers the complete script.
- Readline ANSI fix (#29 / #31): ANSI color codes in `input()` prompts are now wrapped with `\001…\002` (`RL_PROMPT_START/END_IGNORE`) so readline accounts for them as zero-width. Fixes cursor drift and duplicate-line content when scrolling REPL history.
- SSJ Developer Mode extended: SSJ menu now includes option 11 (🎬 Video factory, conditional) and option 12 (🎙 TTS factory, conditional), matching the modular availability flags.
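The readline fix above relies on the two marker bytes `\001` and `\002`, which tell GNU readline to treat the enclosed bytes as zero-width. A minimal sketch of the wrapping step follows; the helper name `readline_safe` and the regex covering only SGR color sequences are assumptions for illustration, not the project's code.

```python
import re

RL_PROMPT_START_IGNORE = "\001"
RL_PROMPT_END_IGNORE = "\002"

# Matches SGR color sequences such as "\x1b[32m" or "\x1b[0m".
_ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def readline_safe(prompt: str) -> str:
    """Wrap each ANSI color code so readline excludes it from the width."""
    return _ANSI_SGR.sub(
        lambda m: f"{RL_PROMPT_START_IGNORE}{m.group(0)}{RL_PROMPT_END_IGNORE}",
        prompt,
    )


# Usage: input(readline_safe("\x1b[32m[project]\x1b[0m > "))
```

Without the markers, readline counts the escape bytes as printable columns, which matches the cursor drift described in the note.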
-
Apr 07, 2026 (v3.5.54): Video factory major upgrade: custom script mode, PIL subtitle engine, web image search, wizard UX overhaul: Idea → Story → Final AI Video. Inspired by Kevin, with sincere thanks for his great help and inspiration in making this project better.
- Custom script mode: new content mode in the `/video` wizard. Choose "Custom script" to paste your own narration text: TTS reads it aloud, and the same text is automatically shown as subtitles (timed proportionally). No AI story generation step, no Whisper required. Ideal for product promos, personal narrations, and multilingual content.
- PIL subtitle rendering engine: subtitles are now rendered with Pillow (PIL) + NotoSansSC font instead of the libass filter. This fixes non-Latin characters (Chinese, Japanese, Korean, Cyrillic, Arabic) showing as black boxes. The pipeline uses a two-pass approach: fast `-c:v copy` assembly, then PIL-rendered PNG overlays burned in via ffmpeg `filter_complex`. Falls back to no subtitles if PIL fails, so the pipeline never crashes.
- Subtitle source selection: new wizard step to choose subtitle mode: Auto (Whisper transcription), Story text (burn script/story as subtitles; works for all languages, no Whisper needed), Custom text (paste your own), or None.
- Text-to-SRT from plain text: `text_to_srt()` splits any plain text into natural subtitle chunks (word-wrap for Latin, punctuation+character-wrap for CJK) and distributes timing proportionally across the audio duration. Works for all languages, offline.
- Free web image search: `/video` now searches for relevant stock photos from Pexels → Wikimedia Commons → Lorem Picsum when no source images or Gemini Web session are available. AI-generated search queries (model-driven) improve relevance. Always produces images, never fails.
- AI-powered source image selection: when a source folder contains more images than needed, the model reads filenames and story content to select the most relevant ones. Keyword-scoring fallback when the model is unavailable.
- Wizard UX overhaul: full step-loop wizard with `b` = back, `q` = quit at every step. All options have Auto as default (Enter = Auto). Custom language input (type any language name + Whisper code). Style list shows before prompting. Custom output path step. Detects content language from topic text automatically.
- Audio/video sync fix: `_audio_duration()` now parses the `Duration` line of `ffmpeg -i` stderr output for accurate measurement. Previously used a file-size estimate at 128kbps, causing 2.7× overestimate for Edge TTS (which outputs at 48kbps). Videos now always match audio length.
- Source materials: `--source <dir>` pre-loads images, audio, video, and text files. Images are used directly; audio/video narration replaces TTS; text files are summarised and injected as story context.
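The audio/video sync fix above hinges on reading the `Duration: HH:MM:SS.xx` line that `ffmpeg -i` prints to stderr. A rough sketch of that parsing step is shown here; `parse_ffmpeg_duration` is a hypothetical name and the real `_audio_duration()` may be structured differently.

```python
import re
from typing import Optional

# ffmpeg prints e.g. "  Duration: 00:01:23.45, start: 0.000000, bitrate: 48 kb/s"
_DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_ffmpeg_duration(stderr_text: str) -> Optional[float]:
    """Return the media duration in seconds, or None if no Duration line parses."""
    match = _DURATION_RE.search(stderr_text)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

The size-based estimate it replaces assumed 128 kbps; for a 48 kbps file that inflates the duration by 128 / 48 ≈ 2.67, which is consistent with the roughly 2.7× overestimate reported in the note.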
-
Apr 07, 2026 (v3.5.53): Telegram photo/voice support, process-tree kill on Bash timeout, Windows shell hints, worker fix
- Telegram photo vision: send a photo to the Telegram bridge and CheetahClaws will describe it using the active vision model (GPT-4o, Gemini 2.0 Flash, Claude, etc.). The bot downloads the highest-resolution version, encodes it as Base64, and routes it through the same `_pending_image` path as `/img`. Caption text (or a default "describe this image" prompt) is forwarded alongside the image.
- Telegram voice/audio STT: send a voice message or audio file to the Telegram bridge and CheetahClaws transcribes it automatically. OGG voice notes are converted to PCM via `ffmpeg` and passed to the local Whisper backend; falls back to the OpenAI Whisper API when `ffmpeg` is unavailable. The transcription is echoed back to the chat before being submitted as a query.
- Process-tree kill on Bash timeout: when a `Bash` command times out, CheetahClaws now kills the entire child process tree instead of only the shell. On Unix, `os.killpg` sends `SIGKILL` to the process group; on Windows, `taskkill /F /T` terminates all child processes. GUI apps (e.g. PyQt games launched by the agent) no longer leave zombie processes after a timeout. The internal implementation uses `start_new_session=True` instead of `preexec_fn=os.setsid` for thread safety.
- Worker runs all pending tasks by default: `/worker` previously processed only 1 task per session (a bug). It now runs all pending tasks by default. The `--workers N` flag still limits the batch size when needed.
- Windows shell hints in system prompt: non-Claude models now receive a Windows-specific shell cheat-sheet in the system prompt (`type` vs `cat`, `dir /s /b` vs `find`, `del` vs `rm`, etc.) so the agent generates correct commands on Windows without manual guidance.
- Bash timeout hints: the `Bash` tool description now advises the model to use `timeout=120–300` for slow commands (`npm install`, `npx`, `pip install`, builds), reducing spurious 30-second timeouts on package operations.
- Bug fix: background event prompt shows actual cwd. The yellow re-prompt printed after a background event completed was hardcoded to `[claude-code-local]`; it now shows the real working-directory name (`[{cwd.name}]`), consistent with the main REPL prompt.
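The process-tree kill described above can be illustrated with the standard library alone. The function below is a sketch under stated assumptions: `run_with_tree_kill` and its return shape are hypothetical, and only the `start_new_session=True`, `os.killpg` / `SIGKILL`, and `taskkill /F /T` pieces come from the release note.

```python
import os
import signal
import subprocess
import sys


def run_with_tree_kill(command: str, timeout: float):
    """Run a shell command; on timeout, kill the whole child process tree.

    Returns (returncode, combined_output, timed_out).
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        # The child becomes leader of a new session and process group,
        # so its pid doubles as the group id on POSIX.
        start_new_session=(sys.platform != "win32"),
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        if sys.platform == "win32":
            # /T terminates the process together with its children.
            subprocess.run(
                ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
                capture_output=True,
            )
        else:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # The group exited between the timeout and the kill.
        output, _ = proc.communicate()  # Reap the child, collect leftovers.
        return proc.returncode, output, True
```

Killing only `proc` would leave grandchildren alive and holding the output pipe open; signalling the group closes every writer, so the final `communicate()` returns promptly.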
-
Apr 06, 2026 (v3.5.53): Telegram interactive menus, `/img` alias, `/voice device`, OpenAI/Gemini vision support
- Telegram interactive menus fixed: slash commands with interactive input (e.g. `/ollama`, `/permission`, `/checkpoint`) were blocking the Telegram poll loop, making it impossible to respond to the menu prompts. Slash commands now run in a daemon thread (like regular queries), keeping the poll loop free. All interactive menus (`ask_input_interactive`) work correctly over Telegram.
- `/img` alias: `/img` is now an alias for `/image`, for faster clipboard-image workflows.
- `/voice device`: new subcommand to list all available input microphones and select one interactively. The selected device index is persisted in the session config and shown in `/voice status`. Useful on systems with multiple audio interfaces (e.g. USB headset + built-in mic).
- Vision support for OpenAI / Gemini models: `/img` (and `/image`) now sends images in the OpenAI multipart `image_url` format to cloud vision models (GPT-4o, Gemini 2.0 Flash, etc.), in addition to the existing Ollama native format. No configuration change needed; the correct format is selected automatically based on the active provider.
- Bug fix: threading race condition. `_in_telegram_turn` is now tracked via `threading.local()` per slash-runner thread instead of a shared config key, eliminating a race condition that could corrupt the flag when a regular message arrived while an interactive slash command was waiting for input.
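The race-condition fix above moves a flag from shared state into `threading.local()`. The sketch below shows why that isolates the flag per thread; the accessor names and the `demo` function are hypothetical, only the use of `threading.local()` comes from the note.

```python
import threading

# Each thread sees its own attributes on this object.
_turn_state = threading.local()


def set_in_telegram_turn(value: bool) -> None:
    _turn_state.in_telegram_turn = value


def in_telegram_turn() -> bool:
    # Threads that never set the flag fall back to False.
    return getattr(_turn_state, "in_telegram_turn", False)


def demo() -> dict:
    """Set the flag in a slash-runner thread and read it from both threads."""
    seen = {}

    def slash_runner() -> None:
        set_in_telegram_turn(True)
        seen["runner"] = in_telegram_turn()

    worker = threading.Thread(target=slash_runner)
    worker.start()
    worker.join()
    seen["main"] = in_telegram_turn()
    return seen
```

With a shared config key, the write made in the runner thread would also be visible to (and overwritable by) the thread handling a regular message; with thread-local storage each thread reads only its own value.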
-
Apr 06, 2026 (v3.5.52): Checkpoint system, plan mode, compact, and utility commands, support for MiniMax models, Telegram bug fixes. With sincere thanks to Xiaohan for the great help in making this project better.
- Checkpoint system (`checkpoint/` package): auto-snapshots conversation state and file changes after every turn. `/checkpoint` lists all snapshots; `/checkpoint <id>` rewinds both files and conversation history to any previous state; `/checkpoint clear` removes all snapshots for the session. `/rewind` is an alias. 100-snapshot sliding window; initial snapshot captured at session start. Throttling: skips when nothing changed. File backups use copy-on-write; snapshots capture post-edit state.
- Plan mode: `/plan <desc>` enters a read-only analysis mode in which Claude may only read the codebase and write to a dedicated plan file (`.nano_claude/plans/<session_id>.md`). All other writes are silently blocked with a helpful message. `/plan` shows the current plan; `/plan done` exits plan mode and restores original permissions; `/plan status` reports whether plan mode is active. Two new agent tools, `EnterPlanMode` and `ExitPlanMode`, let Claude autonomously enter and exit plan mode for complex multi-file tasks; both are auto-approved in all permission modes.
- `/compact [focus]`: manually trigger conversation compaction at any time. An optional focus string guides the LLM summarizer on what context to preserve. Auto-compact and manual compact both restore plan file context after compaction.
- Utility commands: `/init` creates a `CLAUDE.md` template in the current directory; `/export [filename]` exports the conversation as Markdown (default) or JSON; `/copy` copies the last assistant response to the clipboard (Windows/macOS/Linux); `/status` shows version, model, provider, permissions, session ID, token usage, and context %; `/doctor` diagnoses installation health (Python version, git, API key + live connectivity test, optional deps, `CLAUDE.md` presence, checkpoint disk usage, permission mode).
-
Apr 06, 2026 (v3.5.51): Project renamed from Nano Claude Code to CheetahClaws
- The project has been rebranded from Nano Claude Code to CheetahClaws, a more distinctive name that captures the spirit of the tool: a sharp, agile coding assistant. The `Cl` in CheetahClaws is a subtle nod to Claude.
- CLI command: `nano_claude` → `cheetahclaws`
- PyPI package: `nano-claude-code` → `cheetahclaws`
- Config directory: `~/.nano_claude/` → `~/.clawnest/` → `~/.cheetahclaws/`
- Main entry point: `nano_claude.py` → `cheetahclaws.py`
- All documentation, GitHub URLs, and internal references updated accordingly.
- Added CheetahClaws vs OpenClaw comparison section to README.
-
00:29 PM, Apr 06, 2026 (v3.5.5): SSJ Developer Mode, Telegram Bridge, Worker Command, and UX improvements
- `/ssj`, SSJ Developer Mode: interactive power menu with 10 workflow options: Brainstorm, TODO viewer, Worker, Expert Debate, Propose Improvements, Code Review, README generator, Commit helper, Git Diff Scan, and Idea-to-Tasks Promotion. Menu stays open between actions and supports `/command` passthrough (e.g. `/exit` works from inside SSJ).
- `/worker` command: auto-implements pending tasks from `brainstorm_outputs/todo_list.txt` one by one. Supports selecting specific tasks with comma-separated numbers (e.g. `1,4,6`), a custom todo file path (`--path /other/todo.md`), and a worker count limit (`--workers 3`). If you accidentally pass a brainstorm `.md` output file, Worker detects it and offers to redirect to `todo_list.txt`, or to generate it first from the brainstorm file and then run Worker automatically. Each task gets a dedicated prompt that reads code, implements the change, and marks it done.
- `/telegram`, Telegram Bot Bridge: receives messages via Telegram Bot API and routes them through the model, sending responses back to the chat. Auto-starts on launch if configured. Only responds to the authorized `chat_id`. Supports slash command passthrough (`/cost`, `/model`, etc.), shows a typing indicator while the model processes, and can be stopped remotely by sending `/stop` in Telegram.
- Brainstorm → TODO pipeline: after brainstorm synthesis, automatically generates `brainstorm_outputs/todo_list.txt` with prioritized checkbox tasks. TODO viewer (SSJ option 2) shows only pending tasks as numbered (completed tasks shown with ✓ without a number).
- Expert Debate improvements: SSJ option 4 now prompts for the number of debate agents (default 2, minimum 2); rounds are auto-calculated as `(agents × 2 − 1)`. The debate result is saved to the same directory as the debated file (`<stem>_debate_HHMMSS.md`). An animated per-round per-expert spinner (`⚔️ Round 2/3 — Expert 1 thinking...`) keeps the terminal lively throughout the debate.
- Brainstorm spinner: animated spinner with random phrases while brainstorm agents are thinking.
- Force quit: 3× Ctrl+C within 2 seconds triggers `os._exit(1)`, which kills the process immediately regardless of blocking I/O.
- Interactive Ollama Model Picker: when a request fails with 404 (model not found), cheetahclaws queries the local Ollama API (`/api/tags`) and presents a numbered model selector to switch models and retry without restarting. Cancelling aborts gracefully without crashing the REPL.
- Windows file handling: `_read`, `_write`, and `_edit` in `tools.py` now force UTF-8 encoding and `newline=""`. `_edit` detects pure-CRLF files (every `\n` is part of `\r\n`) and restores line endings after edit; mixed-line-ending files are left as-is to avoid corruption.
- `/brainstorm` command: `/brainstorm [topic]` runs a multi-persona AI debate. The model first generates N expert personas tailored to the topic (geopolitics → analysts & diplomats; software → architects & engineers; etc.). Agent count is chosen interactively at runtime (2–100, default 5). Results are saved to `brainstorm_outputs/` and synthesized by the main agent.
- Rich Live SSH fix: Rich's in-place Live streaming is now automatically disabled in SSH sessions (`SSH_CLIENT` / `SSH_TTY` detected) where ANSI cursor-up breaks and causes repeated output lines. Override with `/config rich_live=true/false`.
- `threading.RLock`: replaced `threading.Lock` with `RLock` to support re-entrant calls from brainstorm synthesis and Ollama retry paths.
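The pure-CRLF rule from the Windows file handling item ("every `\n` is part of `\r\n`") can be expressed in a few lines. This is a sketch with hypothetical helper names (`is_pure_crlf`, `edit_preserving_crlf`); it assumes the text was read with `newline=""` so that no newline translation has happened yet, and it is not the project's `_edit` implementation.

```python
def is_pure_crlf(text: str) -> bool:
    """True when the text has newlines and every one belongs to a CRLF pair."""
    newlines = text.count("\n")
    return newlines > 0 and newlines == text.count("\r\n")


def edit_preserving_crlf(text: str, old: str, new: str) -> str:
    """Replace the first occurrence of old, restoring CRLF only if it was pure."""
    if not is_pure_crlf(text):
        # LF-only or mixed endings: edit as-is to avoid corrupting the file.
        return text.replace(old, new, 1)
    # Work on LF-normalized text so LF-style search strings still match.
    normalized = text.replace("\r\n", "\n")
    edited = normalized.replace(
        old.replace("\r\n", "\n"), new.replace("\r\n", "\n"), 1
    )
    return edited.replace("\n", "\r\n")
```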
-
05:39 PM, Apr 05, 2026 (v3.5.4): Reasoning, Rendering, and Packaging Improvements, Enhanced Memory System, Native vision support for local Ollama models, Bracketed Paste Mode, Rich Tab Completion
- Bracketed Paste Mode: replaced the old timing-based multi-line paste detection with the standard terminal Bracketed Paste Mode protocol. Pasted text of any length (code blocks, long prompts, multi-paragraph instructions) is now collected as a single turn with zero latency and no blank-line artifacts. Falls back to a 60 ms timing window for terminals that don't support BPM. Bracketed paste mode is cleanly disabled on REPL exit.
- Rich Tab Completion with descriptions: pressing Tab after `/` now shows every command with a one-line description and a hint of its subcommands. Typing `/plugin` then Tab lists all subcommands (`install`, `uninstall`, `enable`, …). Auto-completes to the unique match when only one command matches the prefix. Subcommands supported for `/mcp`, `/plugin`, `/tasks`, `/cloudsave`, `/voice`, `/permissions`, `/proactive`, and `/memory`.
- Model name bug fix: `--model ollama/qwen3.5:35b` no longer gets corrupted to `ollama/qwen3.5/35b`. The startup colon-to-slash conversion now only fires when the left side of `:` is a known provider name and no `/` is already present, preserving Ollama's `model:tag` format.
- Native vision support for local Ollama models (`llava`, `gemma4`, `llama3.2-vision`): new `/image [prompt]` command captures the current clipboard image, encodes it to Base64, and attaches it to the next prompt. Install Pillow with `pip install cheetahclaws[vision]`; Linux users also need `xclip` (`sudo apt install xclip`).
- Enhanced Memory System: added `confidence` / `source` / `last_used_at` / `conflict_group` metadata to every memory entry; conflict detection on `MemorySave` warns before overwriting; `MemorySearch` re-ranks results by `confidence × recency` (30-day decay) and updates `last_used_at` on hits; new `/memory consolidate` command runs a lightweight AI analysis of the current session and auto-saves up to 3 long-term insights (user preferences, feedback corrections, project decisions) at 0.8 confidence, never overwriting higher-confidence user memories.
- Post-merge fixes: removed a debug `debug_payload.json` file write that was firing on every OpenAI-compatible API call (left over from PR #11 development). Also fixed ANSI dim color not being reset after the thinking block ends, which caused subsequent text to appear dim in non-Rich terminals. Bumped `pyproject.toml` version to `3.5.4`, and moved `sounddevice` to the optional `voice` extra (`pip install cheetahclaws[voice]`).
- Native Ollama reasoning + terminal rendering fix: local reasoning models (`deepseek-r1`, `qwen3`, `gemma4`) now stream their `<think>` blocks to the terminal. Ollama exposes thoughts in `msg["thinking"]`, but cheetahclaws was previously dropping them; this is now fixed by yielding `ThinkingChunk` from the Ollama adapter. Also fixed a Windows CMD/PowerShell rendering issue where token-by-token ANSI dim resets caused thoughts to print vertically, and corrected `flush_response()` so it runs once at the end instead of on every thinking token. Enable with `/verbose` and `/thinking`.
- uv support: added `pyproject.toml`; install with `uv tool install .` to make the `cheetahclaws` command available globally from anywhere in an isolated environment, without manual PATH setup.
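The model-name rule from the bug fix above (convert `:` to `/` only when the left side is a known provider and no `/` is present) can be sketched like this. The provider set and the function name `normalize_model_arg` are illustrative assumptions; the project's actual provider list is not given in these notes.

```python
# Hypothetical provider list, for illustration only.
KNOWN_PROVIDERS = {"openai", "anthropic", "gemini", "ollama"}


def normalize_model_arg(model: str) -> str:
    """Turn 'provider:model' into 'provider/model' without touching model:tag."""
    if "/" in model or ":" not in model:
        # Already provider-qualified, or nothing to convert.
        return model
    left, right = model.split(":", 1)
    if left in KNOWN_PROVIDERS:
        return f"{left}/{right}"
    # The left side is a model name, so the colon is an Ollama-style tag.
    return model
```

Splitting on the first colon only means a value such as `ollama:qwen3.5:35b` keeps its tag and becomes `ollama/qwen3.5:35b`.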
-
00:41 PM, Apr 05, 2026 (v3.5.3): Structured session history. On every exit, sessions are saved to `daily/YYYY-MM-DD/` (capped at `session_daily_limit`, default 5 per day) and appended to a master `history.json` (capped at `session_history_limit`, default 100). Each session file now includes `session_id` and `saved_at` metadata. `/load` groups sessions by date with time, ID, and turn-count display; supports multi-select (`1,2,3`) to merge sessions and `H` to load the full history with token-count confirmation. Both limits are configurable via `/config`.
-
09:34 AM, Apr 05, 2026 (v3.5.3): Added GitHub Gist cloud sync: `/cloudsave setup <token>` to configure, `/cloudsave` to upload the current session to a private Gist, `/cloudsave auto on` to sync automatically on `/exit`, `/cloudsave list` to browse cloud sessions, and `/cloudsave load <id>` to restore from the cloud. Uses stdlib `urllib`, so there are no new dependencies. The startup banner now also displays the current version number (e.g., `v3.5.2`) in green, making it easy to identify which version is running at a glance.
-
08:58 AM, Apr 05, 2026 (v3.5.2): Introduced the `/proactive [duration]` command: a background daemon thread watches for user inactivity and automatically wakes the agent up after the specified interval (e.g. `/proactive 5m`), enabling continuous monitoring loops without user intervention. `/proactive` with no args now shows current status; `/proactive off` disables it explicitly. Proactive polling state is stored in `config` (no module-level globals). Watcher exceptions are logged via `traceback` instead of silently swallowed. Also fixed duplicated output in Rich-enabled terminals by buffering text during streaming and rendering Markdown once via `rich.live.Live`; updates happen in-place for a true streaming Markdown experience.
-
10:51 PM, Apr 04, 2026: v3.05_fix04 — Fixed a crash on `/model` and config save commands caused by the newly introduced `_run_query_callback` being serialized to JSON; also added `SleepTimer` usage guidance to the system prompt so the agent knows when to invoke background timers proactively. -
10:28 PM, Apr 04, 2026: v3.05_fix03 — Added a native `SleepTimer` tool that lets the agent schedule background timers and autonomously wake itself up after a delay, with no user prompt required. Paired with a `threading.Lock` to prevent output collisions when background and foreground calls overlap. Also includes cross-platform fixes: Windows ANSI color support, CRLF-aware Edit tool matching, an interactive numbered menu for `/load`, native Ollama streaming via `/api/chat`, and auto-capping `max_tokens` per provider to prevent API errors. -
08:31 PM, Apr 04, 2026: v3.05_fix — Autosave + `/resume`: the session is automatically saved to `mr_sessions/session_latest.json` on `/exit`, `/quit`, `Ctrl+C`, and `Ctrl+D`. Run `/resume` to restore the last session instantly, or `/resume <file>` to load a specific file from `mr_sessions/`. Also improves support for API and local Ollama models (specifically gemma4), and adds Windows compatibility enhancements, session management UX improvements, and cross-platform reliability fixes for the Edit tool. -
12:41 AM, Apr 04, 2026: v3.05 — Voice input (`voice/` package): `sounddevice` → `arecord` → SoX recording backends, `faster-whisper` → `openai-whisper` → OpenAI API STT backends. Smart keyterm extraction from git branch + project name + recent files, passed as the Whisper `initial_prompt` for coding-domain accuracy. `/voice`, `/voice status`, `/voice lang <code>` REPL commands. Works fully offline with no API key. 29 new tests (~11.6K lines of Python). -
10:29 PM, Apr 03, 2026: v3.04 — Expanded tool coverage: `NotebookEdit` (edit Jupyter `.ipynb` cells: replace/insert/delete with full JSON round-trip) and `GetDiagnostics` (LSP-style diagnostics via pyright/mypy/flake8/tsc/shellcheck). Also fixed a pre-existing schema-index bug in `_register_builtins` by switching to name-based lookup (~10.5K lines of Python). -
06:00 PM, Apr 03, 2026: v3.03 — Task management system (`task/` package): `TaskCreate`/`TaskUpdate`/`TaskGet`/`TaskList` tools with sequential IDs, dependency edges (`blocks`/`blocked_by`), metadata, persistence to `.cheetahclaws/tasks.json`, thread-safe store, `/tasks` REPL command, 37 new tests (~9500 lines of Python). -
02:50 PM, Apr 03, 2026: v3.02 — Plugin system (`plugin/` package): install/uninstall/enable/disable/update via the `/plugin` CLI, recommendation engine (keyword + tag matching), multi-scope (user/project), git-based marketplace. `AskUserQuestion` tool: interactive mid-task user prompts with numbered options and free-text input (~8500 lines of Python). -
10:00 AM, Apr 03, 2026: v3.01 — MCP (Model Context Protocol) support: `mcp/` package, stdio + SSE + HTTP transports, auto tool discovery, `/mcp` command, 34 new tests (~7000 lines of Python). -
12:20 PM, Apr 02, 2026: v3.0 — Multi-agent package (`multi_agent/`), memory package (`memory/`), skill package (`skill/`) with built-in skills, argument substitution, fork/inline execution, AI memory search, git worktree isolation, agent type definitions (~5000 lines of Python), see update. -
10:00 AM, Apr 02, 2026: v2.0 — Context compression, memory, sub-agents, skills, diff view, tool plugin system (~3400 lines of Python Code).
-
01:47 PM, Apr 01, 2026: Support vLLM inference (~2000 lines of Python Code).
-
11:30 AM, Apr 01, 2026: Support more closed-source and open-source models: Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, and local open-source models via Ollama or any OpenAI-compatible endpoint (~1700 lines of Python Code).
-
09:50 AM, Apr 01, 2026: Support more closed-source models: Claude, GPT, Gemini. (~1300 lines of Python Code).
-
08:23 AM, Apr 01, 2026: Release the initial version of CheetahClaws (~900 lines of Python Code).
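
The structured session history from the v3.5.3 fix session entry above (a per-day folder capped at `session_daily_limit`, plus a master `history.json` capped at `session_history_limit`) can be pictured with a minimal sketch. This is not the project's actual code: the function name `save_session`, the `session_<id>.json` file naming, the time-based ID scheme, and the assumption that `history.json` holds a plain JSON list are all hypothetical and chosen only for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def save_session(root: Path, session: dict, now: datetime,
                 daily_limit: int = 5, history_limit: int = 100) -> Path:
    """Save one session and enforce both retention caps (illustrative only)."""
    day_dir = root / "daily" / now.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical ID scheme: a time-of-day stamp, so file names sort chronologically.
    session_id = now.strftime("%H%M%S%f")
    record = {**session, "session_id": session_id, "saved_at": now.isoformat()}
    path = day_dir / f"session_{session_id}.json"
    path.write_text(json.dumps(record), encoding="utf-8")

    # Daily cap: delete the oldest files once the day holds more than daily_limit.
    files = sorted(day_dir.glob("session_*.json"))
    for old in files[:max(0, len(files) - daily_limit)]:
        old.unlink()

    # History cap: append to the master list, then keep only the newest entries.
    # Assumes history.json is a JSON list of session records.
    history_path = root / "history.json"
    history = []
    if history_path.exists():
        history = json.loads(history_path.read_text(encoding="utf-8"))
    history.append(record)
    history = history[max(0, len(history) - history_limit):]
    history_path.write_text(json.dumps(history), encoding="utf-8")
    return path
```

With the defaults, a sixth save on the same day would remove that day's oldest file, and the 101st save overall would drop the oldest record from `history.json`. Pruning at write time keeps both stores bounded without a separate cleanup pass, which is one plausible reason to cap on every exit.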