Ephemeral Settings Reference

February 23, 2026 · View on GitHub

Complete reference for all ephemeral settings. Set with /set <key> <value> during a session or --set <key>=<value> at startup. Ephemeral settings don't persist to settings.json — they live only for the current session unless saved to a profile with /profile save.

For guidance on tuning these for specific models, see Settings and Profiles.

Reasoning

Control extended thinking / chain-of-thought. Most models need reasoning.enabled true at minimum; the rest have sensible defaults.

SettingTypeDefaultProfileDescription
reasoning.enabledbooleanfalseyesTurn on thinking mode. Required for models like Kimi K2-Thinking, Claude with thinking, o3.
reasoning.effortenumprovider defaultyesHow hard the model thinks: minimal, low, medium, high, xhigh. Higher = slower + more tokens but better results. Anthropic Opus defaults to high. Codex defaults to medium.
reasoning.maxTokensnumberyesCap the thinking token budget (OpenAI). Limits how much the model can think per turn.
reasoning.budgetTokensnumberyesAnthropic-specific thinking budget. Usually set automatically via reasoning.effort or adaptive thinking.
reasoning.adaptiveThinkingbooleanfalseyesLet Anthropic auto-tune the thinking budget based on task complexity. Enabled by default for Claude via the anthropic provider alias.
reasoning.includeInResponsebooleantrueyesShow thinking blocks in the terminal. Set false to get reasoning quality without the visual noise.
reasoning.includeInContextbooleantrueyesKeep thinking in conversation history sent to the model. If false, the model can't reference its own prior reasoning — hurts multi-step tasks.
reasoning.stripFromContextenumnoneyesPrune old thinking to manage context growth. none = keep all (best quality). allButLast = keep only latest thinking (good balance). all = discard all thinking from context (saves tokens).
reasoning.formatenumyesAPI format: native or field. Leave unset unless you know your provider needs a specific format.
reasoning.summaryenumyesOpenAI Responses API reasoning summary: auto, concise, detailed, none. Codex alias defaults to auto.
text.verbosityenumyesOpenAI Responses API text verbosity for thinking output: low, medium, high.

Context and Compression

Control how much context the model sees and when/how history is compressed. These directly affect quality — too small and the model loses track; too large and it drowns in noise.

SettingTypeDefaultProfileDescription
context-limitnumbermodel defaultyesMax tokens for the entire context window (system prompt + history + tool output). Set lower than the model's max to leave headroom.
compression-thresholdnumbermodel defaultyesFraction of context-limit that triggers compression (0.0–1.0). E.g., 0.7 means compress when 70% full. Lower = more frequent compression but more headroom.
max-prompt-tokensnumber200000yesHard ceiling on any single prompt to the API. Safety net to prevent runaway costs.
maxOutputTokensnumberyesMax output tokens per response (generic, translated by provider). Anthropic alias sets this to 40000. Limits how much the model writes per turn.
compression.strategyenummiddle-outyesCompression algorithm: middle-out (LLM-summarizes middle turns) or top-down-truncation (drops oldest turns).
compression.profilestringyesProfile to use for compression LLM calls. Lets you use a cheaper model for summarization.
compression.density.readWritePruningbooleantrueyesDrop read-file results when the file was subsequently written. Reduces noise from obsolete reads.
compression.density.fileDedupebooleantrueyesDeduplicate repeated @file inclusions.
compression.density.recencyPruningbooleanfalseyesKeep only the N most recent results per tool type. Aggressive — enable only for very long sessions.
compression.density.recencyRetentionnumber3yesHow many recent results to keep per tool type when recencyPruning is on.
compression.density.compressHeadroomnumber0.6yesMultiplier for compression target (0–1). Lower = more aggressive compression.
compression.density.optimizeThresholdnumberstrategy defaultyesContext usage fraction that triggers density optimization.

Tool Output Limits

Prevent tools from flooding the context. Applied to all tools via the batch scheduler. See Settings and Profiles for how these interact.

SettingTypeDefaultProfileDescription
tool-output-max-itemsnumber50 (read-many-files), 1000 (grep)yesMax files/matches per tool call. Lower to force the model to be more surgical.
tool-output-max-tokensnumber50000yesMax tokens across tool output in a batch. Split across concurrent tool calls.
tool-output-truncate-modeenumwarnyesWhat happens when output exceeds limits. warn = drop output entirely, tell model to narrow query. truncate = cut to fit silently. sample = pick representative lines.
tool-output-item-size-limitnumber524288 (512KB)yesMax bytes per individual file/item. Prevents one huge file from consuming the budget.
file-read-max-linesnumber2000yesDefault max lines when reading a text file with no explicit limit. Prevents accidentally reading massive files.

Timeouts

Prevent commands and tasks from hanging indefinitely. In seconds (not milliseconds, despite older docs).

SettingTypeDefaultProfileDescription
shell-default-timeout-secondsnumber300 (5 min)yesDefault timeout for shell commands. The model can request a specific timeout, but this applies when it doesn't.
shell-max-timeout-secondsnumber900 (15 min)yesHard ceiling — the model can't request longer than this. Increase for long builds/test suites.
shell-inactivity-timeout-secondsnumber— (disabled)yesKill commands that produce no output for this long. Resets on each output line. Good for catching commands that hang waiting for input.
task-default-timeout-secondsnumber900 (15 min)yesDefault timeout for subagent tasks.
task-max-timeout-secondsnumber1800 (30 min)yesHard ceiling for subagent tasks.
socket-timeoutnumber (ms)yesHTTP request timeout for API calls, in milliseconds. Useful for slow local models.

Loop Detection

Catch models that get stuck repeating the same action.

SettingTypeDefaultProfileDescription
maxTurnsPerPromptnumber-1 (unlimited)yesHard limit on turns per prompt. Set to a positive integer to cap runaway sessions.
loopDetectionEnabledbooleantrueyesMaster switch for all loop detection. Disable only if you're sure the model won't loop.
toolCallLoopThresholdnumber50yesConsecutive identical tool calls before intervention. -1 = unlimited.
contentLoopThresholdnumber50yesConsecutive identical content chunks before intervention. -1 = unlimited.

Streaming and Network

SettingTypeDefaultProfileDescription
streamingenumenabledyesenabled or disabled. Disable for providers that don't support streaming or for debugging.
api-versionstringyesAPI version string. Required by some providers (e.g., Azure OpenAI).
socket-keepalivebooleanyesTCP keepalive for local AI servers. Prevents idle connections from dropping.
socket-nodelaybooleanyesTCP_NODELAY for local AI servers. Reduces latency at the cost of more packets.
stream-optionsJSONyesExtra stream options passed to the OpenAI API (e.g., {"include_usage": true}).
retriesnumberyesMax retry attempts for failed API calls.
retrywaitnumber (ms)yesInitial delay between retries. Exponential backoff applies.

Rate Limiting

Proactive throttling to stay within provider rate limits.

SettingTypeDefaultProfileDescription
rate-limit-throttleenumyeson or off. When on, LLxprt proactively slows down before hitting rate limits.
rate-limit-throttle-thresholdnumberyesPercentage of rate limit (1–100) to start throttling at.
rate-limit-max-waitnumber (ms)yesMax time to wait for rate limit headroom before sending anyway.
prompt-cachingenumoffyesProvider-side prompt caching: off, 5m, 1h, 24h. Saves costs when repeating similar prompts. Codex alias defaults to 24h.

Load Balancer

Settings for multi-endpoint load balancing. Only apply when using load-balanced provider configurations.

SettingTypeDefaultProfileDescription
tpm_thresholdnumberyesMinimum tokens/minute before triggering failover to next endpoint.
timeout_msnumber (ms)yesMax request duration before load balancer fails over.
circuit_breaker_enabledbooleanyesEnable circuit breaker for failing backends.
circuit_breaker_failure_thresholdnumber3yesFailures before opening the circuit (stop sending to that backend).
circuit_breaker_failure_window_msnumber (ms)60000yesTime window for counting failures.
circuit_breaker_recovery_timeout_msnumber (ms)30000yesCooldown before retrying an opened circuit.

Subagent and Task Control

SettingTypeDefaultProfileDescription
task-max-asyncnumber5yesMax concurrent async subagent tasks. -1 = unlimited (up to 100).
subagents.async.enabledbooleantrueyesEnable/disable async subagent execution.
todo-continuationbooleanyesEnable todo continuation mode — model picks up where it left off from a todo list.

Tool Control

SettingTypeDefaultProfileDescription
tools.disabledstring[]yesList of tool names to disable. The model won't see these tools at all.
tools.allowedstring[]yesAllowlist — if set, only these tools are available. Overrides tools.disabled.
tool_choicestringyesTool choice strategy sent to the API: auto, required, none.

Prompt Configuration

SettingTypeDefaultProfileDescription
enable-tool-promptsbooleanfalseyesLoad tool-specific prompt files from ~/.llxprt/prompts/tools/. Adds specialized instructions per tool.
include-folder-structurebooleanyesInclude the workspace folder tree in the system prompt. Helps the model navigate, but costs tokens.

Custom Headers

SettingTypeDefaultProfileDescription
custom-headersJSONyesCustom HTTP headers as a JSON object. Applied to all API requests.
user-agentstringyesOverride the User-Agent header. Some providers (e.g., Kimi) require specific user agents.

Shell Behavior

SettingTypeDefaultProfileDescription
shell-replacementstringyesCommand substitution mode: allowlist (safe subset), all (everything), none/false (disabled). Controls whether $() and backticks work in shell commands.

Authentication

SettingTypeDefaultProfileDescription
auth.noBrowserbooleanfalseyesSkip automatic browser launch for OAuth. Use manual code entry instead. Useful for SSH/headless environments.
authOnlybooleanyesForce OAuth-only authentication.

Memory

SettingTypeDefaultProfileDescription
model.canSaveCorebooleanfalsenoAllow the model to write to .LLXPRT_SYSTEM (core system memory). Unsafe — the model can override your own directives. Not saved to profiles deliberately.
model.allMemoriesAreCorebooleanfalseyesLoad LLXPRT.md files as part of the system prompt instead of user context. Makes the model treat your memories as hard directives rather than suggestions.

Debugging

SettingTypeDefaultProfileDescription
emojifilterenumautoyesEmoji handling: allowed, auto (detect terminal support), warn, error.
dumponerrorenumyesDump API request body to ~/.llxprt/dumps/ on errors: enabled or disabled.
dumpcontextenumyesContext dumping: now (dump immediately), status, on (every turn), error (on errors), off.

Model Parameters

These are passed directly to the provider API as-is. LLxprt doesn't validate them. Set with /set modelparam <name> <value>.

ParameterTypeDescription
temperaturenumberSampling temperature (0.0–2.0). Lower = more deterministic.
max_tokensnumberMax tokens to generate (OpenAI/Anthropic). Alias: maxTokens.
max_output_tokensnumberMax output tokens (Gemini native param).
top_pnumberNucleus sampling threshold.
top_knumberTop-k sampling.
frequency_penaltynumberPenalize repeated tokens.
presence_penaltynumberPenalize tokens that appeared at all.
seednumberRandom seed for deterministic output (OpenAI only).
stopstring[]Stop sequences — model stops generating when it produces any of these.
response_formatJSONResponse format (e.g., {"type": "json_object"}).
logit_biasJSONPer-token bias.
reasoningJSONOpenAI reasoning config object. Usually set via reasoning.* settings instead.