Memtrace Telemetry
June 1, 2026 · View on GitHub
Memtrace ships with opt-out telemetry that helps us spot crashes, regressions, and performance issues across the user base — the kind of problems that otherwise only show up when someone takes the time to file an issue or DM us. The telemetry pipeline exists for one reason: to make the product better for the people running it.
This document covers what we collect, what we don't, where the data goes, and how to turn it off.
TL;DR
- We never collect source code, file contents, file paths beyond what's needed for crash fingerprints, symbol names, or embeddings.
- We collect: app starts, indexing/embedding durations, panic reports, PR workflow counters, and
WARN/ERRORlog lines from our own crates.- We also collect content-free Rail routing-quality buckets (was the result relevant — never the search text or matched files), measured asynchronously by the background daemon so it never slows a search.
- Set
MEMTRACE_TELEMETRY=offto disable everything (orMEMTRACE_RAIL_SHADOW=offfor just the Rail buckets).- Default is on for crashes, errors, usage events, and the content-free Rail routing-quality buckets. Opt-out is one env var or one config-file line.
What We Collect
There are four streams (the fourth, Rail shadow, on by default in observe
and measured asynchronously off your critical path). Each one ships to
https://memtrace.io/api/telemetry/ingest over HTTPS, authenticated with
the same Bearer session token your install already uses for the heartbeat.
1. Usage events (telemetry_events)
One row per discrete signal the binary emits. Today the supported events are:
| Event | When it fires | Data attached |
|---|---|---|
start | Every memtrace start / memtrace mcp invocation | subcommand, transport mode |
index_complete | After Phase-1 indexing finishes | duration_ms, repo count |
embed_complete | After Phase-2 embedding finishes | duration_ms, embedding count |
pr_review_completed | After memtrace code-review finishes a PR review run | posted/dry-run boolean, watch boolean, comment count, finding count, graph mode, minimum severity, severity-count buckets, source-count buckets |
pr_watch_registered | When memtrace code-review --post --watch registers a local PR watch | comment count, graph mode, local watch status |
pr_watch_synced | When memtrace start, memtrace mcp, or memtrace pr sync polls watched PRs | aggregate watch counts by status: awaiting response, human replied, approved, changes requested, stale after push, merged, closed, poll errors |
pr_watch_poll_error | When a watched PR poll fails | coarse error kind only: rate_limited, token, github, parse, or unknown |
Each row also carries: a stable per-machine device_id (the same one you
see in your ~/.memtrace/credentials.json), the binary version (e.g.
0.3.17), the OS string (e.g. macos-aarch64), and the host-tier score
the resource detector picked. Nothing else.
What this lets us see: how many people run Memtrace each day, whether
indexing got slower in a recent release, and whether the auto-tuned
light/standard/heavy tiers are landing in the right buckets on real
hardware. It's the telemetry equivalent of a daily check-in graph.
For PR review telemetry, the local watcher may store PR coordinates in
~/.memtrace/pr-watches.json so it can poll GitHub later, but the telemetry
payload deliberately does not include those coordinates. PR URLs,
repository owner/name, branch names, commit SHAs, file paths, comment bodies,
reviewer identities, and discussion text stay local.
2. Errors (telemetry_errors)
The binary uses tracing for all internal logging. Anything we log at
WARN or ERROR level inside our own crates is mirrored to the
telemetry queue.
Before a row reaches the queue we run it through a sanitiser:
- Absolute paths under
$HOMEcollapse to~ - Strings that look like API tokens / session tokens / GitHub PATs match
a regex (
[A-Za-z0-9_+/=-]{40,}) and get replaced with<redacted-token> - Email addresses get replaced with
<redacted-email>
Then the row gets a content fingerprint
(sha256(version || target || level || first 6 message tokens)).
Recurring errors with the same fingerprint don't fan out into hundreds
of rows — they bump an occurrences counter on a single row.
What this lets us see: "v0.3.16 introduced a new WARN that didn't exist in v0.3.15", or "23% of macOS-aarch64 users hit this fastembed init warning". Those are the signals that drive bug fixes.
3. Crash reports (telemetry_crashes)
If the binary panics, the panic hook captures:
- The panic message (sanitised the same way as errors)
- The crash location as
file:line(e.g.src/main.rs:42) - The Rust backtrace, capped at 16 KB and run through the same sanitiser
These get written to a local file at
~/.memtrace/telemetry/queue.jsonl synchronously inside the panic hook,
so even a hard crash that exits the process gets captured. They flush
to memtrace.io on the next successful run.
What this lets us see: the regressions that nobody bothered to file an issue about. Pre-telemetry, the M3 Air "stuck on Loading embedding model" hang and the Windows MSVC build failures were each visible to us only after a user took the time to DM us — for every user who told us, several others probably hit the same thing and quietly uninstalled.
4. Rail shadow telemetry (rail_shadow)
Memtrace Rail is the optional router that can intercept code-discovery searches (grep/ripgrep/find for source symbols) and answer them from Memtrace's graph instead of raw text search. Before we ever consider making Rail active by default, we need evidence that its answers can be trusted — so when Rail is active it records a content-free measurement of what it would have returned, without ever capturing your search.
One row per Memtrace-owned code search, carrying only categories and buckets:
| Field | Example | What it is |
|---|---|---|
mode | observe / nudge / rail / strict | Rail's operating mode |
surface | memtrace_owned | the search was for source symbols in an indexed repo |
would_route | true | whether Rail would route this to Memtrace |
shape | identifier / alternation / phrase / regex / empty | the shape of the pattern — never the text itself |
retrieval | hit / miss / unavailable | did Memtrace have a confident answer |
score_bucket | lt10 / b10 / b25 / gte50 | bucketed relevance score (never the raw number) |
relevance_proxy | true / false | computed on your machine: did the top result's name/path contain a token from the search? Only this yes/no leaves — the strings are compared locally and discarded |
latency_bucket | fast / mid / slow | how quickly Memtrace answered |
Plus the same device_id / version / os envelope as the other streams.
When it sends: by default, in observe mode (every install). Crucially it
is measured asynchronously, off your critical path: the search hook records
a request and returns instantly — it never queries the backend — and the
long-running daemon performs the measurement in the background. So it adds no
latency to any search (the grep/find runs exactly as before). Enforcing
modes (memtrace rail enable nudge|rail|strict) additionally measure inline,
since Rail already has to query in order to route. Opt out with
MEMTRACE_TELEMETRY=off (all telemetry) or MEMTRACE_RAIL_SHADOW=off (Rail
only); MEMTRACE_RAIL_SHADOW_SAMPLE=0..1 bounds the background work on busy
machines.
What this lets us see: whether Rail's answers are actually relevant (precision), how often it can help (coverage), which query shapes it handles well, and the right confidence threshold — so any decision to make Rail active by default is backed by real evidence, not guesswork. What it never tells us: what you searched for, or which files/symbols a search matched.
What We Don't Collect
We don't have to manage tradeoffs here because the categories are clean: none of the following ever leaves your machine via telemetry, and the data model on the receiving end has no column to put them in.
- ❌ Source code or file contents
- ❌ The text of your
grep/find/search commands — Rail records only the pattern shape (identifier/regex/…), never the query you typed - ❌ Which files, symbols, or results a search matched — only a local yes/no relevance bucket, computed on your machine
- ❌ Symbol names from your codebase
- ❌ Embeddings, BM25 indices, or any derived data from your code
- ❌ Repository names, paths, or remote URLs
- ❌ GitHub PR URLs, issue/review/comment bodies, reviewer identities, or pull request discussion text
- ❌ Branch names, commit messages, or git history
- ❌ Any path that points inside the indexed repo
- ❌ Environment variables (the sanitiser strips token-shaped strings, but we never read env values directly into telemetry payloads)
- ❌ IP addresses (we don't log them server-side; standard request logs are kept for 7 days for abuse mitigation only)
If a panic backtrace happens to include a path inside one of your
repositories — say a tree-sitter library hit an assertion while parsing
your code — the path component still gets sanitised (home dir → ~)
but the backtrace is otherwise verbatim. If you'd rather opt out of that
risk completely, set MEMTRACE_TELEMETRY=off. We'd rather you stay opted
in and tell us if you find a backtrace that looks too revealing.
Where the Data Goes
- Transport: HTTPS to
https://memtrace.io/api/telemetry/ingest, authenticated with the same Bearer session token your install uses for the existing license heartbeat. No third-party analytics SDK is embedded — every byte of the pipeline is in this repo atcrates/memtrace-mcp/src/telemetry.rs. - Storage: four Postgres tables on the memtrace.io infrastructure
(
telemetry_events,telemetry_errors,telemetry_crashes, andrail_shadow), schema inmemtrace-ui/drizzle/0002_telemetry.sqlandmemtrace-ui/drizzle/0018_rail_shadow.sql. Retention is unlimited today; we'll publish a retention policy before exceeding 90 days of data. - Access: the admin analytics dashboard at
https://memtrace.io/admin/analyticsis gated to@syncable.devemail accounts only. We do not share or sell this data, and we don't publish anonymised aggregates without notice.
How to Turn It Off
Environment variable (per process)
MEMTRACE_TELEMETRY=off memtrace start
Accepted off-values: off, 0, false, disabled, no. Anything
else (including unset) keeps telemetry on.
When disabled:
- The panic hook still installs (so a crash in a disabled-telemetry
session still leaves a local breadcrumb in
~/.memtrace/telemetry/), but the file never gets shipped. - The tracing layer becomes a no-op — no in-memory aggregation, no
queue writes for
WARN/ERROR. - The flusher goroutine exits immediately — no network calls.
- Usage events from
record_event()short-circuit.
Make it permanent
Add this to your shell profile:
# ~/.zshrc / ~/.bashrc
export MEMTRACE_TELEMETRY=off
Or set it in your editor's MCP config so the daemon mode picks it up:
{
"command": "memtrace",
"args": ["mcp"],
"env": { "MEMTRACE_TELEMETRY": "off" }
}
Verifying What's in the Queue
Telemetry sits on disk before being shipped. You can read it directly:
cat ~/.memtrace/telemetry/queue.jsonl | head -5
Each line is one record. The kind field marks it as event, error,
or crash. There is no separate "raw" buffer — what you see here is
everything.
If you want to inspect what would have been shipped without actually
shipping it, set MEMTRACE_TELEMETRY=off (the queue still won't be
written) and then read the JSONL on a fresh run after un-setting it.
Changes to This Policy
Material changes to what we collect, where it's stored, or how long it's kept will be announced in:
- The release notes of the version that introduces the change
- This file (with a
## Changelogsection at the bottom) - The
memtrace-public/PRIVACY.mdsummary
If you have questions or spot something that should be sanitised but isn't, open an issue at github.com/syncable-dev/memtrace-public or email support@syncable.dev.