Data And Architecture
May 16, 2026
Effect Runtime Boundary
Birdclaw's core I/O code should be written as Effect programs. Use Effect.gen for multi-step workflows, typed failures for expected errors, and Effect.forEach / Effect.sleep for concurrency, retry, timeout, and pacing logic.
Keep framework edges boring:
- CLI command handlers may await Promise wrappers.
- React components may call Promise wrappers from effects and event handlers.
- Route handlers may return normal Response values.
Inside src/lib, prefer exporting both forms when useful:
export function runThingEffect(
  input: Input,
): Effect.Effect<Output, ThingError> {
  return Effect.gen(function* () {
    // core logic
  });
}

export function runThing(input: Input): Promise<Output> {
  return runEffectPromise(runThingEffect(input));
}
Use runEffectPromise from src/lib/effect-runtime.ts for Promise wrappers so typed failures are thrown directly instead of being hidden behind Effect fiber failures.
Current migrated surfaces:
- typed Effect-to-Promise boundary handling in src/lib/effect-runtime.ts
- web API client parsing and sync-job polling in src/lib/api-client.ts
- bird command availability and execution in src/lib/bird-command.ts
- bird JSON transport, large stdout capture, and temp-file cleanup in src/lib/bird.ts
- xurl command execution, JSON parsing, retry delay, mutation helpers, and public adapter wrappers in src/lib/xurl.ts
- backup export/import/validation, Git repo setup, pull, commit/push, stale auto-update, and auto-sync orchestration in src/lib/backup.ts
- moderation action transport fallback and bird action/profile adapter helpers in src/lib/actions-transport.ts and src/lib/bird-actions.ts
- moderation target resolution plus local blocks/mutes write helpers, remote block sync, and x-web block/unblock mutations in src/lib/moderation-target.ts, src/lib/moderation-write.ts, src/lib/blocks-write.ts, src/lib/blocks.ts, src/lib/mutes-write.ts, and src/lib/x-web.ts
- batch blocklist file import in src/lib/blocklist.ts
- authored, mentions, mention-thread sync including xurl recent-search and parent-walk fallback internals, conversation surface, home timeline, saved collection, DM live sync, profile hydration/resolution/affiliation, profile-reply inspection, shared tweet lookup, research and whois report generation, inbox scoring, and follow graph live sync cache/fetch/merge flows in src/lib/authored-live.ts, src/lib/mentions-live.ts, src/lib/mention-threads-live.ts, src/lib/conversation-surface.ts, src/lib/timeline-live.ts, src/lib/timeline-collections-live.ts, src/lib/dms-live.ts, src/lib/profile-hydration.ts, src/lib/profile-affiliation-hydration.ts, src/lib/profile-resolver.ts, src/lib/profile-replies.ts, src/lib/tweet-lookup.ts, src/lib/research.ts, src/lib/whois.ts, src/lib/inbox.ts, and src/lib/follow-graph.ts
- link preview metadata fetches and link-index backfill concurrency in src/lib/link-preview-metadata.ts and src/lib/link-index.ts
- media fetch archive reuse, HTTP download groups, pacing, and bounded concurrency in src/lib/media-fetch.ts
- archive discovery and archive-import subprocess boundaries in src/lib/archive-finder.ts and src/lib/archive-import.ts
- avatar read-through caching and URL expansion cache/fetch flows in src/lib/avatar-cache.ts and src/lib/url-expansion.ts
- OpenAI inbox scoring fetch/parse boundary in src/lib/openai.ts
- scheduled bookmark sync audit logging, overlap locking, backup pass, and launchd install in src/lib/bookmark-sync-job.ts
- web sync orchestration, plan runners, backup pass, and job polling in src/lib/web-sync.ts
Production src/lib code should stay free of ad hoc async/await orchestration. Next migrations should target remaining CLI, React, and route edges only where an Effect boundary would simplify error handling, cancellation, retries, or concurrency; otherwise keep those framework adapters as small Promise wrappers over core Effect programs.
Transport Strategy
Support these adapters:
- xurl
- bird
- official API

Optional later:
- lower-level xweb
Recommendation
v1 transport priority:
- archive
- xurl
- bird
- official direct API
- optional lower-level xweb
Reason:
- working xurl already exists
- users with xurl setup get zero-friction sync
- bird can cover GraphQL/cookie-backed gaps if needed
- official API adapter keeps long-term independence
xurl compatibility
Important stance:
- do not "pretend to be xurl" by mutating or owning the ~/.xurl format in v1
- shell out to xurl as an adapter instead
Why:
- lower auth risk
- lower coupling to xurl store internals
- birdclaw stays transport-agnostic
- users already authenticated in xurl get immediate value
Possible later feature:
- birdclaw auth import-xurl
  - local one-shot import into birdclaw-managed credentials
  - opt-in only
bird compatibility
Treat bird the same way:
- adapter, not architecture
- subprocess or wrapper boundary
- no dependency on bird config/storage as core truth
- useful for GraphQL/cookie-backed reads or actions when xurl does not cover a surface
Transport interface
import type { Effect } from "effect";

type TransportTask<A, E = TransportError> = Effect.Effect<A, E>;

type TransportKind = "archive" | "xurl" | "bird" | "official" | "xweb";

interface BirdTransport {
  kind: TransportKind;
  capabilities(): TransportTask<TransportCapabilities>;
  currentUser(): TransportTask<AccountIdentity>;
  listBookmarks(input: CursorInput): TransportTask<Page<BookmarkRecord>>;
  listLikes(input: CursorInput): TransportTask<Page<LikeRecord>>;
  listMentions(input: CursorInput): TransportTask<Page<TweetRecord>>;
  listFollowers(input: CursorInput): TransportTask<Page<ProfileRecord>>;
  listFollowing(input: CursorInput): TransportTask<Page<ProfileRecord>>;
  listUserTweets(input: UserTimelineInput): TransportTask<Page<TweetRecord>>;
  listDmEvents(input: DmEventsInput): TransportTask<Page<DmEventRecord>>;
  getTweet(id: string): TransportTask<TweetRecord | null>;
  postTweet(input: ComposeTweetInput): TransportTask<PostResult>;
  reply(input: ReplyInput): TransportTask<PostResult>;
  blockProfile(input: ProfileActionInput): TransportTask<ActionResult>;
  unblockProfile(input: ProfileActionInput): TransportTask<ActionResult>;
  muteProfile(input: ProfileActionInput): TransportTask<ActionResult>;
  unmuteProfile(input: ProfileActionInput): TransportTask<ActionResult>;
}
Transport config
Per account:
- preferred transport: auto | xurl | bird | official | xweb
- fallback chain
- capability cache
- auth status snapshot

auto means:
- use xurl if available and healthy
- else use bird if available and healthy
- else use official auth if configured
- else archive-only mode
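The auto order above can be read as a small pure function. The following is a minimal sketch, not the shipped resolver: AdapterStatus, AutoInputs, and resolveAutoTransport are hypothetical names, and "healthy" is assumed to come from whatever health check each adapter exposes.

```typescript
// Hypothetical names throughout: these types and resolveAutoTransport are
// illustrative only and are not taken from the Birdclaw codebase.
type ResolvedTransport = "xurl" | "bird" | "official" | "archive";

interface AdapterStatus {
  available: boolean; // adapter binary or wrapper was found
  healthy: boolean; // most recent health check passed
}

interface AutoInputs {
  xurl: AdapterStatus;
  bird: AdapterStatus;
  officialConfigured: boolean; // official auth has been set up
}

// Walk the documented order: xurl, then bird, then official auth,
// and fall back to archive-only mode when nothing else qualifies.
function resolveAutoTransport(inputs: AutoInputs): ResolvedTransport {
  if (inputs.xurl.available && inputs.xurl.healthy) return "xurl";
  if (inputs.bird.available && inputs.bird.healthy) return "bird";
  if (inputs.officialConfigured) return "official";
  return "archive";
}
```

Keeping the decision pure makes it easy to cache next to the auth status snapshot and to unit-test without spawning any subprocess.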
Data Model
SQLite only. Kysely schema in code, migrations checked into repo.
Core tables
- accounts
  - local account metadata
  - preferred transport
  - sync defaults
- profiles
  - Twitter users/authors/participants
  - keep bio, follower/following counts, profile URL, location, verification type, structured URL entities, raw profile JSON, and lightweight influence fields queryable in canonical columns
  - DM surfaces should render sender bio and influence context from here without needing raw payload lookups
- profile_affiliations
  - active subject-to-organization affiliation edges from X profile badges / highlighted labels
  - stores organization id or deterministic synthetic id, label, handle, badge URL, URL, source, and first/last-seen timestamps
  - synthetic highlighted-label ids are upgraded to real local organization profile ids when bird can hydrate the org handle
- profile_snapshots
  - deduplicated history of hydrated profile identity fields
  - stores bio, display name, handle, location, profile URL, verification type, counts, active affiliations, raw JSON, and first/last seen timestamps per state
  - lets whois explain current-vs-former affiliation or bio evidence instead of losing overwritten profile text
- profile_bio_entities
  - first-class extracted identity hints from profile bio/URL/affiliations
  - stores active and inactive handle, domain, and company_phrase values with first/last seen timestamps
  - feeds fuzzy identity ranking for prompts such as "blacksmith guy", where @useblacksmith or blacksmith.sh is stronger than a generic DM keyword
- blocks
  - account-scoped local blocklist
  - canonical local state for blocklist UI and CLI
- mutes
  - account-scoped local mutelist
  - canonical local state for CLI moderation actions
  - live transport result layered on top, not required for local bookkeeping
- tweets
  - canonical tweet rows
  - text, metrics, timestamps, references, author id
  - raw JSON payload column
- tweet_media
- tweet_urls
- tweet_mentions
- follow_edges
  - current complete directional graph for followers and following
- follow_snapshots
  - snapshot metadata for follower/following crawls, including complete/incomplete status
- follow_snapshot_members
  - normalized membership per snapshot
- follow_events
  - append-only started/ended follow events
- bookmarks
  - account-scoped saved tweet ids
- likes
  - account-scoped liked tweet ids
- threads
  - optional thread/cache grouping
- dm_conversations
- dm_events
  - event log, not only message text
- dm_participants
- dm_payloads
  - full text / URLs / reactions / attachments when retained
- sync_cursors
  - one row per stream + transport + account + scope
- import_runs
- sync_runs
- raw_objects
  - optional retained source payloads for reparsing
Search tables
- tweets_fts
- dm_fts
Use FTS5.
Day-1 search modes:
- exact filters
- keyword full-text
- date ranges
- author / conversation / bookmarked / liked filters
- DM sender follower-count filters
- DM sender derived influence-score filters
- replied / unreplied filters for mentions and DMs
- local block/mute maintenance via handle, id, or URL-derived profile match
No vector search required for MVP.
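A keyword search combined with exact and date filters might be assembled roughly as below. This is a sketch under assumptions: the author_id and created_at column names, the rowid join between tweets_fts and tweets, and both function names are hypothetical. The quoting rule itself is standard FTS5 behaviour: a term wrapped in double quotes, with embedded quotes doubled, is matched as text rather than parsed as query syntax.

```typescript
// Quote each whitespace-separated term as an FTS5 string so characters such
// as ":" or "-" are treated as text. Space-separated phrases are ANDed by FTS5.
function toFtsMatch(userQuery: string): string {
  return userQuery
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => '"' + term.replace(/"/g, '""') + '"')
    .join(" ");
}

interface TweetSearchInput {
  keywords: string; // assumed non-empty; an empty MATCH string is an FTS5 error
  authorId?: string;
  since?: string; // inclusive ISO timestamp
  until?: string; // exclusive ISO timestamp
}

// Column names and the rowid join are assumptions for illustration only.
function buildTweetSearch(input: TweetSearchInput): { sql: string; params: string[] } {
  const where: string[] = ["tweets_fts MATCH ?"];
  const params: string[] = [toFtsMatch(input.keywords)];
  if (input.authorId !== undefined) {
    where.push("t.author_id = ?");
    params.push(input.authorId);
  }
  if (input.since !== undefined) {
    where.push("t.created_at >= ?");
    params.push(input.since);
  }
  if (input.until !== undefined) {
    where.push("t.created_at < ?");
    params.push(input.until);
  }
  const sql =
    "SELECT t.* FROM tweets_fts JOIN tweets AS t ON t.rowid = tweets_fts.rowid WHERE " +
    where.join(" AND ") +
    " ORDER BY t.created_at DESC";
  return { sql, params };
}
```

In the real codebase this would more likely go through Kysely than string concatenation; the point here is only that user text reaches FTS5 as a bound, quoted parameter.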
Indexing
Indexes from day 1:
- tweet id unique
- author + created_at desc
- created_at desc
- conversation + occurred_at desc
- bookmark/account + created_at desc
- like/account + created_at desc
- active follow edges by observer + direction
- follow events by observer + event_at desc
- latest follow snapshot by observer + type
- sync cursor unique by stream/account/scope/transport
Follow Graph Model
Borrow the shape from sweetistics, but local-first and SQLite-native.
Principles:
- directional edges, not dual booleans
- snapshots are the source of truth for full crawls
- events are append-only
- current state and history both matter
Direction semantics
- Conceptual inbound: they follow the account
- Conceptual outbound: the account follows them
- Stored/API followers: inbound direction, matching the X endpoint and CLI command
- Stored/API following: outbound direction, matching the X endpoint and CLI command
Tables
- follow_edges
  - primary key: (account_id, direction, profile_id)
  - account_id is the observer account; profile_id is the subject profile
  - fields: current, first_seen_at, last_seen_at, ended_at, source, updated_at
- follow_snapshots
  - one row per full followers/following crawl
  - fields: direction, status, page_count, result_count, source, raw_meta_json
- follow_snapshot_members
  - normalized set of members per snapshot
- follow_events
  - append-only started / ended
  - references snapshot/run when available
  - idempotent per account
Cache and cost guardrails
- sync followers and sync following default to dry-run and do not call X
- live xurl sync requires --yes
- fresh sync results are stored in sync_cache and reused by matching sync commands unless --refresh is passed
- graph query commands only read follow_edges, follow_snapshots, follow_snapshot_members, follow_events, and profiles
- incomplete snapshots keep fetched members for audit but do not update current edges or churn events
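The snapshot-to-events step, including the rule that incomplete snapshots never touch current edges, can be sketched as a set difference. The type and function names below are hypothetical, and the sketch ignores persistence: it only computes which started/ended events a snapshot would produce.

```typescript
type FollowDirection = "followers" | "following";

interface FollowSnapshotInput {
  direction: FollowDirection;
  status: "complete" | "incomplete";
  memberIds: string[]; // profile ids fetched by the crawl
  capturedAt: string; // ISO timestamp of the crawl
}

interface FollowChange {
  kind: "started" | "ended";
  profileId: string;
  direction: FollowDirection;
  eventAt: string;
}

// currentIds: profile ids whose follow_edges row is current for this
// account and direction before the snapshot is applied.
function diffFollowSnapshot(
  currentIds: ReadonlySet<string>,
  snapshot: FollowSnapshotInput,
): FollowChange[] {
  // Incomplete crawls are kept for audit only: no edge updates, no churn.
  if (snapshot.status !== "complete") return [];

  const seen = new Set<string>(snapshot.memberIds);
  const changes: FollowChange[] = [];

  // In the snapshot but not a current edge: a follow started.
  seen.forEach((profileId) => {
    if (!currentIds.has(profileId)) {
      changes.push({ kind: "started", profileId, direction: snapshot.direction, eventAt: snapshot.capturedAt });
    }
  });

  // A current edge missing from a complete snapshot: the follow ended.
  currentIds.forEach((profileId) => {
    if (!seen.has(profileId)) {
      changes.push({ kind: "ended", profileId, direction: snapshot.direction, eventAt: snapshot.capturedAt });
    }
  });

  return changes;
}
```

Because the diff is computed against current edges, replaying the same complete snapshot after its changes are applied yields no further events, which is one way to get the idempotence the tables section asks for.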
What this buys us
- current followers/following lists
- mutuals
- churn over time
- "who came in"
- "who left"
- first seen / last seen
- account growth graph
- graph-aware AI ranking later
UI ideas
- follows dashboard
- arrivals / departures timeline
- mutuals view
- notable churn
- relationship detail for one profile
- graph overlays in the AI inbox
Archive Import
Inputs
- Twitter export zip
- extracted archive directory
Supported archive slices in v1
- account
- tweets
- likes
- bookmarks if present
- profiles
- direct messages
- followers
- following
Import pipeline
- inspect manifest / discover files
- parse wrapper JS payloads
- normalize records
- write canonical entities
- update import provenance
- refresh FTS
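The "parse wrapper JS payloads" step refers to archive data files that are JavaScript assignments rather than plain JSON. A minimal sketch, assuming the common layout where a file is a single assignment such as window.YTD.tweets.part0 = [ ... ] with a JSON array on the right-hand side; parseArchiveWrapper is a hypothetical name.

```typescript
// Strip the assignment prefix and parse the remainder as JSON.
// Assumes one assignment per file and a JSON array payload.
function parseArchiveWrapper(source: string): unknown[] {
  const eq = source.indexOf("=");
  if (eq === -1) {
    throw new Error("archive file has no assignment wrapper");
  }
  // Tolerate a trailing semicolon after the array literal.
  const body = source.slice(eq + 1).trim().replace(/;$/, "");
  const parsed: unknown = JSON.parse(body);
  if (!Array.isArray(parsed)) {
    throw new Error("archive payload is not a JSON array");
  }
  return parsed;
}
```

The result is deliberately typed as unknown[] so that the normalize step has to validate each record before it is written as a canonical entity.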
Rules
- idempotent reruns
- preserve richer existing data
- raw source retained optionally
- full import refreshes all supported archive slices together
- selected import refreshes only requested slices and preserves unselected local data
- selected import validates acct_primary identity before writing
- selected collection imports preserve live collection rows and local tweet ownership
- selected DM imports are scoped to acct_primary and preserve other accounts
Selected import slices:
- tweets
- likes
- bookmarks
- profiles
- directMessages
- followers
- following
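Two of the import rules lend themselves to a short sketch: choosing which slices a run refreshes, and merging an archive record without discarding richer local data. All names below are hypothetical, and "existing value wins, incoming only fills gaps" is just one reading of "preserve richer existing data"; the actual merge policy may differ per field.

```typescript
type ImportSlice =
  | "tweets" | "likes" | "bookmarks" | "profiles"
  | "directMessages" | "followers" | "following";

const ALL_IMPORT_SLICES: readonly ImportSlice[] = [
  "tweets", "likes", "bookmarks", "profiles",
  "directMessages", "followers", "following",
];

// No selection means a full import; a selection refreshes only what was asked
// for, so unselected local data is never touched.
function slicesToRefresh(selected?: readonly ImportSlice[]): ImportSlice[] {
  if (selected === undefined) return ALL_IMPORT_SLICES.slice();
  return ALL_IMPORT_SLICES.filter((slice) => selected.indexOf(slice) !== -1);
}

interface ArchiveProfile {
  id: string;
  handle: string | null;
  bio: string | null;
  followersCount: number | null;
}

// Assumed policy: keep any value that already exists locally and let the
// archive record fill only the gaps. Rerunning with the same input is a no-op.
function fillGaps(existing: ArchiveProfile, incoming: ArchiveProfile): ArchiveProfile {
  return {
    id: existing.id,
    handle: existing.handle ?? incoming.handle,
    bio: existing.bio ?? incoming.bio,
    followersCount: existing.followersCount ?? incoming.followersCount,
  };
}
```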
Sync Model
Streams:
- own tweets
- mentions
- likes
- bookmarks
- DMs
- followers
- following
Future:
- notifications
- list timelines
- graph analytics
Sync behavior
- cursor-based where possible
- account-scoped checkpoints
- dedupe on canonical IDs
- partial success preserved
- safe rerun after crash
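The sync behavior rules fit together as a checkpointed page loop. The sketch below is synchronous plain TypeScript to stay self-contained; in src/lib this would be written as an Effect program instead. SyncPage, SyncState, and runSyncStream are hypothetical names, and the in-memory state stands in for a sync_cursors row plus canonical-ID lookups.

```typescript
interface SyncPage<T> {
  items: T[];
  nextCursor: string | null; // null when the stream is exhausted
}

interface SyncState {
  cursor: string | null; // last checkpoint for this stream + account
  seenIds: Set<string>; // canonical ids already stored
}

interface SyncOutcome {
  stored: number;
  completed: boolean;
  error?: string;
}

function runSyncStream<T extends { id: string }>(
  state: SyncState,
  fetchPage: (cursor: string | null) => SyncPage<T>,
  store: (item: T) => void,
): SyncOutcome {
  let stored = 0;
  for (;;) {
    let page: SyncPage<T>;
    try {
      page = fetchPage(state.cursor);
    } catch (err) {
      // Partial success: earlier pages stay stored and the checkpoint
      // still points at the last page that was fully written.
      return { stored, completed: false, error: err instanceof Error ? err.message : String(err) };
    }
    for (const item of page.items) {
      if (state.seenIds.has(item.id)) continue; // dedupe on canonical id
      store(item);
      state.seenIds.add(item.id);
      stored += 1;
    }
    // Advance the checkpoint only after the whole page is written.
    state.cursor = page.nextCursor;
    if (page.nextCursor === null) return { stored, completed: true };
  }
}
```

A rerun after a failure resumes from the saved cursor, and any overlap with rows written before the crash is absorbed by the id check.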
Local Web App
Purpose
Primary human UI.
Views
- inbox
  - AI-ranked blend of mentions, replies, DMs, bookmarks to revisit, notable posts
- timeline/search
- tweet detail
- thread detail
- DM conversation
  - persistent sender context with bio and follower count
  - filter by sender influence band
  - filter replied vs unreplied
- bookmarks
- likes
- follows dashboard
- mutuals
- follow event history
- sync status
- account/auth/transports
- compose/reply
AI layer
Local-first ranking metadata stored in DB:
- score
- reason codes
- summary
- labels
- dismissed state
- acted-on state
Candidate ranking inputs:
- author importance
- sender follower count / influence
- sender bio cues
- reply intent
- mention density
- follower relationship
- conversation recency
- prior engagement
- bookmark/like overlap
- DM priority
- follow-graph proximity
- churn salience
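To make "score" and "reason codes" concrete, here is a toy scorer over four of the candidate inputs. The weights, thresholds, reason-code strings, and every name are invented for illustration; nothing here reflects the actual ranker.

```typescript
interface RankingSignals {
  senderFollowerCount: number;
  isMutual: boolean; // follower relationship in both directions
  hoursSinceLastActivity: number; // conversation recency
  isDirectMessage: boolean;
}

interface RankedItem {
  score: number; // roughly in [0, 1]
  reasonCodes: string[];
}

function rankInboxItem(signals: RankingSignals): RankedItem {
  const reasonCodes: string[] = [];
  let score = 0;

  // Log-scaled influence so one huge account does not dominate; capped at 1
  // around one million followers.
  const influence = Math.min(Math.log(1 + signals.senderFollowerCount) / Math.LN10 / 6, 1);
  score += 0.4 * influence;
  if (influence >= 0.5) reasonCodes.push("sender_influence");

  if (signals.isMutual) {
    score += 0.2;
    reasonCodes.push("mutual_follow");
  }

  // Recency decays from 1 (just now) towards 0, halving at 24 hours.
  const recency = 1 / (1 + signals.hoursSinceLastActivity / 24);
  score += 0.3 * recency;
  if (recency >= 0.5) reasonCodes.push("recent_conversation");

  if (signals.isDirectMessage) {
    score += 0.1;
    reasonCodes.push("dm_priority");
  }

  return { score, reasonCodes };
}
```

Storing the reason codes next to the score lets the UI explain why an item surfaced and lets later rankers be compared against the same labels.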
Web server mode
birdclaw serve
- starts local server
- starts background sync automatically by default
- opens browser unless --no-open
- exposes sync health and recent job state in the UI
Profiles / Accounts
Support profiles from day 1.
Reason:
- separate DBs or configs for personal/test/future shared use
- easier OSS story
Default:
- one profile named default
Auth / Secrets
Do not store secrets in config JSON.
Options by transport:
- xurl
  - birdclaw shells out to xurl
  - auth remains managed by xurl
- bird
  - birdclaw shells out to bird or wraps a narrow stable surface
  - useful for GraphQL/cookie-backed capabilities
- official
  - birdclaw stores tokens securely
  - keychain if available, encrypted local store otherwise
- xweb
  - explicit low-level escape hatch if needed later
Package Layout
birdclaw/
  apps/
    web/
  packages/
    archive/
    cli/
    core/
    db/
    server/
    transport-bird/
    transport-official/
    transport-xurl/
    transport-xweb/
    ui/
  docs/
    spec.md
    cli.md
    data-architecture.md
Package responsibilities
- core
  - domain types
  - sync contracts
  - ranking contracts
- archive
  - archive parsers and normalizers
- db
  - Kysely schema
  - migrations
  - repositories
  - FTS helpers
  - DM influence and replied/unreplied query helpers
- transport-xurl
  - xurl detection
  - subprocess exec wrappers
  - output parsing
- transport-bird
  - bird detection
  - subprocess exec wrappers
  - GraphQL-focused reads/actions
- transport-official
  - direct Twitter API client
- transport-xweb
  - optional cookie/graphql mode
- server
  - local app API
  - background sync orchestration
- cli
  - command surface
- ui
  - React components, inbox, thread, DM views
  - compact sender bio / influence surfaces for DM context
- apps/web
  - TanStack Start app shell
Testing Plan
Unit
- archive file parsing
- domain normalization
- SQL repositories
- FTS queries
- ranker behavior
Integration
- import fixture archive into temp DB
- sync fixture pages into temp DB
- run search queries against populated DB
- verify xurl adapter against stubbed subprocess output
- verify follow graph snapshot -> diff -> events behavior
Live
Opt-in only:
- real xurl health check
- real bird health check
- real sync smoke tests
Distribution
Primary:
- npm package
Secondary later:
- standalone desktop wrapper if the web UX becomes primary