crdt-merge v0.9.5

April 18, 2026

System Overview

crdt-merge is organized into seven layers, with Accelerator and CLI subsystems that sit alongside the stack. Each layer builds on the one below, raising the level of abstraction from raw CRDT primitives up to trust-entangled distributed convergence.


Layer Architecture (Bottom-Up)

┌─────────────────────────────────────────────────────────────────────────┐
│  LAYER 7: TRUST & ENTANGLEMENT — E4 (35 modules, 1,681 tests)         │
│  e4/ → TypedTrustScore, PCO, ProjectionDelta, CausalTrustClock,       │
│        AdaptiveVerification, TrustBoundMerkle, SLT Byzantine,          │
│        Resilience (18 modules), Integration bridges                    │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 6: VERIFICATION & COMPLIANCE (932 LOC)                          │
│  compliance.py → ComplianceAuditor, EUAIActReport, GDPR/HIPAA/SOX     │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 5: ENTERPRISE WRAPPERS (3,323 LOC)                              │
│  audit.py │ encryption.py │ rbac.py │ observability.py │ unmerge.py    │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 4: AI / MODEL / AGENT (18,410 LOC)                              │
│  model/ (15,464) │ context/ (1,535) │ hub/ (726) │ agentic.py (402)   │
│  mergeql.py (743) │ viz.py (509) │ datasets_ext.py │ flower_plugin.py │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 3: SYNC & TRANSPORT (2,626 LOC)                                 │
│  wire.py (740) │ merkle.py (554) │ gossip.py (546) │ delta.py (367)   │
│  schema_evolution.py (419)                                              │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 2: MERGE ENGINES (3,984 LOC)                                    │
│  dataframe.py (444) │ arrow.py (969) │ streaming.py (362)              │
│  parquet.py (625) │ parallel.py (251) │ async_merge.py (188)           │
│  json_merge.py (145) │ _polars_engine.py (433)                         │
├─────────────────────────────────────────────────────────────────────────┤
│  LAYER 1: CORE CRDT PRIMITIVES (2,614 LOC)                             │
│  core.py (320) │ strategies.py (377) │ clocks.py (324)                 │
│  probabilistic.py (502) │ dedup.py (260) │ provenance.py (383)        │
│  verify.py (448)                                                        │
└─────────────────────────────────────────────────────────────────────────┘
         ↑                                              ↑
    ┌────┴──────────────┐                ┌──────────────┴────────┐
    │ ACCELERATORS      │                │ CLI                    │
    │ (4,465 LOC)       │                │ (548 LOC)              │
    │ DuckDB, dbt,      │                │ migrate.py             │
    │ Polars, Flight,   │                │ (MergeKit→Python)      │
    │ Airbyte, DuckLake,│                └────────────────────────┘
    │ SQLite, Streamlit  │
    └────────────────────┘

Layer Details

Layer 1: Core CRDT Primitives (2,614 LOC)

Purpose: Mathematical foundations — every other layer depends on this.

| Module | LOC | Key Exports |
|---|---|---|
| core.py | 320 | GCounter, PNCounter, LWWRegister, ORSet, LWWMap |
| strategies.py | 377 | LWW, MaxWins, MinWins, UnionSet, Priority, Concat, LongestWins, Custom, MergeSchema |
| clocks.py | 324 | VectorClock, DottedVersionVector, Ordering |
| probabilistic.py | 502 | MergeableHLL, MergeableBloom, MergeableCMS |
| dedup.py | 260 | dedup(), DedupIndex, MinHashDedup |
| provenance.py | 383 | merge_with_provenance(), ProvenanceTracker, export_provenance() |
| verify.py | 448 | verify_crdt(), @verified_merge, CRDTVerifier |

Dependencies: Python stdlib only (zero external dependencies)

CRDT Properties Guaranteed:

  • Commutative: merge(A, B) == merge(B, A)
  • Associative: merge(merge(A, B), C) == merge(A, merge(B, C))
  • Idempotent: merge(A, A) == A
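The three laws can be checked on a toy grow-only counter. This is a minimal sketch for illustration only: `GCounterSketch` and its methods are hypothetical names and are not the `GCounter` exported by core.py.

```python
from dataclasses import dataclass, field


@dataclass
class GCounterSketch:
    """Illustrative grow-only counter (hypothetical, not the library class)."""
    counts: dict = field(default_factory=dict)  # replica id -> local count

    def increment(self, replica, n=1):
        # Return a new counter with this replica's slot increased by n.
        updated = dict(self.counts)
        updated[replica] = updated.get(replica, 0) + n
        return GCounterSketch(updated)

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounterSketch(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    @property
    def value(self):
        return sum(self.counts.values())


a = GCounterSketch().increment("a", 2)
b = GCounterSketch().increment("b", 3)
c = GCounterSketch().increment("a", 1).increment("c", 5)

assert a.merge(b) == b.merge(a)                    # commutative
assert a.merge(b).merge(c) == a.merge(b.merge(c))  # associative
assert a.merge(a) == a                             # idempotent
total = a.merge(b).merge(c).value                  # 2 + 3 + 5
```

Because the merge is an element-wise maximum, replicas can exchange state in any order, any number of times, and still converge.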

Layer 2: Merge Engines (3,984 LOC)

Purpose: Apply CRDT strategies to real-world data formats (DataFrames, Arrow, Parquet, streams).

| Module | LOC | Key Exports |
|---|---|---|
| dataframe.py | 444 | merge(), diff() |
| streaming.py | 362 | merge_stream(), merge_sorted_stream(), StreamStats |
| arrow.py | 969 | Arrow, ArrowBatch, arrow_merge() |
| parquet.py | 625 | SelfMergingParquet, ParquetMerge |
| parallel.py | 251 | parallel_merge(), ParallelMerge |
| async_merge.py | 188 | amerge(), amerge_stream(), AsyncMerge |
| json_merge.py | 145 | merge_dicts(), merge_json_lines() |
| _polars_engine.py | 433 | Polars engine internals |

Dependencies: Layer 1 (core, strategies), pandas, pyarrow (optional), polars (optional)
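To make "applying CRDT strategies to data" concrete, here is a rough sketch of per-field strategy resolution on plain dictionaries. Every name in it (`max_wins`, `union_set`, `lww`, `SCHEMA`, `merge_records`) is a hypothetical stand-in, and the real `MergeSchema` and `merge()` signatures may differ.

```python
def max_wins(a, b):
    # Keep the larger value.
    return max(a, b)


def union_set(a, b):
    # Combine both sides; sorting makes the output order deterministic.
    return sorted(set(a) | set(b))


def lww(a, b):
    # Assumption: values are (timestamp, payload) pairs. The later timestamp
    # wins, and ties break on the payload so argument order does not matter.
    return max(a, b)


# Hypothetical schema: field name -> merge strategy.
SCHEMA = {"score": max_wins, "tags": union_set, "email": lww}


def merge_records(left, right, schema):
    merged = {}
    for key in left.keys() | right.keys():
        if key in left and key in right:
            merged[key] = schema[key](left[key], right[key])
        else:
            # Only one side has the field, so there is no conflict.
            merged[key] = left[key] if key in left else right[key]
    return merged


left = {"score": 7, "tags": ["a", "b"], "email": (10, "old@example.com")}
right = {"score": 9, "tags": ["b", "c"], "email": (12, "new@example.com")}

merged = merge_records(left, right, SCHEMA)
assert merged == merge_records(right, left, SCHEMA)  # order-independent
```

Each field is resolved by its own strategy, so a single record can mix numeric, set-valued, and timestamped fields without one global conflict rule.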


Layer 3: Sync & Transport (2,626 LOC)

Purpose: Move merge state across networks — serialization, synchronization protocols, schema versioning.

| Module | LOC | Key Exports |
|---|---|---|
| wire.py | 740 | serialize(), deserialize(), peek(), binary protocol |
| merkle.py | 554 | MerkleTree, merkle_diff() |
| gossip.py | 546 | GossipState, anti_entropy() |
| delta.py | 367 | DeltaStore, compute_delta(), apply_delta() |
| schema_evolution.py | 419 | evolve_schema(), check_compatibility() |

Dependencies: Layer 1, Layer 2 (for merge operations)
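The idea behind delta-based synchronization can be sketched in a few lines. The functions below are hypothetical (note the `_sketch` suffix) and do not reflect the actual `compute_delta()` / `apply_delta()` signatures in delta.py. State is assumed to be a dict mapping each key to a `(version, value)` pair.

```python
def compute_delta_sketch(known, current):
    # Ship only entries the peer has not seen, or has at an older version.
    return {
        key: entry
        for key, entry in current.items()
        if key not in known or entry[0] > known[key][0]
    }


def apply_delta_sketch(state, delta):
    # Accept an incoming entry only if it is newer than the local one.
    # Ties keep the local entry; a real system needs a deterministic tiebreak.
    merged = dict(state)
    for key, entry in delta.items():
        if key not in merged or entry[0] > merged[key][0]:
            merged[key] = entry
    return merged


replica_a = {"x": (1, "a"), "y": (2, "b")}
replica_b = {"x": (1, "a"), "y": (3, "c"), "z": (1, "d")}

delta = compute_delta_sketch(replica_a, replica_b)   # only "y" and "z" travel
synced = apply_delta_sketch(replica_a, delta)
assert synced == replica_b
assert apply_delta_sketch(synced, delta) == synced   # re-applying is harmless
```

Sending the delta instead of the full state keeps network traffic proportional to what changed, and idempotent application makes retransmission safe.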


Layer 4: AI / Model / Agent (18,410 LOC)

Purpose: ML model merging (26+ strategies), agentic AI state management, HuggingFace Hub integration.

| Module/Package | LOC | Key Exports |
|---|---|---|
| model/ | 15,464 | ModelMerge, CRDTMergeState, 26+ strategy classes, LoRAMerge, ContinualMerge, FederatedMerge, GPUMerge, MergePipeline, SafetyAnalyzer, ConflictHeatmap |
| context/ | 1,535 | ContextMerge, MemorySidecar, ContextConsolidator, ContextBloom, ContextManifest |
| hub/ | 726 | HFMergeHub, AutoModelCard |
| agentic.py | 402 | AgentState, SharedKnowledge |
| mergeql.py | 743 | MergeQL DSL |
| viz.py | 509 | ConflictTopology, D3 export |
| datasets_ext.py | 106 | merge_datasets() |
| flower_plugin.py | 500 | Flower FL integration |

Dependencies: Layer 1, Layer 2, Layer 3, torch (optional), transformers (optional)


Layer 5: Enterprise Wrappers (3,323 LOC)

Purpose: Production-grade wrappers adding audit trails, encryption, RBAC, observability, and GDPR compliance.

| Module | LOC | Key Exports |
|---|---|---|
| audit.py | 430 | AuditLog, AuditEntry, AuditedMerge (SHA-256 chain) |
| encryption.py | 669 | EncryptedMerge, 4 crypto backends, key rotation |
| rbac.py | 357 | RBACController, SecureMerge |
| observability.py | 1,034 | MetricsCollector, ObservedMerge, PrometheusExporter, GrafanaDashboard, MergeTracer, DriftDetector, HealthCheck |
| unmerge.py | 833 | UnmergeEngine, ModelUnmerge, GDPRForget |

Dependencies: All lower layers
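The "SHA-256 chain" noted for audit.py suggests a hash-chained log. The sketch below shows the general technique using only `hashlib` and `json`. It assumes each entry hashes the previous entry's hash together with its own payload; the helper names and entry layout are hypothetical, not the `AuditLog` API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    # Canonical JSON so the same payload always hashes to the same digest.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    )


def verify_chain(log):
    # Recompute every link; any edited payload or reordered entry breaks it.
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"op": "merge", "rows": 120})
append_entry(log, {"op": "merge", "rows": 45})
assert verify_chain(log)

log[0]["payload"]["rows"] = 999  # tamper with history
assert not verify_chain(log)
```

Because each hash covers the previous one, altering any earlier entry invalidates every later link, which is what makes the trail tamper-evident.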

Documentation Status: No prior documentation existed. This repo documents the layer in full for the first time.


Layer 6: Verification & Compliance (932 LOC)

Purpose: Regulatory compliance auditing and reporting.

| Module | LOC | Key Exports |
|---|---|---|
| compliance.py | 932 | ComplianceAuditor, EUAIActReport, GDPR/HIPAA/SOX auditing |

Dependencies: Layer 5 (audit, encryption), Layer 4 (model)

Documentation Status: No prior documentation existed. This repo documents the layer in full for the first time.


Accelerators (4,465 LOC)

Purpose: Performance-optimized integrations with external databases and tools.

| Module | ~LOC | Purpose |
|---|---|---|
| duckdb_udf.py | ~500 | DuckDB UDF integration |
| dbt_package.py | ~450 | dbt macro package |
| polars_plugin.py | ~600 | Polars engine plugin |
| flight_server.py | ~400 | Arrow Flight distributed server |
| airbyte.py | ~350 | Airbyte ETL connector |
| ducklake.py | ~400 | DuckLake (DuckDB + Data Lake) |
| sqlite_ext.py | ~450 | SQLite extension |
| streamlit_ui.py | ~315 | Streamlit dashboard components |

CLI (548 LOC)

| Module | LOC | Purpose |
|---|---|---|
| cli/migrate.py | 548 | MergeKit YAML → Python migration tool |

Data Flow

                    ┌─────────────────────────────┐
                    │  External Data Sources       │
                    │  (DataFrames, Parquet, JSON,  │
                    │   ML Models, Agent State)     │
                    └──────────────┬────────────────┘
                                   ▼
                    ┌──────────────────────────────┐
                    │  Layer 2: Merge Engines        │
                    │  merge(), merge_stream(),      │
                    │  arrow_merge(), etc.           │
                    │                                │
                    │  ┌──────────────────────────┐  │
                    │  │ Layer 1: CRDT Core       │  │
                    │  │ MergeSchema → per-field   │  │
                    │  │ strategy resolution       │  │
                    │  └──────────────────────────┘  │
                    └──────────────┬────────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                     │
    ┌───────────────────┐  ┌───────────────┐  ┌────────────────┐
    │ Layer 3: Transport │  │ Layer 4: AI    │  │ Layer 5:        │
    │ serialize()        │  │ ModelMerge()   │  │ Enterprise      │
    │ gossip sync        │  │ AgentState()   │  │ AuditedMerge()  │
    │ delta compress     │  │ MergeQL        │  │ EncryptedMerge()│
    └────────────────────┘  └────────────────┘  │ SecureMerge()   │
                                                 └────────┬────────┘
                                                          ▼
                                                 ┌────────────────┐
                                                 │ Layer 6:        │
                                                 │ Compliance      │
                                                 │ ComplianceAudit │
                                                 │ EUAIActReport   │
                                                 └─────────────────┘

LOC Summary

| Component | LOC | % of Total |
|---|---|---|
| Layer 1: Core CRDT | 2,614 | 8.0% |
| Layer 2: Engines | 3,984 | 12.1% |
| Layer 3: Transport | 2,626 | 8.0% |
| Layer 4: AI/Model | 18,410 | 56.2% |
| Layer 5: Enterprise | 3,323 | 10.1% |
| Layer 6: Compliance | 932 | 2.8% |
| Accelerators | 4,465 | |
| CLI + init | 795 | |
| Total (AST actual) | 29,768 | |

Note: The codebase inventory previously reported 38,157 LOC. AST-verified total is 29,768 LOC. Full discrepancy analysis in gap-analysis/INVENTORY_VS_ACTUAL.md.


Layer 7: Trust and Entanglement (E4)

Purpose: Recursive trust-delta protocol -- every merge operation carries cryptographic proof of provenance, every delta propagates trust as a first-class CRDT dimension.

| Component | Key Exports |
|---|---|
| crdt_merge.e4.trust_bound_merkle | Trust-bound Merkle verification (256-ary, depth 4 at 1B leaves) |
| crdt_merge.e4.pco | Proof-carrying operations (128-byte fixed wire format) |
| crdt_merge.e4.projection_delta | Projection delta encoding for billion-parameter spaces |
| crdt_merge.e4.typed_trust | Typed multi-dimensional trust scores with GCounter evidence |
| crdt_merge.e4.causal_trust_clock | Causal trust clocks (2.93M ops/s) |
| crdt_merge.e4.adaptive_verification | Adaptive verification controller (97K-109K ops/s) |
| crdt_merge.e4.trust_weighted_strategy | Trust-weighted conflict resolution strategies |
| crdt_merge.e4.resilience | Resilience subsystem -- 18 modules (Sybil, longcon, epoch, partition, PQ sigs, TLA+) |
| crdt_merge.e4.integration | Integration bridges -- gossip, stream, agent, config |

Dependencies: Layers 1-6 (composes with all existing subsystems)
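The "256-ary, depth 4 at 1B leaves" figure for the trust-bound Merkle tree follows from simple arithmetic, reproduced below. The sketch assumes "1B" means 10^9 leaves and that depth counts the levels of branching between the root and the leaves; `merkle_depth` is a hypothetical helper, not part of the package.

```python
def merkle_depth(leaves, fanout):
    # Smallest depth d such that fanout**d >= leaves.
    depth, capacity = 0, 1
    while capacity < leaves:
        capacity *= fanout
        depth += 1
    return depth


# 256**3 = 16,777,216 is too small; 256**4 = 4,294,967,296 covers 10**9.
assert merkle_depth(10**9, 256) == 4

# A binary tree over the same leaves needs 30 levels (2**30 > 10**9 > 2**29).
assert merkle_depth(10**9, 2) == 30
```

A wider fanout trades a shallower tree (fewer levels to traverse per proof) for larger nodes at each level.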

Validation: 78/78 CRDT axioms (26 strategies × 3 laws), 156/156 CRDT axiom trials on real weight tensors (facebook/opt-1.3b + opt-6.7b), 34% adversarial-participant tolerance under the SLT harness (not PBFT consensus), 328 computational proofs
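The "strategies × 3 laws" count can be pictured with a small law-checking harness. This is a sketch of the general approach only, not the project's validation suite: `check_laws` and the three toy strategies are hypothetical, and exhaustive checking over a handful of samples is far weaker than the trials described above.

```python
from itertools import product


def check_laws(merge, samples):
    # Evaluate each CRDT law over every combination of the sample values.
    return {
        "commutative": all(
            merge(a, b) == merge(b, a) for a, b in product(samples, repeat=2)
        ),
        "associative": all(
            merge(merge(a, b), c) == merge(a, merge(b, c))
            for a, b, c in product(samples, repeat=3)
        ),
        "idempotent": all(merge(a, a) == a for a in samples),
    }


# Three toy strategies, each paired with sample inputs.
strategies = {
    "max_wins": (max, [1, 5, 3]),
    "min_wins": (min, [1, 5, 3]),
    "union": (frozenset.union, [frozenset({1}), frozenset({2, 3}), frozenset()]),
}

results = {name: check_laws(fn, samples) for name, (fn, samples) in strategies.items()}
passed = sum(ok for laws in results.values() for ok in laws.values())
total = sum(len(laws) for laws in results.values())
assert (passed, total) == (9, 9)  # 3 strategies x 3 laws

# Subtraction is not a valid merge: it fails all three laws on these samples.
assert not any(check_laws(lambda a, b: a - b, [1, 2]).values())
```

With 26 strategies in place of these three, the same harness would report 26 × 3 = 78 checks.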


Architecture Map v2.0 -- crdt-merge v0.9.5