crdt-merge
March 27, 2026
Conflict-free merge, dedup & diff for any dataset, powered by CRDTs
Merge any two datasets in one function call. No conflicts. No coordination. No data loss.
Quick Start • API Reference • Why CRDTs • All Languages
Available in Every Language
| Language | Package | Install | Repo |
|---|---|---|---|
| Python | crdt-merge | pip install crdt-merge | crdt-merge |
| TypeScript | crdt-merge | npm install crdt-merge | crdt-merge-ts |
| Rust | crdt-merge | cargo add crdt-merge | crdt-merge-rs |
| Java | io.optitransfer:crdt-merge | Maven / Gradle | You are here |
| CLI | included in Rust | cargo install crdt-merge | crdt-merge-rs |
The Problem
You have two versions of a dataset. Maybe two Spark jobs ran in parallel. Maybe two microservices updated the same records. Maybe you're merging data from multiple sources.
Today: Write custom merge scripts, lose data, or block on a coordinator.
With crdt-merge: One method call. Zero conflicts. Mathematically guaranteed.
List<Map<String, Object>> merged = CrdtMerge.merge(datasetA, datasetB, "id"); // done.
Quick Start
Maven
<dependency>
  <groupId>io.optitransfer</groupId>
  <artifactId>crdt-merge</artifactId>
  <version>0.1.0</version>
</dependency>
Gradle
implementation 'io.optitransfer:crdt-merge:0.1.0'
From Source
git clone https://github.com/mgillr/crdt-merge-java.git
cd crdt-merge-java
mvn package
API Reference
Merge Two Datasets
import io.optitransfer.crdtmerge.CrdtMerge;
import java.util.List;
import java.util.Map;
List<Map<String, Object>> teamA = List.of(
    Map.of("id", 1, "name", "Alice", "role", "engineer"),
    Map.of("id", 2, "name", "Bob", "role", "designer")
);
List<Map<String, Object>> teamB = List.of(
    Map.of("id", 2, "name", "Robert", "role", "designer"),
    Map.of("id", 3, "name", "Charlie", "role", "pm")
);
List<Map<String, Object>> merged = CrdtMerge.merge(teamA, teamB, "id");
// id=1: Alice (only in A, preserved)
// id=2: Robert (B wins as the latest write)
// id=3: Charlie (only in B, preserved)
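To make the row-level outcome above concrete, here is a minimal sketch of a key-based merge in plain Java. It is not the library's implementation: `KeyedMergeSketch` and its `merge` method are hypothetical names, and the sketch assumes that when both inputs contain the same key, fields from the second dataset overwrite fields from the first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the crdt-merge implementation.
final class KeyedMergeSketch {

    // Merges two row lists by primary key. Assumption: on a key collision,
    // fields from b overwrite fields from a, and fields only in a are kept.
    static List<Map<String, Object>> merge(List<Map<String, Object>> a,
                                           List<Map<String, Object>> b,
                                           String key) {
        // LinkedHashMap keeps first-seen key order, so row order is deterministic.
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> row : a) {
            byKey.put(row.get(key), new LinkedHashMap<>(row));
        }
        for (Map<String, Object> row : b) {
            byKey.merge(row.get(key), new LinkedHashMap<>(row), (oldRow, newRow) -> {
                oldRow.putAll(newRow); // b's fields win on collision
                return oldRow;
            });
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> teamA = List.of(
            Map.of("id", 1, "name", "Alice"),
            Map.of("id", 2, "name", "Bob"));
        List<Map<String, Object>> teamB = List.of(
            Map.of("id", 2, "name", "Robert"),
            Map.of("id", 3, "name", "Charlie"));
        // Prints "1: Alice", "2: Robert", "3: Charlie" on separate lines.
        for (Map<String, Object> row : merge(teamA, teamB, "id")) {
            System.out.println(row.get("id") + ": " + row.get("name"));
        }
    }
}
```

Note that "second argument wins" depends on argument order. An order-independent last-writer-wins merge needs a timestamp or version per row to decide the winner, which this sketch deliberately omits.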
Deduplicate
import io.optitransfer.crdtmerge.CrdtMerge;
import io.optitransfer.crdtmerge.DedupEngine;
List<Map<String, Object>> data = List.of(
Map.of("name", "Alice"),
Map.of("name", "Alicia"),
Map.of("name", "Bob")
);
DedupEngine.DedupResult result = CrdtMerge.dedup(data, "name", 0.7);
System.out.println("Unique: " + result.unique.size());
System.out.println("Duplicates: " + result.duplicates.size());
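The Features list describes fuzzy dedup as Jaccard similarity on character bigrams. The sketch below shows one plausible way to compute that score in plain Java. `BigramJaccardSketch` is a hypothetical name, and the lowercasing step and the treatment of strings shorter than two characters are assumptions; the library may normalize or tokenize differently.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of Jaccard similarity over character bigrams.
final class BigramJaccardSketch {

    // Collects the set of adjacent character pairs of a lowercased string.
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < t.length(); i++) {
            out.add(t.substring(i, i + 2));
        }
        return out;
    }

    // Jaccard similarity = |intersection| / |union|.
    // Assumption: two strings with no bigrams at all count as identical.
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        int union = x.size() + y.size() - inter.size();
        return (double) inter.size() / union;
    }

    public static void main(String[] args) {
        System.out.println(similarity("Alice", "Alicia")); // 0.5
        System.out.println(similarity("Alice", "Bob"));    // 0.0
    }
}
```

Under these assumptions "Alice" and "Alicia" share 3 of 6 distinct bigrams and score 0.5. Treat that number as illustrative only: whether the library groups the pair at a given threshold depends on its actual tokenization, which this README does not specify.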
Structural Diff
import io.optitransfer.crdtmerge.CrdtMerge;
import io.optitransfer.crdtmerge.DiffEngine;
DiffEngine.DiffResult diff = CrdtMerge.diff(oldData, newData, "id");
System.out.println(diff.summary);
// "+5 added, -2 removed, ~3 modified, =990 unchanged"
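As a rough illustration of what a key-based structural diff has to compute, the sketch below classifies rows as added, removed, modified, or unchanged and formats a summary in the same shape as the one above. `KeyedDiffSketch` and `diffSummary` are hypothetical names, not library API, and the sketch assumes keys are unique within each dataset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a key-based structural diff.
final class KeyedDiffSketch {

    static String diffSummary(List<Map<String, Object>> oldRows,
                              List<Map<String, Object>> newRows,
                              String key) {
        // Index the old rows by primary key (assumes keys are unique per dataset).
        Map<Object, Map<String, Object>> oldByKey = new HashMap<>();
        for (Map<String, Object> row : oldRows) {
            oldByKey.put(row.get(key), row);
        }
        int added = 0;
        int modified = 0;
        int unchanged = 0;
        for (Map<String, Object> row : newRows) {
            // Removing matched rows leaves only the deleted ones in the index.
            Map<String, Object> before = oldByKey.remove(row.get(key));
            if (before == null) {
                added++;
            } else if (before.equals(row)) {
                unchanged++;
            } else {
                modified++;
            }
        }
        int removed = oldByKey.size();
        return "+" + added + " added, -" + removed + " removed, ~"
            + modified + " modified, =" + unchanged + " unchanged";
    }

    public static void main(String[] args) {
        List<Map<String, Object>> oldData = List.of(
            Map.of("id", 1, "v", "a"),
            Map.of("id", 2, "v", "b"),
            Map.of("id", 3, "v", "c"));
        List<Map<String, Object>> newData = List.of(
            Map.of("id", 2, "v", "b"),
            Map.of("id", 3, "v", "C"),
            Map.of("id", 4, "v", "d"));
        // Prints: +1 added, -1 removed, ~1 modified, =1 unchanged
        System.out.println(diffSummary(oldData, newData, "id"));
    }
}
```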
Deep JSON Merge
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import io.optitransfer.crdtmerge.CrdtMerge;
JsonObject configA = JsonParser.parseString(
"{\"model\": {\"name\": \"bert\", \"layers\": 12}, \"tags\": [\"nlp\"]}"
).getAsJsonObject();
JsonObject configB = JsonParser.parseString(
"{\"model\": {\"name\": \"bert-large\", \"dropout\": 0.1}, \"tags\": [\"qa\"]}"
).getAsJsonObject();
JsonObject merged = CrdtMerge.mergeJson(configA, configB);
// {"model": {"name": "bert-large", "layers": 12, "dropout": 0.1}, "tags": ["nlp", "qa"]}
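The merged output above suggests three rules: nested objects merge recursively, arrays are unioned, and for scalars the second document wins. The sketch below applies those rules to plain `Map` and `List` values so it runs without Gson. `DeepMergeSketch` is a hypothetical name, and the rules are inferred from the example output rather than taken from the library's source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a recursive merge over nested maps and lists.
final class DeepMergeSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> a, Map<String, Object> b) {
        Map<String, Object> out = new LinkedHashMap<>(a);
        for (Map.Entry<String, Object> e : b.entrySet()) {
            Object left = out.get(e.getKey());
            Object right = e.getValue();
            if (left instanceof Map && right instanceof Map) {
                // Both sides are objects: merge them field by field.
                out.put(e.getKey(),
                    merge((Map<String, Object>) left, (Map<String, Object>) right));
            } else if (left instanceof List && right instanceof List) {
                // Both sides are arrays: union, keeping first-seen order.
                List<Object> union = new ArrayList<>((List<Object>) left);
                for (Object item : (List<Object>) right) {
                    if (!union.contains(item)) {
                        union.add(item);
                    }
                }
                out.put(e.getKey(), union);
            } else {
                // Scalars, missing keys, or mismatched types: b's value wins.
                out.put(e.getKey(), right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> configA = Map.of(
            "model", Map.of("name", "bert", "layers", 12),
            "tags", List.of("nlp"));
        Map<String, Object> configB = Map.of(
            "model", Map.of("name", "bert-large", "dropout", 0.1),
            "tags", List.of("qa"));
        Map<String, Object> merged = merge(configA, configB);
        System.out.println(merged.get("tags"));                          // [nlp, qa]
        System.out.println(((Map<?, ?>) merged.get("model")).get("name")); // bert-large
    }
}
```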
Core CRDT Types
import io.optitransfer.crdtmerge.crdt.*;
// Distributed counter
GCounter counterA = new GCounter();
counterA.increment("server-1", 100);
GCounter counterB = new GCounter();
counterB.increment("server-2", 200);
GCounter merged = counterA.merge(counterB);
System.out.println(merged.value()); // 300
// Last-writer-wins register
LWWRegister<String> regA = new LWWRegister<>("Alice", 1000L);
LWWRegister<String> regB = new LWWRegister<>("Alicia", 2000L);
System.out.println(regA.merge(regB).value()); // "Alicia" (later wins)
// Observed-remove set
ORSet<String> setA = new ORSet<>();
setA.add("item1");
ORSet<String> setB = new ORSet<>();
setB.add("item2");
ORSet<String> mergedSet = setA.merge(setB);
System.out.println(mergedSet.contains("item1")); // true
System.out.println(mergedSet.contains("item2")); // true
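For readers curious why merging two counters gives 300 rather than double-counting, here is a minimal sketch of the standard grow-only counter construction: one slot per node, merged by taking the per-node maximum. `GCounterSketch` is a hypothetical class written for illustration and is not the library's `GCounter`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a grow-only counter; not the library's GCounter class.
final class GCounterSketch {
    // One monotonically increasing slot per node id.
    private final Map<String, Long> counts = new HashMap<>();

    void increment(String nodeId, long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("grow-only: amount must be >= 0");
        }
        counts.merge(nodeId, amount, Long::sum);
    }

    // Merge takes the per-node maximum, so repeated or reordered merges agree.
    GCounterSketch merge(GCounterSketch other) {
        GCounterSketch out = new GCounterSketch();
        out.counts.putAll(counts);
        other.counts.forEach((node, n) -> out.counts.merge(node, n, Long::max));
        return out;
    }

    // The counter's value is the sum over all node slots.
    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        GCounterSketch a = new GCounterSketch();
        a.increment("server-1", 100);
        GCounterSketch b = new GCounterSketch();
        b.increment("server-2", 200);
        System.out.println(a.merge(b).value());          // 300
        System.out.println(a.merge(b).merge(b).value()); // 300, re-merging is safe
    }
}
```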
Why CRDTs
CRDT = Conflict-free Replicated Data Type. A data structure with one mathematical superpower:
Any two copies can merge, in any order and at any time, and the result is always identical and always correct.
Three mathematical guarantees (proven, not hoped):
| Property | What it means |
|---|---|
| Commutative | merge(A, B) == merge(B, A), so order doesn't matter |
| Associative | merge(merge(A, B), C) == merge(A, merge(B, C)), so grouping doesn't matter |
| Idempotent | merge(A, A) == A, so re-merging is safe |
This means: zero coordination, zero locks, zero conflicts.
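The three properties can be checked directly on a small example. The sketch below defines a hypothetical last-writer-wins register, `LwwLawsSketch`, and tests each law on sample values. The tie-break on equal timestamps (compare the values) is an assumption made here so that the merge does not depend on argument order; the library's `LWWRegister` may break ties differently.

```java
import java.util.Objects;

// Hypothetical last-writer-wins register, used only to check the three laws.
final class LwwLawsSketch {
    final String value;
    final long timestamp;

    LwwLawsSketch(String value, long timestamp) {
        this.value = Objects.requireNonNull(value);
        this.timestamp = timestamp;
    }

    // Later timestamp wins; equal timestamps fall back to comparing values,
    // so the result never depends on argument order.
    static LwwLawsSketch merge(LwwLawsSketch x, LwwLawsSketch y) {
        if (x.timestamp != y.timestamp) {
            return x.timestamp > y.timestamp ? x : y;
        }
        return x.value.compareTo(y.value) >= 0 ? x : y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LwwLawsSketch
            && ((LwwLawsSketch) o).timestamp == timestamp
            && ((LwwLawsSketch) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(value, timestamp);
    }

    public static void main(String[] args) {
        LwwLawsSketch a = new LwwLawsSketch("Alice", 1000L);
        LwwLawsSketch b = new LwwLawsSketch("Alicia", 2000L);
        LwwLawsSketch c = new LwwLawsSketch("Ali", 2000L);
        // Commutative: prints true
        System.out.println(merge(a, b).equals(merge(b, a)));
        // Associative: prints true
        System.out.println(merge(merge(a, b), c).equals(merge(a, merge(b, c))));
        // Idempotent: prints true
        System.out.println(merge(a, a).equals(a));
    }
}
```

A check on three sample values is not a proof, but the merge here is simply the maximum under a total order on (timestamp, value), and a maximum is commutative, associative, and idempotent by construction.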
Built-in CRDT Types
| Type | Use Case | Example |
|---|---|---|
| GCounter | Grow-only counters | Download counts, page views |
| PNCounter | Increment + decrement | Stock levels, balances |
| LWWRegister<T> | Single value (latest wins) | Name, email, status fields |
| ORSet<T> | Add/remove set | Tags, memberships, dedup sets |
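PNCounter is the only type in this table without an example elsewhere in this README. The usual construction pairs two grow-only counters, one for increments and one for decrements, and the sketch below follows that idea. `PNCounterSketch` and its method names are hypothetical; the library's `PNCounter` API may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a PN-counter; the library's PNCounter API may differ.
final class PNCounterSketch {
    // Two grow-only slot maps: total increments and total decrements per node.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    void increment(String nodeId, long amount) {
        add(increments, nodeId, amount);
    }

    void decrement(String nodeId, long amount) {
        add(decrements, nodeId, amount);
    }

    private static void add(Map<String, Long> slots, String nodeId, long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be >= 0");
        }
        slots.merge(nodeId, amount, Long::sum);
    }

    // Each side merges like a grow-only counter: per-node maximum.
    PNCounterSketch merge(PNCounterSketch other) {
        PNCounterSketch out = new PNCounterSketch();
        for (PNCounterSketch src : new PNCounterSketch[] {this, other}) {
            src.increments.forEach((node, n) -> out.increments.merge(node, n, Long::max));
            src.decrements.forEach((node, n) -> out.decrements.merge(node, n, Long::max));
        }
        return out;
    }

    long value() {
        return sum(increments) - sum(decrements);
    }

    private static long sum(Map<String, Long> slots) {
        return slots.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        PNCounterSketch warehouseA = new PNCounterSketch();
        warehouseA.increment("warehouse-a", 10);
        warehouseA.decrement("warehouse-a", 3);
        PNCounterSketch warehouseB = new PNCounterSketch();
        warehouseB.increment("warehouse-b", 5);
        System.out.println(warehouseA.merge(warehouseB).value()); // 12
    }
}
```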
Features
- Tabular Merge: Merge two lists of maps by primary key using CRDT LWW semantics
- Deduplication: Exact and fuzzy dedup using Jaccard similarity on character bigrams
- Structural Diff: See added, removed, and modified rows between two datasets
- JSON Merge: Deep merge of nested JSON objects with conflict-free resolution
- Core CRDTs: Production-ready GCounter, PNCounter, LWWRegister, ORSet
- Zero config: One dependency (Gson), works with any Map/List data
Use Cases
- Spark pipelines: Merge partitioned outputs without a coordinator
- Microservices: Each service maintains local state, merge on demand
- Event sourcing: Merge event streams from multiple sources
- Data lakes: Combine datasets from different teams/regions
- Cache reconciliation: Merge divergent cache states after network partition
Requirements
- Java 17+
- Gson 2.10.1+ (included via Maven)
Building
mvn compile # Compile
mvn test # Run tests (79/79 passing)
mvn package # Create JAR
License
Licensed under the Apache License, Version 2.0.
Contributing? By opening a pull request, you agree to our Contributor License Agreement.
Copyright 2026 Ryan Gillespie / Optitransfer. See NOTICE for attribution requirements.
For commercial licensing inquiries: rgillespie83@icloud.com, data@optitransfer.ch
Built with math, not hope.