Installation
August 10, 2020 ยท View on GitHub
Data sketches library for the V language.
Installation
v install mobarski.sketch
Usage
This library implements the following probabilistic data structures:
- Bloom filter - for membership estimation
- MinHash - for cardinality and similarity estimation
- Count-min sketch - for frequency estimation
- LogLog - for cardinality estimation
Bloom filter
import mobarski.sketch
mut s := sketch.bloom(1, 2) // 1*u64=64 bits, 2 hashing functions
s.add("v is simple")
s.add("v is fast")
s.add("v is safe")
s.add("v is compiled")
println(s.might_contain("v is simple")) // true
println(s.might_contain("v is fast")) // true
println(s.might_contain("v is safe")) // true
println(s.might_contain("v is compiled")) // true
println(s.might_contain("v is complex")) // false
println(s.might_contain("v is slow")) // false
println(s.might_contain("v is unsafe")) // false
println(s.might_contain("v is interpreted")) // false
Count-min sketch
import mobarski.sketch
mut s := sketch.new_countmin(4,3) // 4 items, 3 hashing functions
s.add("foo")
s.add("bar")
s.add("foo")
println(s.estimate("foo")) // 2
println(s.estimate("bar")) // 1
println(s.estimate("xyz")) // 0
MinHash
import mobarski.sketch
mut s := sketch.new_minhash(100)
for i in 0..20000 {
h.add(i.str())
h.add(i.str())
h.add(i.str())
}
println(h.estimate())
LogLog
import mobarski.sketch