go-mcache

April 10, 2026 · View on GitHub

Go Report Card GoDoc Tests Release

Generic in-memory cache for Go with TinyLFU admission, sharded storage, batched read tracking, and a timing-wheel expiration path.

Design

mcache combines two ideas to achieve both high hit ratios and high throughput:

Admission control via TinyLFU. Not every new item gets into the cache. On insertion, a Count-Min Sketch estimates the access frequency of the incoming item and compares it against a random sample of existing entries (SampledLFU). If the new item has lower frequency, it is rejected. This prevents one-time keys from evicting frequently accessed data — a problem that LRU caches have by design. The frequency sketch uses 4-bit counters (8 bytes per ~16 tracked keys) and resets periodically to adapt to changing access patterns.

Cheap read path. Reads always go through a sharded map lookup. Frequency tracking is updated in a best-effort batched buffer once the cache becomes meaningfully occupied, which avoids paying TinyLFU bookkeeping on every hot read while the cache is still far from capacity.

Architecture

Cache[K, V]
├── ShardedStore         1024 shards, cache-line padded, per-shard RWMutex
│   └── map[K]*Entry     standard Go map per shard
├── Policy[K]            generic over key type (no hash-collision ambiguity)
│   ├── TinyLFU          doorkeeper (Bloom filter) + Count-Min Sketch
│   └── SampledLFU[K]    dense array + map for O(1) random sampling
├── ExpiryWheel          coarse timing wheel for best-effort background expiry
├── RadixTree            opt-in, for prefix search on string keys
├── WriteBuffer          lock-free ring buffer for async batching
├── ReadBuffer           lossy batched policy-access replay
└── Metrics              optional atomic counters

Why exact-key policy matters

The policy tracks entries by exact key K, not by hash. This means:

  • No hash collision ambiguity — two keys with the same hash are tracked independently
  • Eviction victims are returned as {Key, KeyHash} pairs — no reverse lookup needed
  • The SampledLFU uses a dense array with swap-delete, so random sampling is O(sampleSize) instead of O(n) map iteration

Expiration

Entries with TTL are scheduled into a coarse timing wheel for best-effort background cleanup. Exact TTL enforcement still happens on Get/Has by checking the entry's ExpireAt, while the background worker lazily removes entries whose scheduled expiration still matches the live entry.

Install

go get github.com/OrlovEvgeny/go-mcache

Requires Go 1.23+

Usage

cache := mcache.NewCache[string, int](
    mcache.WithMaxEntries[string, int](100_000),
)
defer cache.Close()

cache.Set("key", 42, 5*time.Minute)

val, ok := cache.Get("key")  // type-safe, no assertion

Cost-based eviction

cache := mcache.NewCache[string, []byte](
    mcache.WithMaxCost[string, []byte](100 << 20), // 100 MB
    mcache.WithCostFunc[string, []byte](func(v []byte) int64 {
        return int64(len(v))
    }),
)

// A 10 MB value may evict multiple smaller entries
cache.Set("large", make([]byte, 10<<20), 0)

Batch reads

batch := cache.GetBatchOptimized(keys)
// Keys are sorted by shard index before lookup
// for sequential memory access and reduced lock contention
for i, key := range batch.Keys {
    if batch.Found[i] {
        process(key, batch.Values[i])
    }
}

Prefix search (string keys, opt-in)

cache := mcache.NewCache[string, int](
    mcache.WithPrefixSearch[string, int](true),
)

cache.Set("user:1:name", 1, 0)
cache.Set("user:1:email", 2, 0)
cache.Set("user:2:name", 3, 0)

iter := cache.ScanPrefix("user:1:", 0, 100)
for iter.Next() {
    fmt.Println(iter.Key()) // user:1:name, user:1:email
}

Async writes

cache := mcache.NewCache[string, int](
    mcache.WithBufferItems[string, int](64),
)

cache.Set("key", 1, 0)    // buffered, returns immediately
cache.Wait()               // blocks until buffer is flushed
val, _ := cache.Get("key") // guaranteed to see the value

When the write buffer is full, the operation falls back to synchronous execution instead of dropping the entry.

Configuration

OptionDescriptionDefault
WithMaxEntriesMaximum number of entriesunlimited
WithMaxCostMaximum total costunlimited
WithNumCountersTinyLFU counters (recommend 10x max entries)auto
WithShardCountNumber of shards (power of 2)1024
WithBufferItemsAsync write buffer size (0 = sync)0
WithMetricsEnable cache metrics collectiontrue
WithExpirationResolutionBackground expiration tick resolution100ms
WithDefaultTTLDefault TTL for entries without explicit TTL0 (no expiry)
WithCostFuncCustom cost calculatorcost = 1
WithKeyHasherCustom key hash functionauto (FNV-1a)
WithLockFreePolicyUse lock-free TinyLFU for readstrue
WithPrefixSearchEnable radix tree for ScanPrefixfalse
WithOnEvictCallback on evictionnil
WithOnExpireCallback on TTL expirationnil
WithOnRejectCallback when TinyLFU rejects entrynil

Supported key types

Built-in zero-allocation hashing for: string, int, int8int64, uintuint64, float32, float64, bool, uintptr. Other comparable types use fmt.Sprintf fallback — provide WithKeyHasher for production use with custom types.

API

// Core
cache.Set(key K, value V, ttl time.Duration) bool
cache.SetWithCost(key K, value V, cost int64, ttl time.Duration) bool
cache.Get(key K) (V, bool)
cache.Has(key K) bool
cache.Delete(key K) bool
cache.Len() int
cache.Clear()
cache.Close()
cache.Wait()

// Batch
cache.GetMany(keys []K) map[K]V
cache.GetBatch(keys []K) *BatchResult[K, V]
cache.GetBatchOptimized(keys []K) *BatchResult[K, V]
cache.SetMany(items []Item[K, V]) int
cache.DeleteMany(keys []K) int

// Iterators (cursor-based, Redis-style)
cache.Scan(cursor uint64, count int) *Iterator[K, V]
cache.ScanPrefix(prefix string, cursor uint64, count int) *Iterator[K, V]
cache.ScanMatch(pattern string, cursor uint64, count int) *Iterator[K, V]

// Iterator methods
iter.Next() bool
iter.Key() K
iter.Value() V
iter.All() []Item[K, V]
iter.Keys() []K
iter.ForEach(func(K, V) bool)
iter.Count() int
iter.Cursor() uint64
iter.Close()

// Metrics
cache.Metrics() MetricsSnapshot
// Fields: Hits, Misses, HitRatio, Sets, Deletes, Evictions,
//         Expirations, Rejections, CostAdded, CostEvicted, BufferDrops

Benchmarks

The repo includes a cross-library comparison suite for local in-process caches: go-mcache, ristretto, bigcache, freecache, go-cache, ttlcache, golang-lru/expirable, otter, and theine.

Run the same suite used for the numbers below:

env GOCACHE=/tmp/go-build-cache GOTOOLCHAIN=auto \
go test -run '^$' -bench '^BenchmarkCompare/' -benchmem -benchtime=1s -count=3 .

Current reference run:

  • machine: Apple M4 Pro
  • metric shown in tables: median ns/op across count=3 runs (lower is better)

The suite separates:

  • core throughput (ReadParallelHot, WriteParallelOverwrite, MixedParallel80_20, DeleteCycle)
  • TTL overhead (SetWithTTLParallel, ExpiredRead for precise TTL implementations)
  • bounded-cache pressure (MixedParallelZipf95_5, MissThenSetZipf)

These numbers are useful as a comparative signal for this repository, but they are still microbenchmarks on one machine. They should not be read as a universal "best cache for every workload" claim.

Core scenarios:

Scenariogo-mcacheristrettobigcachefreecachego-cachettlcachegolang-lru-expirableottertheine
Core/ReadParallelHot8.59814.64051.22052.180120.700408.300319.3004.9198.412
Core/WriteParallelOverwrite21.600238.00050.13052.840227.900365.100320.500393.600284.500
Core/MixedParallel80_2015.04075.30038.09051.16049.070401.000340.90085.12092.620
Core/DeleteCycle148.500241.70062.73039.67036.91078.37085.140129.500197.400

TTL and bounded scenarios:

Scenariogo-mcacheristrettobigcachefreecachego-cachettlcachegolang-lru-expirableottertheine
TTL/SetWithTTLParallel237.000507.30059.80053.430271.900261.900298.600401.600339.300
TTL/ExpiredRead7.03220.850158.100350.100100.9003.63511.010
Bounded/MixedParallelZipf95_537.43064.92086.130122.200320.700263.10021.79049.720
Bounded/MissThenSetZipf22.44026.60054.650120.700324.400262.1006.2589.766

means the scenario is not part of that library's comparison set in this suite (for example, ExpiredRead is only run for caches with precise TTL semantics).

Thread safety

All operations are safe for concurrent use. The concurrency model:

  • Reads: shard RLock + optional best-effort read buffering for policy replay
  • Writes: overwrite fast path updates entries in-place when possible; inserts go through admission/eviction
  • Expiration: timing-wheel scheduling on writes, lazy delete on background ticks
  • Metrics: atomic counters when enabled
  • Shards: cache-line padded to prevent false sharing between cores

Legacy API

The mcache.New() / CacheDriver API from v1 still works. It uses safeMap + GC-based expiration without TinyLFU. See mcache.go and gcmap/ for details. For new code, use the generic NewCache[K, V] API.

License

MIT