emhash

June 21, 2026 · View on GitHub

High-performance, memory-efficient C++ open addressing flat hash table

License: MIT C++ Standard Platform CI Version

emhash is a family of high-performance, header-only hash table implementations designed for maximum performance and memory efficiency. Through innovative collision resolution strategies and cache optimization techniques, emhash demonstrates exceptional performance across various benchmarks.


Table of Contents


Core Features

Extreme Performance

  • High load factor support: Set load factor up to 0.999 via EMH_HIGH_LOAD macro (available in hash_table5/7/8) or max_load_factor() (emhash6/7)
  • No tombstones: Performance does not degrade even with frequent insert/erase operations
  • Smart collision resolution: Three-way hybrid strategy combining linear probing, quadratic probing, and bidirectional search
  • Cache-friendly design: Single array inline storage minimizes memory footprint and maximizes cache hit rate

Memory Optimization

  • Compact layout: Significant memory savings when sizeof(key) % 8 != sizeof(value) % 8
    • Example: hash_map<uint64_t, uint32_t> saves 1/3 memory compared to hash_map<uint64_t, uint64_t>
  • Dynamic shrinking: shrink_to_fit() releases unused memory

Extended Features

FeatureDescription
insert_uniqueDirect insertion without lookup (performance boost)
try_getReturns pointer to value, nullptr if key not found
try_setSet value if key exists, do nothing if it doesn't (emhash5/8)
set_getUpdates value and returns old value (emhash5/8)
_eraseDelete operation returning void (faster)
LRU ModeEnable LRU cache optimization with EMH_LRU_SET

Platform Support

  • Operating Systems: Windows, Linux, macOS
  • Processors: x86_64, ARM64 (Apple M1/M2, AMD, Intel)
  • Compilers: GCC, Clang, MSVC (C++17/20)

Quick Start

1. Include Header

emhash is header-only — just copy one .hpp file to your project:

# Option A: Copy directly (simplest, no build system needed)
cp include/emhash/hash_table7.hpp /your/project/emhash/

# Option B: Clone and include
git clone https://github.com/ktprime/emhash.git
# Then add -I/path/to/emhash/include to your compiler flags
#include "emhash/hash_table7.hpp"  // Or emhash/hash_table[5-8].hpp

2. Basic Usage

#include "emhash/hash_table7.hpp"
#include <iostream>

int main() {
    // Create and reserve space
    emhash7::HashMap<int, std::string> map;
    map.reserve(100);

    // Insert elements (multiple ways)
    map[1] = "one";
    map.emplace(2, "two");
    map.insert_unique(3, "three");  // Best performance, key must be unique

    // Find element
    auto it = map.find(2);
    if (it != map.end()) {
        std::cout << "Found: " << it->second << "\n";
    }

    // try_get: returns pointer, more concise
    if (auto* pval = map.try_get(3)) {
        std::cout << "Value: " << *pval << "\n";
    }

    // Iterate
    for (const auto& [key, value] : map) {
        std::cout << key << " => " << value << "\n";
    }

    return 0;
}

3. Advanced Usage

// Custom key type
struct Point {
    int x, y;
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

struct PointHash {
    size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) ^ (std::hash<int>()(p.y) << 1);
    }
};

emhash7::HashMap<Point, std::string, PointHash> point_map;

// C++20 using Lambda
#if __cplusplus >= 202002L
auto hash = [](const Point& p) { return std::hash<int>()(p.x + p.y); };
auto eq = [](const Point& a, const Point& b) { return a.x == b.x && a.y == b.y; };
emhash7::HashMap<Point, int, decltype(hash), decltype(eq)> map(0, hash, eq);
#endif

More examples: docs/examples/


Version Selection Guide

VersionBest ForKey StrengthsWeaknesses
emhash5Integer keys, fast lookupSmall-size optimization (EMH_SMALL_SIZE), lowest probe countSlightly slower than emhash6 for high LF
emhash6Fast lookup/erase, integer keysFastest find/erase, linked-bucket with bitmaskMore memory for metadata
emhash7Insert-heavy, mixed workloadsNo tombstones, stable insert/erase, high load factorSlightly slower erase than emhash5/6
emhash8Iteration-heavy, large KV typesDense pairs array, near-zero iteration timeHigher memory for separate index array
emilib1/2/3Swiss-table style SIMD-accelerated lookupGroup-level SIMD probing, very fast findHigher memory overhead per bucket; emilib2ss may hang under extreme hash collision attack (all keys hashing to same bucket) — use emilib2o or emilib2s in such scenarios

See Performance Overview for detailed benchmark numbers.


Testing

The test suite is in tests/ and requires no third-party dependencies — only emhash/emilib headers.

Quick Start

cd tests

# Build and run quick validation tests (247,268 assertions, ~5 seconds)
cmake -B build && cmake --build build --config Release
./build/Release/test_verify.exe        # Windows
./build/test_verify                    # Linux/WSL

# Or use the custom targets
cmake --build build --target quick_test    # Quick validation
cmake --build build --target stress_test   # Stress tests
cmake --build build --target attack_test   # Hash attack tests
cmake --build build --target all_tests     # All tests

Test Categories

CategoryDirectoryDescription
Validationtests/verify/247K+ assertions, full API coverage, special key types
Stresstests/stress/35K trials, high load factor, bad hash scenarios
Attacktests/attack/Hash collision attacks (constant/small-range/linear)
Debugtests/debug/Debugging tools for chain corruption, probe bugs
Fuzztests/fuzz/LibFuzzer + ASan fuzzing (requires clang)

See tests/README.md for detailed instructions.


Documentation

DocumentDescription
Test SuiteTest organization, build instructions, coverage matrix
Test Analysis & Coverage ReportPer-file coverage data, test classification, CI integration
Performance OverviewBenchmark results, high load factor performance
API ReferenceConstructors, methods, iterators
Design PrinciplesCollision resolution, memory layout, implementation comparison
Usage NotesThread safety, reference stability, iteration, large values
Performance TipsCompile flags, pre-allocation, hash selection, anti-patterns
FAQFrequently asked questions
Migration GuideMigrating from std::unordered_map
Performance TrackingVersion-by-version benchmark history, regression policy
Architecture Decisions (ADR)Why we chose open addressing, header-only, emhash8 layout, etc.

Third-Party Benchmarks

emhash has been validated by multiple well-known third-party benchmarks:

Benchmark Code

Performance Charts

Historical performance charts are available in docs/images/:

  • int64_t*.png - Integer key benchmarks
  • string*.png - String key benchmarks
  • int_string.png - Mixed key/value type benchmarks

Interactive Performance Charts

Download all files in the bench/tsl_bench/ directory and open chartsAll.html in a browser to view interactive performance curves.


License

This project is open-sourced under the MIT License.

Copyright (c) 2019-2026 Huang Yuanbing & bailuzhou AT 163.com

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

Acknowledgments

Thanks to the following projects and authors for inspiration and comparison: