Usage Notes

June 19, 2026 · View on GitHub

0. Thread Safety

emhash is not thread-safe. Concurrent access from multiple threads requires external synchronization:

// ❌ Wrong: concurrent access without synchronization
// Thread 1: map[key] = val;
// Thread 2: map.find(key);

// ✅ Correct: use mutex for concurrent access
std::mutex mtx;
{
    std::lock_guard<std::mutex> lock(mtx);
    map[key] = val;
}

// ✅ Correct: concurrent read-only access is safe
// Multiple threads can call find()/contains()/count() simultaneously
// as long as no thread is modifying the map

1. Non-Node-Based Hash Table

emhash does not guarantee reference stability during insert/erase/rehash operations.

// ❌ Wrong: reference may become invalid after rehash
auto& ref = myhash[1];
myhash[2] = ref;  // Danger! May trigger rehash

// ✅ Correct: copy value first
auto val = myhash[1];
myhash.reserve(10000);  // Pre-allocate
myhash[2] = val;

2. Erasing During Iteration

Deleting elements during iteration may cause some keys to be visited twice or skipped.

// ❌ Wrong: direct deletion without updating iterator
for (const auto& [key, value] : myhash) {
    if (condition) myhash.erase(key);  // UB: modifying container during range-for
}
// ✅ Correct: use iterator erase and update
for (auto it = myhash.begin(); it != myhash.end(); ) {
    if (condition) {
        it = myhash.erase(it);  // Returns next valid iterator
    } else {
        ++it;
    }
}
// ✅ Correct: use erase_if for conditional removal (C++17)
auto removed = myhash.erase_if([](const auto& pair) {
    return pair.second < 0;
});

3. Large Value Type Optimization

For very large value types (e.g., > 100 bytes), consider using pointer storage:

// High memory usage
emhash7::HashMap<Key, LargeValue> map1;

// Optimized memory usage
emhash7::HashMap<Key, std::unique_ptr<LargeValue>> map2;

4. Load Factor and Rehashing

The default max load factor is 0.80. When the load factor is exceeded, the table automatically rehashes to a larger size.

emhash7::HashMap<int, int> map;
map.max_load_factor(0.95f);  // Allow higher density (fewer rehashes, slower lookups)

// Pre-allocate to avoid rehashing during critical paths
map.reserve(expected_size);

emhash7 supports load factors up to 0.999 without performance collapse. Other versions support high load factors with the EMH_HIGH_LOAD macro.

5. Choosing the Right Version

ScenarioRecommended VersionReason
General purpose, integer keysemhash7Best all-around performance
Complex/large keys or valuesemhash8Dense iteration, no per-bucket overhead
Read-heavy, SIMD-friendly keysemilib::HashMapH2 tag filtering reduces key comparisons
Need high load factor (>0.9)emhash7Native support up to 0.999
Frequent insert/erase cyclesemhash7No tombstones, no degradation

6. Custom Hash Function

For user-defined key types, provide a hash function and equality comparator:

struct Point {
    int x, y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

struct PointHash {
    size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) ^ (std::hash<int>()(p.y) << 1);
    }
};

emhash7::HashMap<Point, int, PointHash> point_map;

7. Exception Safety

emhash operations themselves do not throw exceptions. The exception safety level depends on the operation and the underlying key/value types. Below is the guarantee per operation category:

Insert / Emplace (Basic Guarantee)

If the key or value type's constructor, copy, or move operation throws during insertion, the table remains in a valid, consistent state. The element will simply not be inserted. No memory is leaked and all internal invariants are preserved.

// If ValueT's constructor throws, the map is still valid
// but the element was not inserted
try {
    map.emplace(key, value_that_may_throw);
} catch (...) {
    // map is still valid, element was not added
    assert(map.size() == original_size);
}

operator[] / insert_or_assign (Basic Guarantee)

Same as insert — if construction of the default value or assignment throws, the table state remains valid.

Emplace (Basic Guarantee)

If the forwarded constructor arguments throw during in-place construction, the table is unchanged.

Erase / clear (Noexcept / Strong Guarantee)

Erase and clear operations do not allocate memory and never throw (assuming key/value destructors do not throw, which is the C++ convention).

// Erase is noexcept in practice
map.erase(key);  // never throws
map.clear();     // never throws

reserve / rehash (Strong Guarantee or Basic Guarantee)

If the allocator's allocation or the element move/copy constructor throws during rehash:

  • If std::is_nothrow_move_constructible_v<value_type> is true (e.g., keys and values are primitive types or smart pointers): rehashing is safe and provides a strong guarantee — the table is rolled back to its original state on failure.
  • Otherwise: a basic guarantee is provided — the table remains valid but some elements may have been moved.
// Prefer nothrow-movable types for best rehash safety
static_assert(std::is_nothrow_move_constructible_v<std::pair<KeyT, ValueT>>,
              "rehash may be slower and less safe");

Iterator Operations (Noexcept)

All iterator operations (++, --, *, ->, comparisons) are noexcept.

Summary

OperationGuaranteeNotes
insert / emplace / operator[]BasicUnsuccessful insert leaves table unchanged
insert_unique / emplace_uniqueBasicSame as insert
erase / clearNoexceptDestructors assumed not to throw
find / contains / countNoexceptRead-only, no allocation
reserve / rehashStrong**If value_type is nothrow-move-constructible; otherwise Basic
swapNoexceptSwaps internal pointers
Iterator operationsNoexcept++, --, *, ->, comparisons

8. Iterator and Reference Invalidation

emhash is an open-addressing flat hash table. Because key-value pairs are stored inline in a contiguous array, many operations can move elements and invalidate iterators and references.

Invalidation Rules by Operation

OperationIterators InvalidatedReferences Invalidated
insert / emplace / operator[]All if rehash triggered; none otherwiseAll if rehash triggered; none otherwise
erase(it)Only it and any iterator pointing to the same elementOnly the erased element's reference
erase(key)Only the erased element's iteratorOnly the erased element's reference
erase_ifOnly iterators to erased elementsOnly references to erased elements
clearAllAll
reserve(n)All if rehash triggeredAll if rehash triggered
rehash(n)AllAll
shrink_to_fitAll (always rehashes)All (always rehashes)
swapSwapped (iterators refer to the other container after swap)Swapped (references refer to the other container after swap)
find / contains / countNone (read-only)None (read-only)
begin / endN/AN/A

Key Consequences

  1. Do not hold references across rehash operations. If an insert might trigger a rehash (e.g., you haven't reserved enough space), copy values rather than holding references.
// ❌ Wrong: reference may become invalid after insert triggers rehash
auto& ref = myhash[1];
myhash[2] = ref;  // Danger: myhash may rehash on insert

// ✅ Correct: copy value first, ensure capacity
auto val = myhash[1];
myhash.reserve(myhash.size() + 1);
myhash[2] = val;
  1. Erase during iteration — always update the iterator.
// ✅ Correct: use erase returning next iterator
for (auto it = myhash.begin(); it != myhash.end(); ) {
    if (condition(it)) {
        it = myhash.erase(it);
    } else {
        ++it;
    }
}
  1. Pre-reserve to avoid unexpected invalidation in hot paths.
// Pre-allocate enough buckets before a critical section
map.reserve(expected_max_size);

// Now inserts will not invalidate iterators/references
// (unless size exceeds expected_max_size)

Comparison with std::unordered_map

Propertyemhash (open addressing)std::unordered_map (node-based)
Reference stability after insert❌ (if rehash)✅ (always)
Iterator stability after insert❌ (if rehash)✅ (always)
Reference stability after erase
Iterator stability after erase✅ (erase returns next)✅ (only erased invalidated)
Memory locality✅ Excellent❌ Poor (linked list)

Rule of thumb: If you need stable references/iterators across insertions, use std::unordered_map. If you need performance and can manage invalidation, use emhash.

9. Known Issues

emilib2ss may hang under extreme hash collisions

emilib2ss (a variant of emilib2::HashMap) does not limit probe-sequence length when all keys hash to the same bucket. Under this crafted hash-collision attack, insertion/lookup can degenerate into a full table scan and appear to hang.

Affected scenario:

  • An adversary supplies keys that all produce the same hash value, or
  • A pathological custom hash function maps many distinct keys to the same bucket.

Mitigation:

  • Prefer emilib2o or emilib2s for untrusted input.
  • For adversarial workloads, use emhash7 or enable EMH_SAFE_PSL on the emilib variants that support it.
  • Do not expose emilib2ss to unvalidated user-provided keys in security-sensitive contexts.