funny_hash

December 18, 2017 ยท View on GitHub

Simple multiplicative hash function like Murmur3 but a bit safer.

It is safer cause it uses state twice larger than block size, feeds block twice at time and every bit affects at least 10/20 bits of a state. So it is impossible to revert change without introducing other change, and I think it is hardly possible to make seed independent collisions (which is the main defect of Murmur2/Murmur3).

So it is safe for use in a internal hash table implementations when nor seed nor hashsum is known to attacker.

(but it is certainly not cryptographic: I think attack with choosen plaintext and known hashsum will succeed with little effort)

There is two boxes of building blocks: 32bit hash for 32bit cpu and 64bit hash for 64bit cpu.

/* mix 32bit integer into 64bit state */
static inline void fh32_permute(uint32_t v, uint32_t *a, uint32_t *b);
/* hash string, seed with and produce 64bit state */
static inline void fh32_permute_string(const uint8_t *v, size_t len, uint32_t *a, uint32_t *b);
/* state finalization to 32bit value */
static inline uint32_t fh32_finalize(uint32_t a, uint32_t b);
/* convenient function to hash buffer with 32bit seed */
static inline uint32_t fh32_string_hash(const void *buf, size_t len, uint32_t seed);
/* convenient function to hash buffer with 64bit seed */
static inline uint32_t fh32_string_hash2(const void *buf, size_t len, uint32_t seed1, uint32_t seed2);

/* mix 64bit integer into 128bit state */
static inline void fh64_permute(uint64_t v, uint64_t *a, uint64_t *b);
/* hash string, seed with and produce 128bit state */
static inline void fh64_permute_string(const uint8_t *v, size_t len, uint64_t *a, uint64_t *b);
/* state finalization to 64bit value */
static inline uint64_t  fh64_finalize(uint64_t a, uint64_t b);
/* convenient function to hash buffer with 64bit seed */
static inline uint64_t  fh64_string_hash(const void *buf, size_t len, uint64_t seed);
/* convenient function to hash buffer with 128bit seed */
static inline uint64_t  fh64_string_hash2(const void *buf, size_t len, uint64_t seed1, uint64_t seed2);

Benchmark

Some benchmark results on Intel CPU 300M random blob (time relative to murmur3a, less is better):

x86_64

By 1-14byte substrings twice

functiongcc -O2clang -O2
funny320.900.87
funny640.770.75
fnv1a0.820.81
murmur321.001.00
murmur1281.351.24
sip241.961.62
sip131.671.37
sap241.961.54
sap131.601.25
lookup30.950.85
spooky1.030.88

10 times 300M at once

functiongcc -O2clang -O2
funny320.961.08
funny640.490.54
fnv1a3.073.12
murmur321.001.00
murmur1280.420.46
sip241.151.34
sip130.660.72
sap242.342.66
sap131.301.44
lookup31.050.99
spooky0.650.64

x86 (by -m32)

By 1-14byte substrings twice

functiongcc -O2clang -O2
funny320.990.90
funny641.401.26
fnv1a0.800.89
murmur321.001.00
murmur1281.982.31
sip243.873.24
sip132.882.29
sap241.861.79
sap131.611.46
lookup31.050.97
spooky2.261.59

10 times 300M at once

functiongcc -O2clang -O2
funny321.020.99
funny641.801.11
fnv1a3.193.17
murmur321.001.00
murmur1281.801.75
sip244.994.58
sip132.671.93
sap242.372.73
sap131.391.40
lookup31.091.07
spooky2.401.87