[erlang-questions] [ANN] jch-erl 0.1 - Fast minimal memory consistent hashing for Erlang/OTP.
Darach Ennis
<>
Fri Nov 7 22:08:07 CET 2014
Hey folk.
jch-erl 0.1 freshly baked.
NIF wrapper for Jump Consistent Hash algorithm by John Lamping and Eric
Veach developed at Google, Inc. Paper: "A Fast, Minimal Memory, Consistent
Hash Algorithm".
http://arxiv.org/ftp/arxiv/papers/1406/1406.2294.pdf
This implementation uses the xorshift64* pseudo-random number generator
rather than the linear congruential generator in the paper as it is
reasonably fast but, more importantly, memory efficient.
Usage
A single function jch:ch/2 is offered. Simply pass in a 64 bit long (or
less) integer key argument followed by the desired bucket or continuum
partition size. The function returns the partition allocated for the key.
Performance is very stable as bucket size increases and distribution across
the ring is stable (standard deviation for a reasonable sample size is
typically <2% relative to the mean).
Example
% ERL_LIBS=deps erl +sfwi 1 +scl false -pa ebin
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [smp:8:8] [async-threads:10]
[hipe] [kernel-poll:false] [dtrace]
Eshell V5.10.3 (abort with ^G)
1> jch:ch(1,128).
29
2> jch:ch(13,128).
121
3> jch:ch(13,128).
121
4> jch:ch(trunc(math:pow(2,64))-1,128).
78
5> jch:ch(trunc(math:pow(2,64)),128). %% off by 1 mate
** exception error: bad argument
in function jch:ch/2
called as jch:ch(18446744073709551616,128)
6> %% off by 1 mate
6>
Access times are typically sub-microsecond. Performance tests of the erlang
NIF
and underlying C function are included.
Example performance for a space of 1 billion buckets:
- 1B Buckets. Hash performance.
ch1b N: 10 Min: -----1 Max: -----2 Median: -----1 Avg: -----1 Elapsed: 16
ch1b N: 100 Min: -----0 Max: -----1 Median: -----1 Avg: -----1 Elapsed: 102
ch1b N: 1000 Min: -----0 Max: -----9 Median: -----0 Avg: -----0 Elapsed:
1178
ch1b N: 10000 Min: -----0 Max: ----13 Median: -----0 Avg: -----0 Elapsed:
10146
ch1b N: 100000 Min: -----0 Max: ----59 Median: -----0 Avg: -----0 Elapsed:
109772
ch1b N: 1000000 Min: -----0 Max: ---886 Median: -----0 Avg: -----0 Elapsed:
1048561
ch1b N: 10000000 Min: -----0 Max: --3734 Median: -----0 Avg: -----0
Elapsed: 11757297
And a sample of how keys partition over the space:
- 32 Buckets. 10M hashes Uniform Distribution Check.
312292 312457 312660 312522 313805 311987 311431 312299
311975 312834 312233 312125 313391 312222 312170 312389
311811 313134 312986 312434 312002 313060 312784 312699
312114 312170 312385 312721 313163 311883 313060 312802
oOo| Min: 311431 Max: 313805 Median: 312389 Avg: 312500 Elapsed: 15748372
Worst: 99.2435 Med: 99.5488 Avg: 99.5841 RSD: 0.9139
The RSD here is the standard deviation relative to the mean (avg) which is
less than 1%
after 10M hash operations on random uniform test data. That's pretty good.
A pint (or fancy fizzy water, or cup of pant wetting tea...) to anyone who
can strip off an order of magnitude off the max outlier latencies and
explain through scheduler, gc or emulator tuning what insight I missed or
runtime vagary i've yet to master. Mainly written to toy with
micro-benchmarking and get practice with tuning the emulator.
Enjoy!
Cheers,
Darach.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20141107/e9daf1d2/attachment.html>
More information about the erlang-questions
mailing list