[erlang-questions] very large key lookup
ke han
ke.han@REDACTED
Tue Sep 26 03:57:54 CEST 2006
On Sep 26, 2006, at 9:29 AM, Bob Ippolito wrote:
> On 9/25/06, Richard A. O'Keefe <ok@REDACTED> wrote:
>> ke han <ke.han@REDACTED> wrote:
>> We will have hundreds of millions of these 32 byte keys in
>> the the
>> server's "DB" and each key's associated 128 byte key.
>> I am looking for advice on the format for these keys along
>> with which
>> libraries to use for balance between memory and lookup
>> performance.
>> The current format is 0-9 + a-z character strings.
>>
>> I note that there are 36 characters in the 0-9 + a-z character set.
>> This means that one can pack 3 of these characters into 2 bytes.
>> So 32 of these characters can be packed easily in 22 bytes, not 32.
>> To put it another way, a key space of /[0-9a-z]{32}/ is equivalent
>> to a key space of about 166 bits.
>>
>> The internal form of these keys could well be Erlang binaries;
>> there doesn't seem to be any reason why the external form should
>> change.
>>
>> I do wonder, though, why
>> 63,340,286,662,973,277,706,162,286,946,811,886,609,896,461,828,096
>> primary keys are needed; enough to give every human being now alive
>> 10**40 different keys. Even if we consider every human being who has
>> ever lived, and give each of them 3 million computers, that's still
>> 10**32 each. This *looks* like far more extra bits than are needed
>> for reliability (error detecting codes) and security. What am I
>> missing?
>
> 1> math:log(math:pow(36, 32)) / math:log(2).
> 165.438
thanks for the erlang math tutorial !! I've yet to dig into the math
libraries.
>
> Sounds like a cryptographic hash function. SHA-1 fits in 160 bits. MD5
> plus a 32-bit timestamp fits in 160 bits.
>
> I'm guessing the key space is going to be pretty sparse rather than
> serial. Probably easier to insert data into a table like that over a
> cluster since you can pretty much bet against collisions ever
> happening and you don't have to encode a node id into the key.. which
> lets the data can move around pretty freely as the cluster grows.
yes,. these keys are not for your typical sequential numbering of
records. They are generated by a new algorithm which I can't say
much about at the moment. There is only one piece of data associated
with each 32 byte key which is a 128 byte value which is used to
filter out false positives. My need is for very scalable key lookup;
I don't need atomic transactions for insert, update, etc...
thanks, ke han
>
> -bob
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
More information about the erlang-questions
mailing list