[erlang-questions] Needed: Great big ordered int set.

Knut Nesheim knutin@REDACTED
Tue Jul 9 12:37:45 CEST 2013





On 09.07.2013, at 12:02, Richard Carlsson <carlsson.richard@REDACTED> wrote:

> On 2013-07-09 11:32 , Jesper Louis Andersen wrote:
>> Unless I am totally wrong here about what you want to get, why don't you
>> use a digital representation? A 64 bit integer is a stream of bits. You
>> can then share the common parts of the bit pattern in a tree-like
>> structure. This in turn means that compression is "automatic" since the
>> common bit patterns acts like the compression pattern. Add something
>> like "everything below this node is present" and you can compress common
>> parts of the tree.
>> 
>> In general, the solution is RADIX-trees, or something along that
>> pattern. They have, incidentally, also been used for quick
>> Integer-key-based maps and so on.
> 
> Assuming a 64-bit Erlang VM, there's not much point in using a radix tree, since the smallest unit of representation will use at least 64 bits anyway

Yes. Being a bit naive, I implemented such a tree structure in Erlang. The memory overhead is way too big and the performance quite low. I even tried the approach of using a binary for storage, available at https://github.com/knutin/btrie

Judy arrays and especially the C implementation from HP has all the tricks you can think of. As Richard points out, if the word size is 64 bits, just the overhead of the pointers in such a structure will trump the size of the data.

Regards
Knut


More information about the erlang-questions mailing list