[erlang-questions] Needed: Great big ordered int set.

Knut Nesheim knutin@REDACTED
Tue Jul 9 08:48:47 CEST 2013


Hi Alex,

I have had similar challenges in the past. I needed to efficiently store 250M 64 bit ints with a byte value and do 100k reads per second. Writes for me was less than 100 per second.

I came up with https://github.com/knutin/bisect It uses a big binary, keeps the data sorted and does a binary search. On a decent machine it can do 8M random reads per second (reads can also be parallelized). Inserts get slow when the structure grows big (lots of memory to copy around). Using multiple of these structures might be an option to keep the write perfomance high.

Regards
Knut



On 08.07.2013, at 19:11, Alex Arnon <alex.arnon@REDACTED> wrote:

> Hi All,
> 
> I need to implement a very large set of data, with the following requirements:
> - It will be populated EXCLUSIVELY by 64-bit integers.
> - The only operations will be: 
>   - add element,
>   - get number of elements, and
>   - fold/foreach over the SORTED dataset.
> - The invocation order will be strictly:
>   - create data structure,
>   - add elements sequentially,
>   - run one or more iteration operations,
>   - discard data structure.
> - The size of the dataset MUST scale to 500M elements, preferably billions should be possible too.
> - The data does not have to reside in memory - however, 32 to 64 GB of RAM may be allocated. (of course, these will be used by the OS buffer cache in case a file-based solution is chosen).
> 
> In summary: Performance is not a must, but volume and the ability to iterate over the ordered values is.
> 
> Thanks in advance!!!
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions



More information about the erlang-questions mailing list