[erlang-questions] Needed: Great big ordered int set.
Tue Jul 9 08:48:47 CEST 2013
I have had similar challenges in the past. I needed to efficiently store 250M 64 bit ints with a byte value and do 100k reads per second. Writes for me was less than 100 per second.
I came up with https://github.com/knutin/bisect It uses a big binary, keeps the data sorted and does a binary search. On a decent machine it can do 8M random reads per second (reads can also be parallelized). Inserts get slow when the structure grows big (lots of memory to copy around). Using multiple of these structures might be an option to keep the write perfomance high.
On 08.07.2013, at 19:11, Alex Arnon <alex.arnon@REDACTED> wrote:
> Hi All,
> I need to implement a very large set of data, with the following requirements:
> - It will be populated EXCLUSIVELY by 64-bit integers.
> - The only operations will be:
> - add element,
> - get number of elements, and
> - fold/foreach over the SORTED dataset.
> - The invocation order will be strictly:
> - create data structure,
> - add elements sequentially,
> - run one or more iteration operations,
> - discard data structure.
> - The size of the dataset MUST scale to 500M elements, preferably billions should be possible too.
> - The data does not have to reside in memory - however, 32 to 64 GB of RAM may be allocated. (of course, these will be used by the OS buffer cache in case a file-based solution is chosen).
> In summary: Performance is not a must, but volume and the ability to iterate over the ordered values is.
> Thanks in advance!!!
> erlang-questions mailing list
More information about the erlang-questions