[erlang-questions] Needed: Great big ordered int set.

Alex Arnon alex.arnon@REDACTED
Mon Jul 8 20:14:26 CEST 2013


I'll probably end up doing just that, but I was hoping to solve this in-process.


On 8 Jul 2013, at 20:36, Sergej Jurecko <sergej.jurecko@REDACTED> wrote:

> Why not just use mongodb, mysql or postgresql?
> 
> 
> Sergej
> 
> On Jul 8, 2013, at 7:29 PM, Alex Arnon wrote:
> 
>> - A single set of integers - 500M of them.
>> - This is a throwaway piece of data - once I've added all the values and iterated over them a couple of times, it is of no further use.
>> - Mutation (addition of an integer) speed is not very critical; however, due to the size of the dataset, it should be "reasonable" - i.e. less than a millisecond per insertion on average.
>> 
>> 
>> On Mon, Jul 8, 2013 at 8:22 PM, Sergej Jurecko <sergej.jurecko@REDACTED> wrote:
>>> The data structure is a sorted list of integers. Is that 500M figure for a single list of integers, or is it the sum over all lists of integers?
>>> What are the reliability requirements? Do you need redundancy and/or backups? The problem is very different depending on whether a single-server solution is enough or a network of computers is required.
>>> 
>>> 
>>> Sergej
>>> 
>>> On Jul 8, 2013, at 7:11 PM, Alex Arnon wrote:
>>> 
>>> > Hi All,
>>> >
>>> > I need to implement a very large set of data, with the following requirements:
>>> > - It will be populated EXCLUSIVELY by 64-bit integers.
>>> > - The only operations will be:
>>> >   - add element,
>>> >   - get number of elements, and
>>> >   - fold/foreach over the SORTED dataset.
>>> > - The invocation order will be strictly:
>>> >   - create data structure,
>>> >   - add elements sequentially,
>>> >   - run one or more iteration operations,
>>> >   - discard data structure.
>>> > - The size of the dataset MUST scale to 500M elements, preferably billions should be possible too.
>>> > - The data does not have to reside in memory; however, 32 to 64 GB of RAM may be allocated (of course, this will be used by the OS buffer cache if a file-based solution is chosen).
>>> >
>>> > In summary: Performance is not a must, but volume and the ability to iterate over the ordered values are.
>>> >
>>> > Thanks in advance!!!
>>> >
>>> > _______________________________________________
>>> > erlang-questions mailing list
>>> > erlang-questions@REDACTED
>>> > http://erlang.org/mailman/listinfo/erlang-questions
> 
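
For the requirements quoted above (add, count, sorted fold over 500M or more 64-bit integers, then discard), one possible in-process shape is an external merge sort: buffer additions in RAM, spill each full buffer to disk as a sorted run, and fold by merging the runs lazily. The raw data is small (500M x 8 bytes = 4 GB), so the run files should sit comfortably in the OS buffer cache on a 32 to 64 GB machine. What follows is only a minimal sketch, written in Python for brevity rather than Erlang. The class name BigSortedIntSet, its method names, and the run_size parameter are all hypothetical and do not come from the thread.

```python
import heapq
import os
import shutil
import tempfile
from array import array


class BigSortedIntSet:
    """Append-only set of signed 64-bit ints, iterated in sorted order.

    Hypothetical sketch: additions are buffered, spilled to disk as sorted
    runs, and merged lazily on iteration.
    """

    def __init__(self, run_size=1_000_000, directory=None):
        self._run_size = run_size        # ints held in RAM before spilling
        self._dir = tempfile.mkdtemp(dir=directory)
        self._buffer = array("q")        # "q" = signed 64-bit integer
        self._runs = []                  # paths of sorted run files

    def add(self, value):
        self._buffer.append(value)       # raises OverflowError if not 64-bit
        if len(self._buffer) >= self._run_size:
            self._spill()

    def _spill(self):
        """Write the buffer to disk as one sorted, de-duplicated run."""
        if not self._buffer:
            return
        run = array("q", sorted(set(self._buffer)))
        path = os.path.join(self._dir, "run%06d.bin" % len(self._runs))
        with open(path, "wb") as f:
            run.tofile(f)
        self._runs.append(path)
        self._buffer = array("q")

    @staticmethod
    def _read_run(path, chunk=65536):
        """Yield the integers of one run file, reading `chunk` ints at a time."""
        with open(path, "rb") as f:
            while True:
                data = f.read(8 * chunk)
                if not data:
                    return
                block = array("q")
                block.frombytes(data)
                yield from block

    def __iter__(self):
        """Fold source: merge all runs, dropping duplicates across runs."""
        self._spill()
        previous = None
        for value in heapq.merge(*(self._read_run(p) for p in self._runs)):
            if value != previous:
                yield value
                previous = value

    def count(self):
        """Number of distinct elements (costs one full merge pass)."""
        return sum(1 for _ in self)

    def discard(self):
        """Delete all run files; the set must not be used afterwards."""
        shutil.rmtree(self._dir)
```

Usage follows the invocation order from the original post: create, add sequentially, iterate one or more times, discard. With a run_size of 1M, 500M insertions produce at most about 500 run files, all of which are open at once during the merge, so a larger run_size may be needed to stay under the OS file-descriptor limit at billions of elements.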