<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">eleveldb <div><a href="https://github.com/basho/eleveldb">https://github.com/basho/eleveldb</a></div><div><br></div><div>hanoidb</div><div><a href="https://github.com/krestenkrab/hanoidb">https://github.com/krestenkrab/hanoidb</a></div><div><br></div><div><br></div><div>Sergej</div><div><br><div><div>On Jul 8, 2013, at 8:14 PM, Alex Arnon wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="content-type" content="text/html; charset=utf-8"><div dir="auto"><div><div style="text-align: left;direction: ltr; "><span style="-webkit-text-size-adjust: auto;">I'll probably end up doing just that, but was hoping I could resolve the thing in-process.</span></div><br></div><div style="-webkit-text-size-adjust: auto; "><br>On 8 Jul 2013, at 20:36, Sergej Jurecko &lt;<a href="mailto:sergej.jurecko@gmail.com">sergej.jurecko@gmail.com</a>&gt; wrote:<br><br></div><blockquote type="cite" style="-webkit-text-size-adjust: auto; "><div>Why not just use mongodb, mysql or postgresql?<div><br></div><div><br></div><div>Sergej<br><div><br><div><div>On Jul 8, 2013, at 7:29 PM, Alex Arnon wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">- A single set of integers - 500M of them.<div>- This is a throwaway piece of data - once I've added all the values and iterated over them a couple of times, it is of no further use.<br><div style="">- Mutation (addition of an integer) speed is not very critical; however, due to the size of the dataset, it should be "reasonable" - i.e. less than a millisecond per insertion on average.<br>

</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jul 8, 2013 at 8:22 PM, Sergej Jurecko <span dir="ltr">&lt;<a href="mailto:sergej.jurecko@gmail.com" target="_blank">sergej.jurecko@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The data structure is a sorted list of integers. That 500M dataset number, is that over a single list of integers, or is that the sum of all lists of integers?<br>


What are the reliability requirements? Do you need redundancy and/or backups? It is a very different problem if a single server solution is enough, or if it requires a network of computers.<br>

<br>

<br>

Sergej<br>

<div><div class="h5"><br>

On Jul 8, 2013, at 7:11 PM, Alex Arnon wrote:<br>

<br>

> Hi All,<br>

><br>

> I need to implement a very large set of data, with the following requirements:<br>

> - It will be populated EXCLUSIVELY by 64-bit integers.<br>

> - The only operations will be:<br>

>   - add element,<br>

>   - get number of elements, and<br>

>   - fold/foreach over the SORTED dataset.<br>

> - The invocation order will be strictly:<br>

>   - create data structure,<br>

>   - add elements sequentially,<br>

>   - run one or more iteration operations,<br>

>   - discard data structure.<br>

> - The size of the dataset MUST scale to 500M elements; preferably, billions should be possible too.<br>

> - The data does not have to reside in memory - however, 32 to 64 GB of RAM may be allocated. (of course, these will be used by the OS buffer cache in case a file-based solution is chosen).<br>

><br>

> In summary: Performance is not a must, but volume and the ability to iterate over the ordered values are.<br>

><br>

> Thanks in advance!!!<br>

><br>

</div></div>> _______________________________________________<br>

> erlang-questions mailing list<br>

> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

> <a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

<br>

</blockquote></div><br></div>

</blockquote></div><br></div></div></div></blockquote></div></blockquote></div><br></div></body></html>