[erlang-questions] Managing a huge binary file

Salvador Tamarit stamarit@REDACTED
Tue Apr 30 14:06:21 CEST 2013


2013/4/30 Jesper Louis Andersen <jesper.louis.andersen@REDACTED>:
>
> On Apr 30, 2013, at 3:23 AM, Salvador Tamarit <stamarit@REDACTED> wrote:
>
>> Suppose that we have a huge binary file (8GB or so) containing all the
>> integers from 0 to a given (very big) number. The numbers are unsorted
>> and some of them (70 numbers for instance) has been removed.
>>
>> The goal is to find the the Nth removed number (in order). For
>> instance, if the file contains the numbers from 0 to 99, and then
>> number 101, the 1st removed would be 100.
>
> Quick shot at a possible solution: Variant of a Fenwick-tree.
>
> Essentially, the idea is to count bit-patterns in the numbers and let each bit correspond to a slot in an array. Now you can "count" how many times a given pattern is there. With a bit of mangling, it may be possible to walk over all the data building up the counts in the array and then finally scrutinize it for 0'es and then work backwards what missing pattern you have.
>
> Another variant: Radix tree or trie structure on bit-patterns since this greatly compresses the in-memory representation. Hash Array Mapped Tries comes to mind here. You can easily walk it in order to determine "holes" in the structure afterwards.
>

Both solutions seems very interesting. Thanks Jesper and Ian for your
proposals. However, I can't load the file (in a reasonable time)
because it's too big, so your solutions will be useful for my second
step :)

> Jesper Louis Andersen
>   Erlang Solutions Ltd., Copenhagen
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions



More information about the erlang-questions mailing list