# [erlang-questions] Managing a huge binary file

Tue Apr 30 13:40:06 CEST 2013

```2013/4/30 Richard A. O'Keefe <>:
>
> On 30/04/2013, at 1:23 PM, Salvador Tamarit wrote:
>
>> Suppose that we have a huge binary file (8GB or so) containing all the
>> integers from 0 to a given (very big) number. The numbers are unsorted
>> and some of them (70 numbers for instance) has been removed.
>
> This sounds like a well known problem with an answer in
>

That's great. I was suspicious that it existed, but I didn't know how
to find it :P Thank you very much Richard.

> with the twist that you are asked to find a specific missing number
> (in value order) rather than all of them.
>
> When you talk about reading the numbers 4 bytes at a time, that tells
> me the numbers are not >that< "huge".
>
> One very simple step would be to set up
>     number-present : map (0..6535) -> nat := (all -> 0)_
>     for each number in the input
>         number-present(high 16 bits of that number) +:= 1
> Now you can tell which block of numbers must contain the
> number you are looking for, and you can make a second pass
> through the data counting numbers in that block, and then
> it's a simple scan through the block.

It would solve the problem of being able to load the data into memory,
but I still having problems reading the whole file. It takes too much
time. I know my function is very naive, however I'm not sure how could
I improve it. Only reading integers from file, without further
processing, takes a lot of time (minutes) in a fast laptop.

eof ->
ok
end.

>
> So if the numbers are 0..N-1, the problem can be solved in
> O(N**1/2) space in two passes, or in O(N**1/3) space in
> three passes, &c.
>
> This is _not_ mathematically optimal, but it's quite easy to
> program, and if you really had this problem, I'd rather have
> the data read twice by a simple program I trusted than once
> by a clever program that was almost certainly wrong.
>
>
> _______________________________________________
> erlang-questions mailing list
>
> http://erlang.org/mailman/listinfo/erlang-questions
```