[erlang-questions] Managing a huge binary file
Tue Apr 30 13:41:10 CEST 2013
On 30/04/2013 02:23, Salvador Tamarit wrote:
> hi all,
> I have a very interesting problem for us :)
> Suppose that we have a huge binary file (8GB or so) containing all the
> integers from 0 to a given (very big) number. The numbers are unsorted
> and some of them (70 numbers for instance) has been removed.
> The goal is to find the the Nth removed number (in order). For
> instance, if the file contains the numbers from 0 to 99, and then
> number 101, the 1st removed would be 100.
And the 100th number would be 101, so you can calculate that 1 number is
missing. (101 - 100)
I would approach this with a binary chop. See
Lets say you want the Nth missing number.
Your key is in two parts - most significant is the number of missing
numbers, next in significance is the number itself.
Binary search for (N,0) - and when your range shrinks to 1, it will
point to the smallest number with N or more missing
Lets call this Q. Now read the previous number (P).
The range of values from P+1 to Q-1 will include the Nth value. Simple
calculation will determine which you want.
(The breaks may not always be single numbers).
> I have been trying to solve this problem with different solutions, but
> I always get stucked in reading the file. I'm reading it sequentally.
> I open the file with:
> then I read each number with:
> file:read(FDevice, 4)
> this works fine, but too slow.
> Anyway, supposing that I would be able to read it, then I don't know
> how to manage the data (I think that it is impossible to load all this
> data in memory).
> How would you solve a problem like this?
> erlang-questions mailing list
More information about the erlang-questions