[erlang-questions] Managing a huge binary file

Ian hobson42@REDACTED
Tue Apr 30 13:41:10 CEST 2013


Hi Salvador,

On 30/04/2013 02:23, Salvador Tamarit wrote:
> hi all,
>
> I have a very interesting problem for us :)
>
> Suppose that we have a huge binary file (8GB or so) containing all the
> integers from 0 to a given (very big) number. The numbers are unsorted
> and some of them (70 numbers for instance) has been removed.
>
> The goal is to find the the Nth removed number (in order). For
> instance, if the file contains the numbers from 0 to 99, and then
> number 101, the 1st removed would be 100.
And the 100th number would be 101, so you can calculate that 1 number is 
missing. (101 - 100)

I would approach this with a binary chop. See 
http://en.wikipedia.org/wiki/Binary_search_algorithm

Lets say you want the Nth missing number.

Your key is in two parts - most significant is the number of missing 
numbers, next in significance is the number itself.

Binary search for (N,0) - and when your range shrinks to 1, it will 
point to  the smallest number with N or more missing

Lets call this Q. Now read the previous number (P).

The range of values from P+1  to Q-1 will include the Nth value. Simple 
calculation will determine which you want.

(The breaks may not always be single numbers).

Regards
Ian











>
> I have been trying to solve this problem with different solutions, but
> I always get stucked in reading the file. I'm reading it sequentally.
> I open the file with:
>
> file:open(FileName,[binary])
>
> then I read each number with:
>
> file:read(FDevice, 4)
>
> this works fine, but too slow.
>
> Anyway, supposing that I would be able to read it, then I don't know
> how to manage the data (I think that it is impossible to load all this
> data in memory).
>
> How would you solve a problem like this?
>
> Thanks,
> Salva
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
> .
>




More information about the erlang-questions mailing list