[erlang-questions] Managing a huge binary file

Wed May 1 05:20:24 CEST 2013

Some numbers:
  I wrote a C program to generate the numbers 0..2**31-1 with 256 of them
  missing, in a more-or-less random order.

  m% ctime a.out > gen8g.dat
  89.760 user + 25.320 system = 115.080 total in 228.005 real seconds.

This was on a 2.66 GHz desktop Mac with 8GB of 1.333 GHz DDR3 memory
and an NFS file system.  It was generating lots of random numbers and
the RNG was not particularly fast.  You'll notice that the CPU time
was about half the real time: the difference is waiting for the file
system.

  m% ctime wc gen8g.dat 
  41943037 268107752 8589933636 gen8g.dat
  79.400 user + 11.530 system = 90.930 total in 283.981 real seconds.

Here you see that the CPU time was less than a third of the real time.
Again, the difference is waiting for the file system.

I wrote a little Erlang module.  It counted

  8589933636 bytes

taking

  4261.280 user + 26.740 system = 4288.020 total in 4566.779 real seconds.

One problem here is that this particular Erlang build doesn't support
native (HiPE) compilation.

By the way, there is one special quirk of the problem as given to us
which makes (or might make) the "arithmetic" schemes useless:  we are
*not* told that there are no duplicated numbers.  The trick of
determining which single number is missing from a collection of numbers
relies on there being no duplicates.