Using ETS for large amounts of data?

Anthony Molinaro <>
Mon Aug 9 20:08:08 CEST 2010


Hi,

  I've got some data in a file of the form

start_integer|end_integer|data_field1|data_field2|..|data_fieldN

where N is 12, and most fields are actually smallish integers.  The
file has 9964217 lines, for a total size of 752265457 bytes.

I want to load these into an Erlang process and essentially do lookups
of the form

  given an integer M

  if M >= start_integer && M <= end_integer then
    return data

I was planning on using ets with entries of the form (range fields
named from/to, since begin and end are reserved words in Erlang)

#data {
        key =  #range { from = integer(), to = integer() },
        data_field1 = Data1,
        ...
        data_fieldN = DataN
      }

Then using a match_spec to get the appropriate entry.
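Concretely, the lookup I had in mind looks roughly like this sketch
(abbreviated to two data fields, with illustrative module and field
names; the real table has 12 data fields):

```erlang
%% Sketch of the planned ets lookup.  `begin`/`end` are reserved words
%% in Erlang, so the range fields are named from/to here.
-module(range_lookup).
-export([new/0, lookup/2]).

-record(range, {from :: integer(), to :: integer()}).
-record(data,  {key :: #range{}, data_field1, data_field2}).

%% The record key is the second tuple element, so keypos = #data.key.
new() ->
    ets:new(?MODULE, [set, {keypos, #data.key}]).

%% Return every record whose [from, to] range contains M.
lookup(Tab, M) ->
    ets:select(Tab,
               [{#data{key = #range{from = '$1', to = '$2'}, _ = '_'},
                 [{'=<', '$1', M}, {'>=', '$2', M}],
                 ['$_']}]).
```

Note that this select has to scan the whole table, since the guard
can't take advantage of the key ordering -- which is part of why I'm
wondering whether ets is the right tool here.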

However, I'm running into problems.

I can't seem to actually load that many entries into an ets table.
First I tried an ordered_set because it seemed like it might be
faster; the load is fine for roughly the first 2 million entries,
then gets very slow and eventually crashes.  So I tried a plain set,
but that doesn't even make it that far before it chews up a lot of
memory and sends my machine into swap.

So is ets up to this task at all?  And if not, any suggestions?

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <>

