[erlang-questions] Using ETS for large amounts of data?

Paul Mineiro paul-trapexit@REDACTED
Mon Aug 9 23:22:15 CEST 2010


On Mon, 9 Aug 2010, Anthony Molinaro wrote:

> Hi,
>
>   I've got some data in a file of the form
>
> start_integer|end_integer|data_field1|data_field2|..|data_fieldN
>
> Where N is 12.  Most fields are actually smallish integers.
> The file has 9964217 lines (one entry per line), and its total
> size is 752265457 bytes.
>
> I want to load these into an erlang process and essentially do lookups
> of the form
>
>   given an integer M
>
>   if M >= start_integer && M <= end_integer then
>     return data
...
> So is ETS up to this task at all? And if not, any suggestions?

Here are my initial thoughts:

1) use an interval tree to encode (start, end) -> index
   1a) http://en.wikipedia.org/wiki/Interval_tree
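   1b) if your intervals do not overlap, you can use an ets ordered_set
       for this part instead: use the beginning of each interval as the
       key, and then use ets:prev/2 to find the entry (look up M itself
       first, since ets:prev/2 returns the key strictly before M), then
       check that M is not past that entry's end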
2) put the rest of the data fields in one huge flat binary and look them
   up using the index from the interval tree ... or, hey, just seek around
   in a disk file and let the OS cache keep the hot spots warm.
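A minimal sketch of 1b, assuming the intervals do not overlap and are stored as {Start, End, Index} tuples in an ordered_set keyed on Start. The module name interval_index and the Index payload are hypothetical; Index stands for whatever locates the data fields. Note the exact-match lookup before ets:prev/2, because ets:prev/2 returns the key strictly before M.

```erlang
%% Hypothetical module: non-overlapping intervals in an ETS ordered_set,
%% keyed on the start of each interval.
-module(interval_index).
-export([new/1, find/2]).

%% Build the table from a list of {Start, End, Index} tuples.
%% Assumes the intervals do not overlap.
new(Intervals) ->
    Tab = ets:new(interval_index, [ordered_set]),
    true = ets:insert(Tab, Intervals),
    Tab.

%% Return {ok, Index} for the interval containing M, or not_found.
find(Tab, M) ->
    case ets:lookup(Tab, M) of
        [{_Start, End, Index}] ->
            %% M is exactly the start of an interval.
            check_end(M, End, Index);
        [] ->
            %% On an ordered_set, ets:prev/2 returns the largest key
            %% strictly less than M, whether or not M is itself a key.
            case ets:prev(Tab, M) of
                '$end_of_table' ->
                    not_found;
                Start ->
                    [{Start, End, Index}] = ets:lookup(Tab, Start),
                    check_end(M, End, Index)
            end
    end.

%% The candidate interval starts at or before M; accept it only if it
%% has not ended before M.
check_end(M, End, Index) when M =< End -> {ok, Index};
check_end(_M, _End, _Index) -> not_found.
```

Each call does at most two ets:lookup/2 calls and one ets:prev/2, all logarithmic in the table size for an ordered_set.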
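A minimal sketch of 2 as fixed-width records in one flat binary, assuming (the question does not say this) that every one of the 12 data fields is a non-negative integer that fits in 32 bits. Record I then starts at byte I * 48, so an index from the interval lookup turns directly into an offset. The module name flat_records is hypothetical.

```erlang
%% Hypothetical module: fixed-width records packed into one flat binary.
-module(flat_records).
-export([pack/1, fetch/2]).

-define(NFIELDS, 12).      % N = 12 data fields, per the question
-define(FIELD_BITS, 32).   % assumption: every field fits in 32 bits, unsigned
-define(REC_BYTES, (?NFIELDS * ?FIELD_BITS div 8)).  % 48 bytes per record

%% Pack a list of records (each a list of 12 integers) into one binary.
%% Record I (0-based) starts at byte offset I * ?REC_BYTES.
pack(Records) ->
    iolist_to_binary([pack_record(R) || R <- Records]).

%% Encode one record as 12 big-endian unsigned 32-bit integers.
pack_record(Fields) when length(Fields) =:= ?NFIELDS ->
    << <<F:?FIELD_BITS/unsigned>> || F <- Fields >>.

%% Slice out record I and decode it back into a list of integers.
fetch(Bin, I) ->
    Rec = binary:part(Bin, I * ?REC_BYTES, ?REC_BYTES),
    [F || <<F:?FIELD_BITS/unsigned>> <= Rec].
```

Under that assumption, 9964217 records take 9964217 * 48 = 478282416 bytes, about 478 MB. If you write the packed records to a file instead, the same offset arithmetic works with file:pread(Fd, I * 48, 48) on a file opened in raw binary mode.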

-- p

