[erlang-questions] Using ETS for large amounts of data?

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Mon Aug 9 20:44:25 CEST 2010


On Mon, Aug 9, 2010 at 8:08 PM, Anthony Molinaro
<anthonym@REDACTED> wrote:
> Hi,
>
>  I've got some data in a file of the form
>
> start_integer|end_integer|data_field1|data_field2|..|data_fieldN
>
> Where N is 12.  Most fields are actually smallish integers.
> The total number of entries in the file is 9964217 lines.
> and the total file size is 752265457.

Some math to get you started:

Assume all 14 fields (start, end, and the 12 data fields) are smallish
integers. Integers are *at least* 8 bytes in size in our guessing game:

1> Lines = 9964217.
9964217
...
4> Lines * 14 * 8 / (1024 * 1024).
1064.293197631836

So a reasonable lower bound on your data is about 1 GB. Note that this
assumes we can pack the integers optimally.

A more precise guess at the bound can be had from the serialized size
of one sample record:

1> byte_size(term_to_binary({data, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12})).
38

So, in megabytes:

4> Lines * 38 / (1024 * 1024).
361.09947776794434

Or, with larger values in the fields, perhaps even:

5> byte_size(term_to_binary({data, 1000000, 2000000, 1000000, 2000000,
300, 4000, 5000, 600, 7000, 80000, 90000000, 10000000, 11000000,
12000000})).
80

Yielding:

6> Lines * 80 / (1024 * 1024).
760.2094268798828
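
The guesses above all follow the same pattern: bytes per record times
the number of lines, divided into megabytes. A minimal sketch of that
pattern as a module follows. The module name size_guess and both
function names are made up for illustration. Note that term_to_binary/1
measures the external term format, not what ETS keeps in memory, and
the exact byte count for a given term may differ slightly between
releases.

```erlang
-module(size_guess).
-export([record_bytes/1, total_mb/2]).

%% Size in bytes of one term when serialized with term_to_binary/1.
%% This is the external term format, used here only as a rough proxy.
record_bytes(Term) ->
    byte_size(term_to_binary(Term)).

%% Estimated total size in MB (1024 * 1024 bytes) for Lines records
%% that each take BytesPerRecord bytes.
total_mb(Lines, BytesPerRecord) ->
    Lines * BytesPerRecord / (1024 * 1024).
```

For example, size_guess:total_mb(9964217, size_guess:record_bytes(Sample))
gives the estimate for whatever sample tuple you think is representative
of your file.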


One of the VM-people will definitely be able to shed more light on
what the implementation does.

The real killer is if your data is much larger than this and nothing
is done to compress stored terms in ETS.
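
Rather than guessing, you can also ask ETS how much it allocated for a
small sample and extrapolate. The sketch below is only that, a sketch:
the module name ets_measure, the function name and the sample tuple are
my own inventions, and the tuple assumes every field is a small integer.
It relies on ets:info(Tab, memory), which reports the table size in
words, and erlang:system_info(wordsize), which gives the word size in
bytes.

```erlang
-module(ets_measure).
-export([bytes_per_record/1]).

%% Insert N sample records into a fresh ETS table and return the
%% average number of bytes ETS allocated per record.
bytes_per_record(N) when is_integer(N), N > 0 ->
    Tab = ets:new(sample, [set]),
    %% Memory of the empty table, in words, so it can be subtracted.
    Empty = ets:info(Tab, memory),
    %% Hypothetical record: start, end, and 12 small data fields.
    lists:foreach(
      fun(I) ->
              ets:insert(Tab, {I, I + 10, 1, 2, 3, 4, 5, 6,
                               7, 8, 9, 10, 11, 12})
      end,
      lists:seq(1, N)),
    Words = ets:info(Tab, memory) - Empty,
    ets:delete(Tab),
    Words * erlang:system_info(wordsize) / N.
```

Multiplying ets_measure:bytes_per_record(100000) by the number of lines
in the file should give a rough figure for the whole data set on your
particular VM, assuming the sample resembles the real records.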




-- 
J.

