[erlang-questions] Using ETS for large amounts of data?
Jesper Louis Andersen
jesper.louis.andersen@REDACTED
Mon Aug 9 20:44:25 CEST 2010
On Mon, Aug 9, 2010 at 8:08 PM, Anthony Molinaro
<anthonym@REDACTED> wrote:
> Hi,
>
> I've got some data in a file of the form
>
> start_integer|end_integer|data_field1|data_field2|..|data_fieldN
>
> Where N is 12. Most fields are actually smallish integers.
> The total number of entries in the file is 9964217 lines.
> and the total file size is 752265457.
Some math to get you started:
Assume each line holds 14 smallish integers (start, end and the 12 data
fields). Integers are *at least* 8 bytes in size in our guessing game
(one machine word on a 64-bit VM):
1> Lines = 9964217.
9964217
...
4> Lines * 14 * 8 / (1024 * 1024).
1064.293197631836
So a reasonable lower bound on your data is roughly 1 GB (about 1064
MiB). Now, that is assuming we can pack the integers optimally.
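As a sanity check on the 8-bytes-per-integer figure, the in-memory size of one sample row can be asked for directly. This is a minimal sketch, assuming a 64-bit VM: it uses erts_debug:flat_size/1, an internal and undocumented function that returns a term's heap size in machine words, and the module name row_size is hypothetical. It ignores ETS's own per-object and table overhead, so the result is still a lower bound.

```erlang
-module(row_size).
-export([row_bytes/1, estimate_mib/2]).

%% Heap size of one term in bytes. erts_debug:flat_size/1 is an
%% internal (undocumented) function returning the size in words;
%% small integers and atoms are immediates costing one word each,
%% and a tuple adds one header word.
row_bytes(Term) ->
    erts_debug:flat_size(Term) * erlang:system_info(wordsize).

%% Scale one sample row up to the whole file, in MiB.
%% Does NOT include ETS per-object or hash-table overhead.
estimate_mib(Lines, SampleRow) ->
    Lines * row_bytes(SampleRow) / (1024 * 1024).
```

For example, row_size:estimate_mib(9964217, {data, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}) counts 16 words per row (one header word plus 15 elements), which on a 64-bit VM is 128 bytes, or roughly 1216 MiB in total.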
A more precise bet on the bound can be had from the size of one row in
the external term format:
1> byte_size(term_to_binary({data, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12})).
38
So:
4> Lines * 38 / (1024 * 1024).
361.09947776794434
Or, if most values are larger (but still fit in 32 bits), perhaps even:
5> byte_size(term_to_binary({data, 1000000, 2000000, 1000000, 2000000,
300, 4000, 5000, 600, 7000, 80000, 90000000, 10000000, 11000000,
12000000})).
80
Yielding:
6> Lines * 80 / (1024 * 1024).
760.2094268798828
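Rather than guessing magnitudes, the same term_to_binary/1 estimate can be averaged over real lines from the file. A minimal sketch, assuming every line is exactly the |-separated decimal integer format quoted above (possibly with a trailing newline); the module and function names are hypothetical. Like the shell session, this measures the external term format, not ETS's in-memory layout.

```erlang
-module(ext_size).
-export([parse_line/1, avg_ext_bytes/1, estimate_mib/2]).

%% Parse <<"start|end|f1|...|f12">> (optionally ending in a newline)
%% into a tuple of integers. Assumes every field is a decimal integer.
parse_line(Line) ->
    Trimmed = iolist_to_binary(string:trim(Line)),
    Fields = binary:split(Trimmed, <<"|">>, [global]),
    list_to_tuple([binary_to_integer(F) || F <- Fields]).

%% Average external-term-format size over a non-empty sample of lines.
avg_ext_bytes([_ | _] = SampleLines) ->
    Sizes = [byte_size(term_to_binary(parse_line(L))) || L <- SampleLines],
    lists:sum(Sizes) / length(Sizes).

%% Scale the sample average up to the whole file, in MiB.
estimate_mib(TotalLines, SampleLines) ->
    TotalLines * avg_ext_bytes(SampleLines) / (1024 * 1024).
```

For example, read a few thousand lines with file:read_line/1 from a file opened in binary mode and call ext_size:estimate_mib(9964217, Sample).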
One of the VM people will definitely be able to shed more light on
what the implementation actually does.
The real killer is if your data is much larger than this and nothing
is done to compress stored terms in ETS.
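The actual ETS cost, and the effect of ETS's compressed table option (which stores objects in a more compact form at the price of slower table operations), can also simply be measured. This is a rough probe under stated assumptions: the rows are hypothetical synthetic tuples shaped like the quoted format (start, end, twelve small fields) and keyed on the start integer; ets:info(Tab, memory) reports words, converted here with erlang:system_info(wordsize). The module name is hypothetical, and results vary with OTP release and word size, so no expected numbers are given.

```erlang
-module(ets_probe).
-export([bytes_per_row/2, compare/1]).

%% Insert N synthetic rows and return the table's average growth in
%% bytes per row (object storage plus hash-table overhead).
%% Opts are extra ets:new/2 options, e.g. [] or [compressed].
bytes_per_row(N, Opts) when N > 0 ->
    Tab = ets:new(probe, [set | Opts]),
    Before = ets:info(Tab, memory),                 % in words
    lists:foreach(fun(I) -> true = ets:insert(Tab, row(I)) end,
                  lists:seq(1, N)),
    After = ets:info(Tab, memory),
    true = ets:delete(Tab),
    (After - Before) * erlang:system_info(wordsize) / N.

%% Hypothetical row {Start, End, F1..F12}, keyed on Start (keypos 1).
row(I) ->
    list_to_tuple([I, I + 1 | lists:seq(1, 12)]).

%% Per-row cost for a plain table versus a compressed one.
compare(N) ->
    [{plain, bytes_per_row(N, [])},
     {compressed, bytes_per_row(N, [compressed])}].
```

Running ets_probe:compare(100000) in the shell and multiplying each per-row figure by 9964217 gives a whole-file estimate for each table type.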
--
J.