[erlang-questions] Using ETS for large amounts of data?

Ryan Zezeski rzezeski@REDACTED
Tue Sep 7 05:31:26 CEST 2010

On Mon, Aug 30, 2010 at 5:56 AM, Hynek Vychodil <hynek@REDACTED> wrote:

> I have very similar experience.
> So keeping it in one big binary and storing only pointers will save you
> 300-400 MB of data, depending on item length (16-26 B). Anyway,
> using a better-tuned k/v store would be better.
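
(For reference, a minimal sketch of the "one big binary plus stored
pointers" layout described above; the module name, key layout and the
functions are illustrative, not Hynek's actual code.)

-module(bigbin_idx).
-export([build/1, lookup/3]).

%% Items is a list of {Key, Bin} pairs. Concatenate the payloads into one
%% big binary and store only {Key, Offset, Length} pointers in ETS.
build(Items) ->
    T = ets:new(bigbin_idx, [set]),
    {Big, _End} =
        lists:foldl(fun({Key, Bin}, {Acc, Off}) ->
                            Len = byte_size(Bin),
                            true = ets:insert(T, {Key, Off, Len}),
                            {<<Acc/binary, Bin/binary>>, Off + Len}
                    end, {<<>>, 0}, Items),
    {Big, T}.

%% Slice the item back out of the big binary on lookup.
lookup(Key, Big, T) ->
    case ets:lookup(T, Key) of
        [{Key, Off, Len}] -> binary:part(Big, Off, Len);
        []                -> not_found
    end.
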
Did you check erlang:memory/0 by any chance?  If I run your example
directly in the shell vs. in its own process, I get drastically different
results memory-wise.  It seems that when running your example in the shell,
ERTS holds onto all the binaries.

36> ets:info(T).
37> erlang:memory().

Note that I'm running a 64-bit VM.  Here is the same information when
running in a separate process spawned in the shell.

21> Pid ! {from,get}.
Info: [{memory,141554577},
22> erlang:memory().

Notice that the binary memory varies greatly between the two methods.
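
Roughly, the separate-process setup looks like this (a shell-ready sketch;
the table name and the placeholder row are illustrative, not the actual
test data):

Build = fun() ->
                T = ets:new(items, [set]),
                %% placeholder row; the real loader fills the table
                ets:insert(T, {key1, binary:copy(<<"x">>, 20)}),
                T
        end,
Pid = spawn(fun() ->
                    T = Build(),
                    receive
                        {from, get} ->
                            io:format("Info: ~p~n", [ets:info(T)])
                    end
            end),
Pid ! {from, get},
erlang:memory().

One plausible explanation for the difference is the shell keeping
references (e.g. in its result history) to binaries created while the
example ran, which prevents the reference-counted binaries from being
collected; running the load in its own process avoids that.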

I recently ran a bunch of tests on binaries to try to understand their
behavior better.  I'm also using them to handle large (500 MB+) CSV files.
I noticed that splitting the CSV file into a list of lists (i.e. a list of
all the column values) with the binary:split function consumed a _lot_ of
memory.  I had expected this to be efficient: after reading the docs I was
under the impression that the sub-binaries would simply reference the
larger off-heap binary, but the behavior I saw suggested that, depending on
the size of the binary resulting from the split, it might reside on the
process heap instead.  Whether that's actually what was happening, I'm not
sure.  I ended up rewriting my functions to transform the CSV a line at a
time and build up a new binary in an accumulator (see the sketch below);
this performs much better and uses much less memory.  Anyway, this is
slightly tangential to the problem being discussed, so maybe I'll just post
a new thread.
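
Roughly, the line-at-a-time rewrite looks like this (an illustrative
sketch; the per-line transform is a placeholder for the real one):

-module(csv_transform).
-export([transform/1]).

%% Walk the CSV one line at a time and build the output in a binary
%% accumulator. The accumulator owns its own bytes, so the result does
%% not keep the original file binary alive via sub-binary references.
transform(Bin) ->
    transform(Bin, <<>>).

transform(<<>>, Acc) ->
    Acc;
transform(Bin, Acc) ->
    case binary:split(Bin, <<"\n">>) of
        [Line, Rest] ->
            transform(Rest, <<Acc/binary, (transform_line(Line))/binary, "\n">>);
        [LastLine] ->
            <<Acc/binary, (transform_line(LastLine))/binary>>
    end.

%% Placeholder per-line transform: swap commas for tabs.
transform_line(Line) ->
    binary:replace(Line, <<",">>, <<"\t">>, [global]).

transform(FileBin) returns one new binary whose bytes were copied into the
accumulator as it was built, so only the working line references the
original file binary while the transform is running.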

