[erlang-questions] list size 'causing VM "problems"

Ulf Wiger ulf.wiger@REDACTED
Mon Nov 23 11:54:48 CET 2009

Hendrik Visage wrote:
> Hi there,
> Yes, I know this code is not yet optimal (I'm still learning :), but
> it begs a few questions I'd like to understand from the VM etc.
> 1) I've run it fine with a small subset, but once I've loaded the 930k
> lines file, the VM sucks up a lot of RAM/virtual memory. Like a burst
> of about 2G (I have a 4G MacBook Pro) and then once it returned in the
> erl shell, the VM starts to go ballistic and consumes >7G of
> virtual memory ;(
> Q1: why did the VM exhibit this behaviour? the garbage collector going bad/mad??

Since it consumes >7G, am I right in guessing that you're
running 64-bit Erlang?

If so (and in any case), you should really use 'binary' instead
of 'list' in the regular expression option list. Using list
representation of the string data, each byte will consume two
heap words of memory - 8 bytes in 32-bit Erlang and 16 bytes in
64-bit Erlang.
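
A minimal sketch of the difference, assuming the code in question uses
re:split/3 (the original code is not shown here, so the sample line and
the module name are made up). It splits the same line once with
{return, list} and once with {return, binary}, and reports the size of
each result in heap words using erts_debug:flat_size/1:

```erlang
-module(list_vs_binary).
-export([demo/0]).

%% Returns {WordsAsLists, WordsAsBinaries, BytesPerWord}.
%% The sample line is hypothetical; any text line will do.
demo() ->
    Line = <<"2009-11-23 11:54:48 GET /index.html 200">>,
    %% Each character of a list string costs one cons cell (two words).
    AsLists = re:split(Line, " ", [{return, list}]),
    %% A binary stores the same characters as packed bytes.
    AsBins = re:split(Line, " ", [{return, binary}]),
    {erts_debug:flat_size(AsLists),
     erts_debug:flat_size(AsBins),
     erlang:system_info(wordsize)}.
```

Multiplying the first two numbers by the third gives the size in bytes
on the running VM. The exact figures depend on the emulator, so none
are quoted here.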

Regarding the GC, consider what it has to work with. You are
building a very large data structure in a tight loop. The
process will continuously run out of heap, triggering the GC.
The GC will copy live data (which is going to be most of it)
to another copy of the heap. If that's not enough, it will
run a fullsweep, also looking at data that survived the
previous GC (no garbage there, since the list just keeps
growing). This creates yet another heap copy.
Finally, it does a resize of the heap, if necessary.
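
One way to watch this happen is to count collections and compare heap
sizes around a loop that only accumulates. The sketch below is an
illustration, not a measurement tool: the module and function names are
made up, and erlang:statistics(garbage_collection) counts collections
for the whole node, so the difference is only approximate if other
processes are busy.

```erlang
-module(gc_growth).
-export([run/1]).

%% Builds a list of N strings and returns
%% {Length, GCsDuringBuild, HeapWordsBefore, HeapWordsAfter}.
run(N) ->
    {GCs0, _, _} = erlang:statistics(garbage_collection),
    {total_heap_size, H0} = erlang:process_info(self(), total_heap_size),
    Acc = build(N, []),
    {GCs1, _, _} = erlang:statistics(garbage_collection),
    {total_heap_size, H1} = erlang:process_info(self(), total_heap_size),
    {length(Acc), GCs1 - GCs0, H0, H1}.

%% Everything consed onto Acc stays live, so each collection has to
%% copy nearly the whole heap and frees almost nothing.
build(0, Acc) -> Acc;
build(N, Acc) -> build(N - 1, [integer_to_list(N) | Acc]).
```

Calling gc_growth:run(1000000) from the shell should show many
collections and a much larger heap afterwards.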

It is possible to pre-size the heap using spawn_opt() and the
min_heap_size option. Given that you have a very large data
structure, this may still turn out problematic.
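
A minimal sketch of pre-sizing, with a made-up module name and a heap
size the caller has to pick (min_heap_size is given in words, not
bytes). The work runs in a separate process whose heap starts at the
requested size:

```erlang
-module(presized).
-export([run/2]).

%% Runs Fun in a new process with at least HeapWords words of heap.
%% Returns {ok, Result}, or {error, Reason} if the process dies.
run(Fun, HeapWords) ->
    Parent = self(),
    {Pid, Ref} = spawn_opt(fun() -> Parent ! {self(), Fun()} end,
                           [monitor, {min_heap_size, HeapWords}]),
    receive
        {Pid, Result} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.
```

Note that sending the result back copies it onto the caller's heap, so
this only helps if the worker returns something small (a count, a
summary) rather than the whole structure.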

You should definitely try putting the data in ETS instead of
accumulating it on the heap.
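
A rough sketch of what that could look like, assuming the input is a
plain text file read line by line. The module name, table name and the
{LineNumber, Line} record layout are invented for illustration; the
real code would store whatever it extracts from each line.

```erlang
-module(ets_load).
-export([load/1]).

%% Reads File line by line and stores {LineNumber, LineBinary} in a
%% new ETS table. Returns the table identifier.
load(File) ->
    Tab = ets:new(lines, [set, public]),
    {ok, Fd} = file:open(File, [read, binary, raw, read_ahead]),
    try
        ok = read_lines(Fd, Tab, 1)
    after
        file:close(Fd)
    end,
    Tab.

%% Each line goes straight into the table, so the process heap never
%% holds more than the line currently being handled.
read_lines(Fd, Tab, N) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            true = ets:insert(Tab, {N, Line}),
            read_lines(Fd, Tab, N + 1);
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason}
    end.
```

ETS data lives outside the process heap, so the collector does not
have to copy it on every pass. Lookups copy the matching tuple back
onto the caller's heap, one entry at a time.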

Ulf W
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
