[erlang-questions] How to return all records in dets

Ulf Wiger ulf@REDACTED
Mon Jun 2 08:45:18 CEST 2014

On 02 Jun 2014, at 08:33, Fred Hebert <mononcqc@REDACTED> wrote:

> Using a fold, you iterate over the table and can accumulate the sum as
> you go, never needing to build the intermediate list. That can help save
> on memory if your data set is particularly large.

Indeed. The fold functions in ets were added exactly because pulling the
entire data set onto the heap could have terrible consequences for
memory usage - esp. in the old days, when 512 MB was considered
“a lot” of memory. :)
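
For the summing case Fred describes, a minimal sketch with dets:foldl/3
could look like the following. The module name, table name, file name and
the {Key, Value} record shape are made up for illustration; only the
dets and file calls are real API.

```erlang
-module(dets_sum).
-export([sum_values/1, demo/0]).

%% Sum the Value field of every {Key, Value} object in an open dets table.
%% The fun sees one object at a time, so no intermediate list is built.
sum_values(Tab) ->
    dets:foldl(fun({_Key, Value}, Acc) -> Acc + Value end, 0, Tab).

%% Hypothetical demo: create a scratch table, sum it, clean up.
demo() ->
    File = "dets_sum_demo.dets",
    {ok, Tab} = dets:open_file(dets_sum_demo, [{file, File}, {type, set}]),
    ok = dets:delete_all_objects(Tab),
    ok = dets:insert(Tab, [{a, 1}, {b, 2}, {c, 3}]),
    Sum = sum_values(Tab),
    ok = dets:close(Tab),
    ok = file:delete(File),
    Sum.
```

Compare this with dets:match_object(Tab, '_') followed by a lists:foldl/3,
which first copies every record onto the process heap.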

Garbage collection generally has difficulty with constantly growing
data sets. The GC will trigger when the heap is full, and sweep for
garbage. If we’re building a large structure, there will be no garbage.
With generational GC, the “new heap” will be swept first. When that
fails, a fullsweep will be tried. When that fails, the heap will be 
resized. Rinse, repeat, until the whole data structure has been built.

Not only is this expensive; the memory required for the operation
can well reach 3x the size of the data set being built. At least this
used to be the rule of thumb in the good ol’ days.

It is possible to temporarily set the heap size to something very
big, build the structure, and then set it back. But the fold operations
can execute without growing the heap.
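
The temporary heap size trick can be sketched with
process_flag(min_heap_size, Words), which returns the previous setting.
The module and function names below are hypothetical, and the size is
given in machine words, not bytes. When exactly the runtime applies the
new minimum is up to the emulator, so treat this as an outline rather
than a guarantee about GC behaviour.

```erlang
-module(big_heap).
-export([with_min_heap/2, demo/0]).

%% Run Fun with the calling process's minimum heap size set to Words,
%% then restore the previous setting even if Fun raises.
with_min_heap(Words, Fun) when is_integer(Words), Words >= 0,
                               is_function(Fun, 0) ->
    Old = process_flag(min_heap_size, Words),
    try
        Fun()
    after
        process_flag(min_heap_size, Old)
    end.

%% Hypothetical demo: build a largish list under a bigger minimum heap.
demo() ->
    L = with_min_heap(1000000, fun() -> lists:seq(1, 100000) end),
    length(L).
```

If the big structure is only needed briefly, another option is to do the
work in a separate process started with
spawn_opt(Fun, [{min_heap_size, Words}]), so that all of the memory is
released when that process exits.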

Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
