[erlang-questions] Garbage Collection, BEAM memory and Erlang memory

Fred Hebert mononcqc@REDACTED
Tue Jan 27 20:09:44 CET 2015


On 01/27, Roberto Ostinelli wrote:
> 
> I see consistent total, process and binary usage. Unfortunately the ratio
> falls:
> 
> 2> recon_alloc:memory(usage, current).
> 0.7353255106318904
> 
> ..after a while:
> 
> 3> recon_alloc:memory(usage, current).
> 0.5630988225908702
> 

Yeah, that does look like a good indication that there's an allocator
'leak' (or bad usage). You can look at other recon functions to try and
figure out whether things are wrong in specific ways (for instance, a
given allocator being worse than the others; if it's allocator 0, that's
for NIFs and drivers).
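
As a rough sketch of what that could look like in the shell, assuming
recon is loaded on the node as in your session (the comments are my
summary of what each call reports, not output from your system):

```erlang
%% Allocated memory broken down by allocator type
%% (binary_alloc, eheap_alloc, ets_alloc, and so on).
recon_alloc:memory(allocated_types, current).

%% Allocated memory broken down by allocator instance number,
%% to see whether one instance (such as 0) stands out.
recon_alloc:memory(allocated_instances, current).

%% Same ratio as in your session, but at the peak rather than right now.
recon_alloc:memory(usage, max).
```

If one type or instance holds most of the allocated memory while the
Erlang-side numbers stay flat, that's the one to dig into first.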

> Why is the VM so eager on memory if the underlying erlang usage is stable?
> 
> Is there anything I can do? I honestly don't know where else to look.
> 
>    - Binaries are optimized (checked with +bin_opt_info).
>    - Erlang reported memory for total, process and binary is linear.
>    - I'm using some gimmicks like fullsweep_after 10 as a system flag.
>    - I hibernate the long living TCP connections (which is where the
>    problem comes from, since I ran tests on short lived connections and had no
>    issues).
> 
> Any help would be greatly appreciated.
> 

What this looks like from the usage metrics is possibly the need for a
different memory allocation strategy. There's unfortunately no super
easy way to do it, but if the problem shows up quickly, that at least
makes it a lot easier to experiment.

I have covered the topic in Section 7.3 of Erlang in Anger
(http://erlang-in-anger.com), Memory Fragmentation.

The steps are usually:

1. Find that you have fragmentation issues (done)
2. Find which allocator is to blame
3. Take note of what your usage pattern is for that allocator. Is data
   held for a long time? Only some of it? Does it get released in large
   blocks? How much does the size of the data vary? What are the max,
   min, p95 or p99 sizes?
4. Check the different strategies available (p.71-73) and see if one
   could make sense for your usage.
5. Check your average block size (in recon_alloc) and see if you need to
   tweak your carrier sizes up or down so that more or less data ends up
   sharing the same carrier (data sharing a carrier can hold the whole
   carrier back when it needs to be released)
6. Experiment a lot and report back.
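
For steps 2 and 5, a minimal sketch of the recon_alloc calls I'd start
with, again assuming recon is available on the node. The comments
describe the general shape of the results as I understand the library,
so take them as a guide rather than a spec:

```erlang
%% Step 2: per-allocator-instance usage figures for single-block (sbcs)
%% and multiblock (mbcs) carriers. Low usage ratios combined with large
%% carrier sizes point at the allocator that is holding memory back.
recon_alloc:fragmentation(current).

%% The same figures taken at the allocator's peak, to compare against
%% the current state.
recon_alloc:fragmentation(max).

%% Step 5: average block sizes per allocator type, split into mbcs and
%% sbcs, both right now and at the peak.
recon_alloc:average_block_sizes(current).
recon_alloc:average_block_sizes(max).
```

Comparing the current and max results is usually the interesting part:
carriers that were well used at the peak and are mostly empty now are
the ones that did not get released.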

If you tend to see lots of holes, doing things like reducing block sizes
(while making sure your sbcs/mbcs ratio remains low enough) and looking
for some address-order strategy (rather than best fit) might end up
helping by reusing already-allocated blocks more, and also reducing how
much spread there is.
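
As a hypothetical example only: if the binary allocator turned out to be
the one to blame (nothing in this thread establishes that yet), an
experiment could look like the following. The flag values are
illustrative assumptions to tune, not recommendations.

```erlang
%% Hypothetical emulator flags, set in vm.args or on the erl command
%% line, assuming binary_alloc (letter B) is the culprit:
%%
%%   +MBas aobf      use address-order best fit for binary_alloc
%%   +MBlmbcs 512    cap the largest multiblock carrier size at 512 KB
%%
%% After restarting the node with them and letting the load run, check
%% that single-block carriers have not become too common relative to
%% multiblock carriers...
recon_alloc:sbcs_to_mbcs(current).

%% ...and whether the overall usage ratio still degrades over time.
recon_alloc:memory(usage, current).
```

Change one setting at a time if you can, otherwise it's hard to tell
which one made the difference.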

Anyway, that's more or less the extent of the experimenting I've done
that can be applied in a generic manner.


