[erlang-questions] Problem with Beam process not freeing memory

Tue Sep 17 02:39:41 CEST 2013

I've recently run into similar issues and received a bit of help from
Lukas Larsson, which I'm glad to pass on to you. I tried to put whatever
he taught me into the recon library, currently on a separate branch that
is awaiting review and prone to change:
https://github.com/ferd/recon/tree/allocators

1. To check for binary leaks, I recommend calling `recon:bin_leak(N)`
where `N` is the number of 'highest results' you want. The function
takes a snapshot of the number of binary refs in your processes, GCs
the node entirely, takes another snapshot, diffs the two, and returns
the N biggest deltas in binaries. This lets you know which processes
were holding the most references to stale refc binaries. I recommend
hibernation as a first way to try and fix this if it is the problem.
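
To make the snapshot, GC, diff sequence above more concrete, here is a
rough sketch using only standard BIFs. It is not recon's actual
implementation; the module and function names (`bin_sketch`, `top/1`)
are made up for illustration, and it only counts refc binary references
per process.

```erlang
-module(bin_sketch).
-export([top/1]).

%% Hypothetical sketch, not recon's code: count refc binary references
%% per process, garbage collect every process, count again, and return
%% the N processes that dropped the most references.
top(N) ->
    Before = snapshot(),
    [erlang:garbage_collect(Pid) || Pid <- erlang:processes()],
    After = snapshot(),
    %% Processes that died between the two snapshots are skipped, since
    %% lists:keyfind/3 returns false for them and the pattern filters it.
    Deltas = [{Pid, Count - AfterCount}
              || {Pid, Count} <- Before,
                 {_, AfterCount} <- [lists:keyfind(Pid, 1, After)]],
    lists:sublist(lists:reverse(lists:keysort(2, Deltas)), N).

%% One {Pid, NumberOfBinaryRefs} pair per live process.
snapshot() ->
    [{Pid, length(Bins)}
     || Pid <- erlang:processes(),
        {binary, Bins} <- [erlang:process_info(Pid, binary)]].
```

If one process stands out, having it hibernate when idle (for example by
returning `{noreply, State, hibernate}` from a gen_server callback)
forces a full GC of that process and drops its stale references.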

2. Check the reported/allocated memory with `recon_alloc:memory(Arg)`
where `Arg` can be:
 - `used` for the memory actively used (i.e. erlang:memory(total))
 - `allocated` for the memory reserved by individual allocators (sum)
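 - `usage` for the percentage of allocated memory that is actually used.
If the result of `allocated` is close to what the OS reports while
`used` is much lower, you probably have fragmentation issues. If
`allocated` is well below what the OS reports, you may have a NIF or
driver that allocates data outside of the ERTS allocators.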
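
As a rough illustration of that comparison, something like the following
can be pasted into a shell on the affected node. It assumes the
`recon_alloc` functions from the branch above are loaded and that the
node runs on Linux, so that `/proc/<pid>/status` is readable; the
variable names are arbitrary.

```erlang
%% Assumes recon_alloc (from the allocators branch) is on the code path.
Used      = recon_alloc:memory(used).       % memory actively in use
Allocated = recon_alloc:memory(allocated).  % memory reserved by allocators
Usage     = recon_alloc:memory(usage).      % used relative to allocated

%% What the kernel sees for this OS process (Linux only).
OsPid = os:getpid().
io:put_chars(os:cmd("grep VmRSS /proc/" ++ OsPid ++ "/status")).
```

Keep in mind that `VmRSS` is reported in kB while the `used` and
`allocated` values are in bytes.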

3. Check individual allocator usage levels with
`recon_alloc:fragmentation(current)`. It will return usage percentages
for mbcs and sbcs. Mbcs are multiblock carriers and are where data goes
by default. When the data allocated is too large (larger than the single
block carrier threshold [sbct]), it goes into its own single block
carrier (sbc). Compare the results with what you get from
`recon_alloc:fragmentation(max)`. If the current values show very low
usage but the max ones show high usage, you may have lingering data,
possibly held by long-term references, that blocks deallocation of
specific carriers. Different carrier strategies can help, which we can
dive into if you see a problem with this.
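
In a shell, that comparison could look like the snippet below. It
assumes `recon_alloc` from the branch above is loaded; since the branch
is still prone to change, I'm deliberately not relying on the exact
shape of the returned terms and just print them for side-by-side
reading.

```erlang
%% Assumes recon_alloc (from the allocators branch) is on the code path.
Current = recon_alloc:fragmentation(current).
Max     = recon_alloc:fragmentation(max).

%% Print both results in full; compare the mbcs and sbcs usage values
%% allocator by allocator.
io:format("current:~n~p~n~nmax:~n~p~n", [Current, Max]).
```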

Feel free to read the comments in `recon_alloc` until I actually merge
it into master; they contain some of the details about what to do or
look for.

Lukas may want to correct me on the content of this post. I'm going
from the limited knowledge he passed on to me here, or rather, my
limited understanding of it :)

Regards,
Fred.

On 09/16, Tino Breddin wrote:
> Hi list,
> 
> I'm experiencing issues with a couple of Beam nodes where I see a huge gap
> between the node's reported memory usage and the underlying Linux Kernel's
> view.
> 
> This is using R15B01.
> 
> As a start an application in such a node stores a lot of tuples (containing
> atoms and binary data) in ETS tables. That proceeds until a point where
> memory usage is 70% (6GB) of the available memory. At that point
> erlang:memory() and top (or /proc/PID/status) agree roughly on the memory
> usage. Then an internal cleanup task is performed, which clears obsolete
> records from the ETS tables. Afterwards, erlang:memory() reports an
> expected low value of roughly 60MB memory usage. (This includes binary
> data). However, the kernel still reports the high memory usage values (both
> VmRss and VmTotal) for the node. The kernel's usage view will stay stable
> until the ETS tables are filled to a point where the real memory usage
> exceeds the kernel's view, then the kernel reported usage will grow as well.
> 
> Now having checked the node in some details I'm wondering what causes this
> difference between the BEAM's view and the Kernel's view on memory usage. I
> have 2 ideas which I'm checking right now.
> 
> (1) Not GC'ed binaries: Could it be that binary data is not GC'ed because
> the original dispatcher process which it was passed through before being
> stored in an ETS table is still alive. Thus there is still some reference
> to it? However, this would not explain why erlang:memory() reports a very
> low value for used memory for binaries.
> 
> (2) low-level memory leak: Some driver or NIF leaking memory, which would
> obviously not be reported by erlang:memory(). However, then it surprises me
> that the Kernel's view stays stable while the BEAM's actual memory usage is
> still below the Kernel's view. It should be continuously growing in this
> case imho.
> 
> I'd appreciate if anyone has some more insight or experience with such a
> behaviour, while I'm further digging into this.
> 
> Cheers,
> Tino
