[erlang-questions] Problem with Beam process not freeing memory

Tue Sep 17 16:36:19 CEST 2013

Hi Robert,

I CC'd the mailing list on this post, because I felt it could be
interesting to share with everyone.

To get into some more details, the difference between garbage
collecting and hibernating is that hibernation forces a full sweep and
compacts the heap, so it is more likely to actually remove old
references to binaries.

The tricky part about binary leaks in these cases is that if the process
you're garbage collecting holds some very long-lived references (or
takes a long while before releasing them), the GC will move those
references to the old heap, if my understanding is correct. Now if the
process that leaks resources is busy or bogged down by some task,
chances are that manually calling GC at higher frequencies will push
short-lived references to the old heap as well. Eventually, most of the
subsequent GCs are done for no good reason until a full sweep frees the
references, if my understanding is right.
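One way to check whether processes are actually sitting on stale refc
binaries is to count the off-heap binaries each one references before
and after a forced GC. Below is a minimal sketch of that snapshot/GC/diff
idea. It relies on `erlang:process_info(Pid, binary)`, which the docs
flag as subject to change, and the module and function names are
hypothetical (this is not recon's code):

```erlang
-module(bin_snapshot).
-export([bin_leak/1]).

%% Hypothetical sketch: count the refc binaries each process references,
%% GC every process, count again, and return the N processes whose count
%% dropped the most (positive delta = binaries released by the GC).
-spec bin_leak(pos_integer()) -> [{pid(), integer()}].
bin_leak(N) ->
    Before = snapshot(),
    [erlang:garbage_collect(Pid) || {Pid, _} <- Before],
    After = maps:from_list(snapshot()),
    %% Only keep processes that are still alive in the second snapshot.
    Deltas = [{Pid, Count - AfterCount}
              || {Pid, Count} <- Before,
                 {ok, AfterCount} <- [maps:find(Pid, After)]],
    lists:sublist(lists:reverse(lists:keysort(2, Deltas)), N).

%% {Pid, NumberOfRefcBinaries} for every live process; processes that die
%% while we iterate return 'undefined' and are filtered out by the pattern.
snapshot() ->
    [{Pid, length(Bins)}
     || Pid <- erlang:processes(),
        {binary, Bins} <- [erlang:process_info(Pid, binary)]].
```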

In comparison, some hibernations may turn out to be beneficial due to
how they do the full sweep, especially on less active processes, without
changing spawn_opt values or whatever.

Neither approach is necessarily great, and I don't think there's one
easy way to deal with it.

One thing I've had in mind to try for a while was to run a function a
bit like:

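-define(THRESHOLD, 10000000). % hypothetical byte threshold; tune per workload

%% Adds the size of Bin to a running byte counter and forces a GC of the
%% calling process once the counter reaches ?THRESHOLD, resetting it to 0.
-spec gc_count(non_neg_integer(), binary()) -> non_neg_integer().
gc_count(PreviousCounter, Bin) ->
    case byte_size(Bin) of
        N when N > 64 -> % refc binary (heap binaries are at most 64 bytes)
            NewCount = N + PreviousCounter,
            case NewCount >= ?THRESHOLD of
                true ->
                    erlang:garbage_collect(),
                    0;
                false ->
                    NewCount
            end;
        N -> % heap binary
            PreviousCounter + N
    end.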

that could alternatively force some hibernation instead of GC'ing. This
one could basically track the size of all binaries seen manually and
force something when you go over a certain amount. It sucks, though,
because that's basically manually doing your collection, and it doesn't
mean that because you have seen a binary, it's ready to be GC'd. I've
thus avoided trying it in the real world for now.
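For the hibernation variant, a minimal sketch could look like the
following, assuming a bare receive loop that gets binaries as
`{bin, Bin}` messages. The module name, message shapes and `Threshold`
are all hypothetical, and the same caveat applies: having seen a binary
doesn't mean it's garbage yet.

```erlang
-module(bin_tracker).
-export([start/1, loop/2]).

%% Hypothetical sketch: count the bytes of all binaries received and
%% hibernate (instead of calling erlang:garbage_collect/0) once the running
%% total reaches Threshold bytes.
-spec start(pos_integer()) -> pid().
start(Threshold) ->
    spawn(?MODULE, loop, [Threshold, 0]).

loop(Threshold, Count) ->
    receive
        {bin, Bin} when is_binary(Bin) ->
            NewCount = Count + byte_size(Bin),
            case NewCount >= Threshold of
                true ->
                    %% hibernate/3 never returns: it discards the call stack,
                    %% does a full-sweep GC, and resumes in loop/2 (with the
                    %% counter reset) when the next message arrives.
                    erlang:hibernate(?MODULE, loop, [Threshold, 0]);
                false ->
                    loop(Threshold, NewCount)
            end;
        {count, From} ->
            From ! {count, self(), Count},
            loop(Threshold, Count);
        stop ->
            ok
    end.
```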

In practice, at Heroku, we've decided to go for a hybrid approach in
logplex. We force hibernation on some important events that interrupt
our workflow no matter what (such as a socket disconnection, or long
periods of time [seconds] without activity for a process), and have put
a workaround in place to force VM-wide GCs when we're reaching critical
amounts of memory:
https://github.com/heroku/logplex/blob/master/src/logplex_leak.erl
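A minimal sketch of the idle-hibernation part as a gen_server, assuming
a plain inactivity timeout and a `{tcp_closed, Socket}` message as the
disconnection event. The module name and `IdleMs` parameter are
hypothetical; this is not logplex's code:

```erlang
-module(idle_hibernate_server).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical sketch: hibernate after IdleMs milliseconds without any
%% message, and also when the socket reports a disconnection.
start_link(IdleMs) ->
    gen_server:start_link(?MODULE, IdleMs, []).

init(IdleMs) ->
    %% The third element arms gen_server's inactivity timeout.
    {ok, IdleMs, IdleMs}.

handle_call(ping, _From, IdleMs) ->
    {reply, pong, IdleMs, IdleMs}.

handle_cast(_Msg, IdleMs) ->
    {noreply, IdleMs, IdleMs}.

handle_info(timeout, IdleMs) ->
    %% Nothing happened for IdleMs ms: hibernate (full sweep, heap shrink).
    {noreply, IdleMs, hibernate};
handle_info({tcp_closed, _Socket}, IdleMs) ->
    %% The work flow is interrupted anyway, so hibernating is cheap here.
    {noreply, IdleMs, hibernate};
handle_info(_Other, IdleMs) ->
    {noreply, IdleMs, IdleMs}.

terminate(_Reason, _IdleMs) -> ok.

code_change(_OldVsn, IdleMs, _Extra) -> {ok, IdleMs}.
```

Every callback re-arms the timeout, so a busy process never hibernates,
while an idle one does so only once until the next message wakes it up.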

The objective was to use a global GC as a last resort in case individual
(unobtrusive) hibernations were not enough to save a node.
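A rough sketch of such a last-resort global GC, assuming a fixed byte
threshold chosen by the operator and a caller that runs the check
periodically (for instance from a timer). The names are hypothetical and
this is not the code in logplex_leak.erl:

```erlang
-module(mem_watchdog).
-export([maybe_global_gc/1]).

%% Hypothetical sketch: when erlang:memory(total) reaches ThresholdBytes,
%% force a garbage collection of every process on the node.
-spec maybe_global_gc(non_neg_integer()) -> {gc, non_neg_integer()} | ok.
maybe_global_gc(ThresholdBytes) ->
    case erlang:memory(total) of
        Total when Total >= ThresholdBytes ->
            %% garbage_collect/1 returns false for processes that died in
            %% the meantime, so racing with exits is harmless.
            [erlang:garbage_collect(Pid) || Pid <- erlang:processes()],
            {gc, Total};
        _ ->
            ok
    end.
```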

This later prompted us to explore the VM's allocators: the value we used
(erlang:memory(total)) didn't reflect the OS-imposed limits on the VM,
so nodes would be killed for running out of memory before ever getting
the chance to run the global GC. This led to discovering things about
fragmentation and characterizing our workloads to pick better
allocation strategies, which seem to work decently so far: now
erlang:memory(total), for one, reports the right values, and we have a
better time releasing allocated blocks of memory when most binaries
vanish.
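To see that gap on a node, a rough check is to compare
erlang:memory(total) with the resident set size the kernel reports. A
minimal, Linux-only sketch, assuming /proc/self/status is readable from
the VM (the module and function names are hypothetical):

```erlang
-module(rss_check).
-export([vm_vs_os/0]).

%% Hypothetical, Linux-only sketch: return what the VM reports through
%% erlang:memory(total) next to the kernel's VmRSS, both in bytes.
-spec vm_vs_os() -> {ok, non_neg_integer(), non_neg_integer()}
                  | {error, term()}.
vm_vs_os() ->
    case file:open("/proc/self/status", [read, raw, binary, read_ahead]) of
        {ok, Fd} ->
            try find_rss(Fd) of
                {ok, RssBytes} -> {ok, erlang:memory(total), RssBytes};
                Other -> Other
            after
                file:close(Fd)
            end;
        {error, _} = Error ->
            Error
    end.

%% Scan lines until "VmRSS:\t   123456 kB" and convert kB to bytes.
find_rss(Fd) ->
    case file:read_line(Fd) of
        {ok, <<"VmRSS:", Rest/binary>>} ->
            [Kb | _] = binary:split(string:trim(Rest), <<" ">>),
            {ok, binary_to_integer(Kb) * 1024};
        {ok, _OtherLine} ->
            find_rss(Fd);
        eof ->
            {error, no_vmrss};
        {error, _} = Error ->
            Error
    end.
```

A large, persistent difference between the two numbers is the symptom
discussed in this thread: fragmentation, or memory allocated outside the
ERTS allocators.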

I hope we can remove both the artificial hibernation calls and the
workarounds that force global GCs in the near future. Ideally, it
sounds like the VM should do more to account for the weight of refc
binaries in individual processes' memory when deciding to GC, but I
don't have a good idea of how this should be done in practice without
making elephant-sized assumptions, leaving holes in the solution, or
adding more knobs to configure the VM. Plus, I'd have no idea how to
actually implement it.

Regards,
Fred.

On 09/17, Robert Virding wrote:
> Hi Fred,
>
> You recommend hibernating a process. Do you think this is better than calling the garbage collector in a process? I have no idea but hibernating seems more drastic, especially if the process is "in use"?
>
> Robert
>
> ----- Original Message -----
> > From: "Fred Hebert" <mononcqc@REDACTED>
> > To: "Tino Breddin" <tino.breddin@REDACTED>
> > Cc: "Erlang Questions Mailing List" <erlang-questions@REDACTED>
> > Sent: Tuesday, 17 September, 2013 2:39:41 AM
> > Subject: Re: [erlang-questions] Problem with Beam process not freeing memory
> >
> > I've recently run into similar issues and have received a bit of help from
> > Lukas Larsson, which I'm glad to pass on to you. Whatever he taught me,
> > I tried to put into the recon library, currently on a different branch
> > awaiting review and prone to change:
> > https://github.com/ferd/recon/tree/allocators
> >
> > 1. To check for binary leaks, I recommend calling `recon:bin_leak(N)`
> > where `N` is the number of 'highest results' you want. The function will
> > take a snapshot of the number of binary refs in your processes, then GC
> > the node entirely, take another snapshot, compute the diff, and return
> > the N biggest deltas in binaries. This will let you know what processes
> > hold the most references to stale refc binaries. I recommend hibernation
> > as a first way to try and fix this if it is the problem.
> >
> > 2. Check the reported/allocated memory with `recon_alloc:memory(Arg)`
> > where `Arg` can be:
> >  - `used` for the memory actively used (i.e. erlang:memory(total))
> >  - `allocated` for the memory reserved by individual allocators (sum)
> >  - `usage` for the percentage.
> > If the result of `allocated` is close to what the OS reports, you
> > probably have fragmentation issues. If not, you may have a NIF or driver
> > that allocates data outside of the ERTS allocators.
> >
> > 3. Check individual allocator usage levels with
> > `recon_alloc:fragmentation(current)`. It will return usage percentages
> > for mbcs and sbcs. Mbcs are multiblock carriers and are where data goes
> > by default. When the data allocated is too large (larger than the single
> > block carrier threshold [sbct]), it goes into its own carrier. Compare the
> > results with what you get with `recon_alloc:fragmentation(max)`. If the
> > current values have very low usage but the max ones have high usage, you
> > may have lingering data, possibly held in long-term references or
> > whatever, that blocks deallocation of specific carriers. Different
> > carrier strategies can help, which we can dive into if you see a problem
> > with this.
> >
> > Until I actually merge it into master, feel free to read the comments
> > in `recon_alloc` on that branch; they contain some of the details about
> > what to do or look for.
> >
> > Lukas may want to correct me on the content of this post. I'm going
> > from the limited knowledge he transmitted to me here, or rather, my
> > limited understanding of it :)
> >
> > Regards,
> > Fred.
> >
> > On 09/16, Tino Breddin wrote:
> > > Hi list,
> > >
> > > I'm experiencing issues with a couple of Beam nodes where I see a huge gap
> > > between the node's reported memory usage and the underlying Linux Kernel's
> > > view.
> > >
> > > This is using R15B01.
> > >
> > > To start, an application in such a node stores a lot of tuples (containing
> > > atoms and binary data) in ETS tables. That proceeds until a point where
> > > memory usage is 70% (6GB) of the available memory. At that point
> > > erlang:memory() and top (or /proc/PID/status) agree roughly on the memory
> > > usage. Then an internal cleanup task is performed, which clears obsolete
> > > records from the ETS tables. Afterwards, erlang:memory() reports an
> > > expected low value of roughly 60MB of memory usage (this includes binary
> > > data). However, the kernel still reports the high memory usage values (both
> > > VmRss and VmTotal) for the node. The kernel's usage view will stay stable
> > > until the ETS tables are filled to a point where the real memory usage
> > > exceeds the kernel's view, then the kernel reported usage will grow as
> > > well.
> > >
> > > Now having checked the node in some details I'm wondering what causes this
> > > difference between the BEAM's view and the Kernel's view on memory usage. I
> > > have 2 ideas which I'm checking right now.
> > >
> > > (1) Not GC'ed binaries: Could it be that binary data is not GC'ed because
> > > the original dispatcher process which it was passed through before being
> > > stored in an ETS table is still alive, and thus still holds a reference
> > > to it? However, this would not explain why erlang:memory() reports a very
> > > low value for used memory for binaries.
> > >
> > > (2) low-level memory leak: Some driver or NIF leaking memory, which would
> > > obviously not be reported by erlang:memory(). However, then it surprises me
> > > that the Kernel's view stays stable while the BEAM's actual memory usage is
> > > still below the Kernel's view. It should be continuously growing in this
> > > case imho.
> > >
> > > I'd appreciate it if anyone has some more insight or experience with such a
> > > behaviour, while I'm further digging into this.
> > >
> > > Cheers,
> > > Tino
> >
> > > _______________________________________________
> > > erlang-questions mailing list
> > > erlang-questions@REDACTED
> > > http://erlang.org/mailman/listinfo/erlang-questions
> >