<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">erlang:garbage_collect/0,1 existing is
nice, but using it seems to only form a kludge. It seems much
better to focus on having short-lived Erlang processes to capture
the last reference to generated binaries (that are large enough to
be reference counted, so greater than 64 bytes). The pid options
for fullsweep, hibernate, erts command line options, and perhaps
erlang:garbage_collect/0,1 usage, seems more like tuning for
performance based on usage. Using short-lived processes in Erlang
is using Erlang for what it is good at.... why would you fight
that by having code blow-up memory-wise by leaving long-lived
Erlang processes in your source code?<br>
<br>
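A minimal sketch of that pattern (the function names here are only
illustrative assumptions): the large binary is referenced only from a
temporary worker process, so its reference count drops as soon as the
worker exits, with no explicit collection needed:<br>
<pre>handle_request(Socket, Request) ->
    %% The refc binary is created and referenced only inside this
    %% short-lived worker, so its last reference dies with the process.
    {Pid, Ref} = spawn_monitor(fun() ->
                     Bin = render_response(Request), % assumed to build a large binary
                     ok = gen_tcp:send(Socket, Bin)
                 end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    end.</pre>
<br>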
This reliance on short-lived processes is what concerns me when we
talk about how Erlang pids are limited to 2^28 unique pids per node
(i.e., 28 bits, not yet widened to 60 bits), but it is good that the
new algorithm for pid reuse in R16B01 will support this approach
(i.e., one Erlang pid per rpc call). (Details are from the "pid
representation in external term format" email thread.)<br>
<br>
On 09/17/2013 08:03 AM, Lukas Larsson wrote:<br>
</div>
<blockquote
cite="mid:CAP3zBqPUn+mmP18f0ige-46uHph8XQtb0xwCpgGmObTssrM7bQ@mail.gmail.com"
type="cite">
<meta http-equiv="Context-Type" content="text/html;
charset=ISO-8859-1">
<div dir="ltr">as far as I can see erlang:garbage_collect/0,1 does
do fullsweeps [1].<br>
<br>
[1]: <a moz-do-not-send="true"
href="https://github.com/erlang/otp/blob/maint/erts/emulator/beam/bif.c#L3770">https://github.com/erlang/otp/blob/maint/erts/emulator/beam/bif.c#L3770</a><br>
<div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Sep 17, 2013 at 4:53 PM,
Robert Virding <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:robert.virding@erlang-solutions.com"
target="_blank">robert.virding@erlang-solutions.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote">OK, I meant doing an
explicit garbage collectionen and was under the
impression that calling erlang:garbage_collect()
actually did a full sweep of the process which is why
hibernating seemed a bit of an overkill.<br>
<br>
Robert<br>
<div class="im"><br>
----- Original Message -----<br>
> From: "Fred Hebert" <<a moz-do-not-send="true"
href="mailto:mononcqc@ferd.ca">mononcqc@ferd.ca</a>><br>
</div>
<div class="im">> To: "Robert Virding" <<a
moz-do-not-send="true"
href="mailto:robert.virding@erlang-solutions.com">robert.virding@erlang-solutions.com</a>><br>
> Cc: "Erlang Questions Mailing List" <<a
moz-do-not-send="true"
href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>><br>
</div>
<div class="">
<div class="h5">> Sent: Tuesday, 17 September, 2013
4:36:19 PM<br>
> Subject: Re: [erlang-questions] Problem with
Beam process not freeing memory<br>
><br>
> Hi Robert,<br>
><br>
> I CC'd the mailing list on this post, because I
felt it could be<br>
> interesting to share with everyone.<br>
><br>
> To get into some more details, the difference
between a<br>
> garbage-collection and hibernating is that
hibernation forces a<br>
> full-sweep and does compaction work. It is more
likely to actually<br>
> remove old refs to binaries.<br>
><br>
> The tricky part about binary leaks in these
cases is that if the process<br>
> you're garbage collecting holds some very
long-lived references (or<br>
> takes a long while before enabling them), you
will move the references<br>
> to the old heap, if my understanding is
correct. Now if your process<br>
> that leaks resources is busy or bogged down by
some task, there are<br>
> chances that manually calling GC at higher
frequencies will force<br>
> short-lived references to the old heap.
Eventually, most of the<br>
> subsequent GCs are done for no good reason
until there's a full sweep to<br>
> free the references, if my understanding is
right.<br>
><br>
> In comparison, some hibernations may turn out
to be beneficial due to<br>
> how they do the full sweep, especially on less
active processes, without<br>
> changing spawn_opt values or whatever.<br>
><br>
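> As a rough illustration of that kind of hibernation (the module, idle<br>
> timeout and state here are assumptions, not code from any real system),<br>
> a gen_server can ask OTP to hibernate it whenever it has been idle for a<br>
> while, which triggers the full sweep and compaction described above:<br>
><br>
> -module(idle_hibernate_sketch).<br>
> -behaviour(gen_server).<br>
> -export([start_link/0]).<br>
> -export([init/1, handle_call/3, handle_cast/2, handle_info/2,<br>
> terminate/2, code_change/3]).<br>
><br>
> -define(IDLE, 5000). % assumed 5-second idle timeout<br>
><br>
> start_link() -> gen_server:start_link(?MODULE, [], []).<br>
><br>
> init([]) -> {ok, undefined, ?IDLE}.<br>
><br>
> handle_call(_Req, _From, State) -> {reply, ok, State, ?IDLE}.<br>
> handle_cast(_Msg, State) -> {noreply, State, ?IDLE}.<br>
><br>
> %% After ?IDLE ms without a message, gen_server delivers 'timeout';<br>
> %% returning 'hibernate' makes OTP call erlang:hibernate/3, which does<br>
> %% a full-sweep GC and compacts the heap, dropping stale refc refs.<br>
> handle_info(timeout, State) -> {noreply, State, hibernate};<br>
> handle_info(_Msg, State) -> {noreply, State, ?IDLE}.<br>
><br>
> terminate(_Reason, _State) -> ok.<br>
> code_change(_OldVsn, State, _Extra) -> {ok, State}.<br>
><br>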
> Both cases are not necessarily great, and I
don't think there's one easy<br>
> way to deal with it.<br>
><br>
> One thing I've had in mind to try for a while
was to run a function a<br>
> bit like:<br>
><br>
> -spec gc_count(non_neg_integer(), binary())
-> non_neg_integer().<br>
> gc_count(PreviousCounter, Bin) -><br>
> case byte_size(Bin) of<br>
> N when N >= 64 -> % refc binary<br>
> NewCount = N + PreviousCounter,<br>
> case NewCount >= ?THRESHOLD of<br>
> true -><br>
> erlang:garbage_collect(),<br>
> 0;<br>
> false -><br>
> NewCount<br>
> end;<br>
> N -> % heap binary<br>
> PreviousCounter+N<br>
> end.<br>
><br>
> that could alternatively force some hibernation
instead of GC'ing. This<br>
> one could basically track the size of all
binaries seen manually and<br>
> force something when you go over a certain
amount. It sucks, though,<br>
> because that's basically manually doing your
collection, and it doesn't<br>
> mean that because you have seen a binary, it's
ready to be GC'd. I've<br>
> thus avoided trying it in the real world for
now.<br>
><br>
> In practice, at Heroku, we've decided to go for
a hybrid approach in<br>
> logplex. We force hibernation on some important
events that interrupt<br>
> our work flow no matter what (such as a socket
disconnection, or long<br>
> periods of time [seconds] without activity for
a process), and have put<br>
> a workaround in place to force VM-wide GCs when
we're reaching critical<br>
> amounts of memory:<br>
> <a moz-do-not-send="true"
href="https://github.com/heroku/logplex/blob/master/src/logplex_leak.erl"
target="_blank">https://github.com/heroku/logplex/blob/master/src/logplex_leak.erl</a><br>
><br>
> The objective was to use global GC as a last
measure in case individual<br>
> (unobtrusive) hibernates were not enough to
save a node.<br>
><br>
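> A crude sketch of what such a last-resort, VM-wide collection can look<br>
> like (the threshold and the simple walk over processes() are assumptions<br>
> for illustration, not the actual logplex_leak code):<br>
><br>
> maybe_global_gc(CriticalBytes) -><br>
> %% Walk every local process and force a collection. This is<br>
> %% expensive, so only do it past an assumed critical threshold.<br>
> case erlang:memory(total) > CriticalBytes of<br>
> true -> [erlang:garbage_collect(P) || P <- processes()], ok;<br>
> false -> ok<br>
> end.<br>
><br>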
> This later on prompted an exploration of the allocators of the VM -- the<br>
> value used (erlang:memory(total)) didn't
represent the OS-imposed limits<br>
> on the VM: nodes would be killed by going out
of memory without first<br>
> having had the chance to run the global GC. This led to discovering<br>
> things about fragmentation and characterizing
our workloads to pick<br>
> better allocation strategies that seem to work
decently so far, so that<br>
> erlang:memory(total), for one, has the right
values, and also that we<br>
> have a better time releasing allocated blocks
of memory when most<br>
> binaries vanish.<br>
><br>
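> For reference, the used-versus-allocated comparison mentioned above is<br>
> roughly the following, using the recon_alloc branch mentioned earlier in<br>
> this thread (a sketch, nothing exact):<br>
><br>
> Used = erlang:memory(total), % what the VM thinks it is using<br>
> Allocated = recon_alloc:memory(allocated), % what its allocators hold from the OS<br>
> Usage = Used / Allocated. % low usage with a high OS footprint points at fragmentation<br>
><br>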
> I hope we can remove both the artificial
hibernation calls and the<br>
> workarounds to force some global GCs in the
near future. Ideally, it<br>
> sounds like the VM should possibly do more about the weight that refc<br>
> binaries add to individual processes' memory for GC purposes, but I don't<br>
> have a good idea of how this should be done in practice without making<br>
> elephant-sized assumptions, leaving holes in the solution, or adding more<br>
> knobs to the VM to configure things. Plus I'd have no idea how to<br>
> actually implement it.<br>
><br>
> Regards,<br>
> Fred.<br>
><br>
><br>
> On 09/17, Robert Virding wrote:<br>
> > Hi Fred,<br>
> ><br>
> > You recommend hibernating a process. Do
you think this is better than<br>
> > calling the garbage collector in a
process? I have no idea but hibernating<br>
> > seems more drastic, especially if the
process is "in use"?<br>
> ><br>
> > Robert<br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Fred Hebert" <<a
moz-do-not-send="true"
href="mailto:mononcqc@ferd.ca">mononcqc@ferd.ca</a>><br>
> > > To: "Tino Breddin" <<a
moz-do-not-send="true"
href="mailto:tino.breddin@googlemail.com">tino.breddin@googlemail.com</a>><br>
> > > Cc: "Erlang Questions Mailing List"
<<a moz-do-not-send="true"
href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>><br>
> > > Sent: Tuesday, 17 September, 2013
2:39:41 AM<br>
> > > Subject: Re: [erlang-questions]
Problem with Beam process not freeing<br>
> > > memory<br>
> > ><br>
> > > I've recently run into similar issues and have received a bit of help from<br>
> > > Lukas Larsson, which I'm glad to pass
on to you. Whatever he taught me,<br>
> > > I tried to put into the recon
library, currently on a different branch<br>
> > > awaiting review and prone to change:<br>
> > > <a moz-do-not-send="true"
href="https://github.com/ferd/recon/tree/allocators"
target="_blank">https://github.com/ferd/recon/tree/allocators</a><br>
> > ><br>
> > > 1. Checking for binary leaks, I
recommend calling `recon:bin_leak(N)`<br>
> > > where `N` is the number of 'highest
results' you want. The function will<br>
> > > take a snapshot of the number of
binary refs in your processes, then GC<br>
> > > the node entirely, and then take another snapshot, diff the two, and return<br>
> > > the N biggest deltas in binaries.
This will let you know what processes<br>
> > > hold the most references to stale
refc binaries. I recommend hibernation<br>
> > > as a first way to try and fix this if
it is the problem.<br>
> > ><br>
> > > 2. Check the reported/allocated
memory with `recon_alloc:memory(Arg)`<br>
> > > where `Arg` can be:<br>
> > > - `used` for the memory actively
used (i.e. erlang:memory(total))<br>
> > > - `allocated` for the memory
reserved by individual allocators (sum)<br>
> > > - `usage` for the percentage.<br>
> > > If the result of `allocated` is close
to what the OS reports, you<br>
> > > probably have fragmentation issues.
If not, you may have a NIF or driver<br>
> > > that allocates data outside of the ERTS allocators.<br>
> > ><br>
> > > 3. Check individual allocator usage
levels with<br>
> > > `recon_alloc:fragmentation(current)`.
It will return usage percentages<br>
> > > for mbcs and sbcs. Mbcs are
multiblock carriers and are where data goes<br>
> > > by default. When the data allocated is too large (larger than the single<br>
> > > block carrier threshold [sbct]), it
goes into its own block. Compare the<br>
> > > results with what you get with
`recon_alloc:fragmentation(max)`. If the<br>
> > > current values have very low usage
but the max ones have large ones, you<br>
> > > may have lingering data, possibly held in long-term references or<br>
> > > something else that blocks deallocation of specific carriers. Different<br>
> > > carrier strategies can help, which we
can dive into if you see a problem<br>
> > > with this.<br>
> > ><br>
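> > > Putting the three checks together, a rough shell sketch (the argument<br>
> > > values are arbitrary assumptions):<br>
> > ><br>
> > > %% 1. biggest binary-reference deltas across a node-wide GC<br>
> > > recon:bin_leak(10).<br>
> > > %% 2. actively used vs. reserved by the allocators, and their ratio<br>
> > > recon_alloc:memory(used).<br>
> > > recon_alloc:memory(allocated).<br>
> > > recon_alloc:memory(usage).<br>
> > > %% 3. mbcs/sbcs utilisation now vs. at its peak; low current usage but<br>
> > > %% high max usage suggests carriers pinned by lingering data<br>
> > > recon_alloc:fragmentation(current).<br>
> > > recon_alloc:fragmentation(max).<br>
> > ><br>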
> > > Feel free to read the comments in
`recon_alloc` until I actually merge<br>
> > > it in master, they contain some of
the details about what to do or look<br>
> > > for.<br>
> > ><br>
> > > Lukas may want to correct me on the content of this post. I'm going<br>
> > > from the limited knowledge he
transmitted to me here, or rather, my<br>
> > > limited understanding of it :)<br>
> > ><br>
> > > Regards,<br>
> > > Fred.<br>
> > ><br>
> > > On 09/16, Tino Breddin wrote:<br>
> > > > Hi list,<br>
> > > ><br>
> > > > I'm experiencing issues with a couple of Beam nodes where I see a<br>
> > > > huge gap between the node's reported memory usage and the underlying<br>
> > > > Linux kernel's view.<br>
> > > ><br>
> > > > This is using R15B01.<br>
> > > ><br>
> > > > As a start, an application in such a node stores a lot of tuples<br>
> > > > (containing atoms and binary data) in ETS tables. That proceeds until<br>
> > > > a point where memory usage is 70% (6GB) of the available memory. At<br>
> > > > that point erlang:memory() and top (or /proc/PID/status) agree roughly<br>
> > > > on the memory usage. Then an internal cleanup task is performed, which<br>
> > > > clears obsolete records from the ETS tables. Afterwards,<br>
> > > > erlang:memory() reports an expected low value of roughly 60MB memory<br>
> > > > usage (this includes binary data). However, the kernel still reports<br>
> > > > the high memory usage values (both VmRss and VmTotal) for the node.<br>
> > > > The kernel's usage view will stay stable until the ETS tables are<br>
> > > > filled to a point where the real memory usage exceeds the kernel's<br>
> > > > view; then the kernel-reported usage will grow as well.<br>
> > > ><br>
> > > > Now, having checked the node in some detail, I'm wondering what causes<br>
> > > > this difference between the BEAM's view and the kernel's view of<br>
> > > > memory usage. I have 2 ideas which I'm checking right now.<br>
> > > ><br>
> > > > (1) Not GC'ed binaries: Could it be that binary data is not GC'ed<br>
> > > > because the original dispatcher process which it was passed through<br>
> > > > before being stored in an ETS table is still alive, and thus there is<br>
> > > > still some reference to it? However, this would not explain why<br>
> > > > erlang:memory() reports a very low value for used memory for binaries.<br>
> > > ><br>
> > > > (2) Low-level memory leak: Some driver or NIF leaking memory, which<br>
> > > > would obviously not be reported by erlang:memory(). However, then it<br>
> > > > surprises me that the kernel's view stays stable while the BEAM's<br>
> > > > actual memory usage is still below the kernel's view; it should be<br>
> > > > continuously growing in this case, imho.<br>
> > > ><br>
> > > > I'd appreciate it if anyone has some more insight or experience with<br>
> > > > such a behaviour, while I'm further digging into this.<br>
> > > ><br>
> > > > Cheers,<br>
> > > > Tino<br>
> > ><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
erlang-questions mailing list
<a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>
<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
</blockquote>
<br>
</body>
</html>