<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 01/28/2015 08:34 AM, Roberto
Ostinelli wrote:<br>
</div>
<blockquote
cite="mid:CAM5fRyopJjrrKwWSKxkH9YK9qtedm60tZvZwAbpmw+-WfPXFFA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Wed, Jan 28, 2015 at 5:19 PM, Fred
Hebert <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:mononcqc@ferd.ca" target="_blank">mononcqc@ferd.ca</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span
class="">On 01/28, Roberto Ostinelli wrote:<br>
> Here's the erl_crash.dump analysis, done with
Fred's nice script:<br>
><br>
> analyzing erl_crash.dump, generated on: Wed Jan 28
13:59:36 2015<br>
><br>
> Slogan: Received SIGUSR1<br>
><br>
</span><span class="">> File descriptors open:<br>
> ===<br>
> UDP: 0<br>
> TCP: 180071<br>
> Files: 6<br>
> ---<br>
> Total: 180077<br>
><br>
<br>
</span>So uh, is 180,000 TCP ports the number you have or
expect? Could you be<br>
running out of these and then massive erroring takes place
or anything<br>
like that? What are the limits in place for your system?<br>
</blockquote>
<div><br>
</div>
<div>That's exactly the number of long lived connections to
the system, so yes it's expected.</div>
<div>ulimit (hard and soft) are set to 1M.</div>
<div> </div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span
class="">
> Do you see anything wrong there? I honestly don't.<br>
><br>
<br>
</span>Nothing looks obviously wrong. At this point my
number one suspicion<br>
would be some process(es) that suddenly get lots of
messages (links,<br>
monitors, node monitors, general events) and fill their
mailboxes. This<br>
then prompts for GCs, which suspends the process, copies
the mailbox to<br>
the heap, and then runs again.<br>
<br>
This kind of stuff can sometimes spiral out of control.<br>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>So, here's the thing. I got a bigger box, a 30GB one. I'm
running the same test, and what I have is the following:</div>
<div><a moz-do-not-send="true"
href="https://cldup.com/1M4qzvbLp_-3000x3000.png">https://cldup.com/1M4qzvbLp_-3000x3000.png</a><br>
</div>
<div><br>
</div>
<div>Basically, the memory "ramp-up"
happens in two phases. The second phase was the one
that previously made my VM blow up.</div>
<div>With this bigger box, this phase continues but then
stabilizes.</div>
<div><br>
</div>
<div>I'm not sure why it happens this way, but with a
bigger box, as you can see, everything is stable once
22.3GB of memory is reached.<br>
</div>
<div>This was confusing, but it simply looks like there's
some time needed before the memory gets allocated. I'm
assuming this has to do with fullsweep_after. With it set
to 10, we get to the 22.3GB illustrated; with it set to 5,
we get to 19.3GB (3GB less).</div>
<div><br>
</div>
<div>I guess this is it for now. I can only thank you for
your time.</div>
<div><br>
</div>
<div>Any additional remarks?</div>
<div><br>
</div>
<div>Best,</div>
<div>r.</div>
<div><br>
</div>
</div>
</div>
</div>
<br>
<pre wrap="">_______________________________________________
erlang-questions mailing list
<a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>
<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
</blockquote>
<tt>As I mentioned previously, with your current architecture your
long-lived processes generate too much garbage to scale as high as
you are attempting. You should be able to see the problem under
less load by using </tt><a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/man/instrument.html">http://www.erlang.org/doc/man/instrument.html</a>.
That should then lead to source code changes that split the monolithic,
long-lived Erlang processes that generate too much garbage into
smaller, short-lived ones that are able to trigger garbage collection
faster. The Erlang process message queues are a secondary
concern, since bottleneck processes can be overloaded,
but that problem can be discovered and solved with the same approach.<br>
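<br>
<tt>A minimal sketch of the idea, not a definitive implementation. The
module name proc_top and both function names are hypothetical; only
erlang:processes/0, erlang:process_info/2, spawn_monitor/1 and the
lists functions are real calls. This is a coarser first look than the
instrument module gives you: it only ranks live processes by memory or
mailbox length, and shows one way to push garbage-heavy work into a
short-lived process whose heap is released when it exits.</tt><br>
<pre>
-module(proc_top).
-export([top/2, run_in_worker/1]).

%% Return the N live processes with the largest value for Key,
%% e.g. Key = memory or Key = message_queue_len.
%% process_info/2 returns undefined for a process that has already
%% exited; the generator pattern below silently skips those.
top(Key, N) -&gt;
    Infos = [{Pid, Val} || Pid &lt;- erlang:processes(),
                           {_, Val} &lt;- [erlang:process_info(Pid, Key)]],
    lists:sublist(lists:reverse(lists:keysort(2, Infos)), N).

%% Run Fun in a short-lived process (hypothetical helper).
%% Intermediate garbage lives on the worker's heap and is freed
%% when the worker exits; only the result is copied to the caller.
run_in_worker(Fun) -&gt;
    {Pid, Ref} = spawn_monitor(fun() -&gt; exit({ok, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {ok, Result}} -&gt; {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} -&gt; {error, Reason}
    end.
</pre>
<tt>From the shell, proc_top:top(memory, 10) and
proc_top:top(message_queue_len, 10) give the candidates to look at
first. With 180,000 processes the scan itself is not free, so run it
sparingly and at reduced load.</tt><br>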
</body>
</html>