[erlang-questions] Garbage Collection, BEAM memory and Erlang memory
Wed Jan 28 19:37:49 CET 2015
On 01/28/2015 08:34 AM, Roberto Ostinelli wrote:
> On Wed, Jan 28, 2015 at 5:19 PM, Fred Hebert <mononcqc@REDACTED> wrote:
> On 01/28, Roberto Ostinelli wrote:
> > Here's the erl_crash.dump analysis, done with Fred's nice script:
> > analyzing erl_crash.dump, generated on: Wed Jan 28 13:59:36 2015
> > Slogan: Received SIGUSR1
> > File descriptors open:
> > ===
> > UDP: 0
> > TCP: 180071
> > Files: 6
> > ---
> > Total: 180077
> So uh, is 180,000 TCP ports the number you have or expect? Could you be
> running out of these and then massive erroring takes place or anything
> like that? What are the limits in place for your system?
> That's exactly the number of long lived connections to the system, so yes it's expected.
> ulimit (hard and soft) are set to 1M.
> > Do you see anything wrong there? I honestly don't.
> Nothing looks obviously wrong. At this point my number one suspicion
> would be some process(es) that suddenly get lots of messages (links,
> monitors, node monitors, general events) and fill their mailboxes. This
> then prompts for GCs, which suspends the process, copies the mailbox to
> the heap, and then runs again.
> This kind of stuff can sometimes spiral out of control.
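
One way to check the mailbox suspicion on a live node is to list the processes with the longest message queues. This is a minimal sketch, not something from the thread: the module and function names are made up for illustration, and it relies only on erlang:processes/0 and erlang:process_info/2.

```erlang
-module(mailbox_check).
-export([top_queues/1]).

%% Hypothetical helper: return up to N {QueueLen, Pid} pairs,
%% longest message queue first.
top_queues(N) when is_integer(N), N >= 0 ->
    Lens = [{Len, Pid}
            || Pid <- erlang:processes(),
               %% process_info/2 returns 'undefined' for a process that
               %% has already exited; the pattern below skips those.
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

Calling mailbox_check:top_queues(10) from a remote shell during the ramp-up should show whether a few processes are accumulating messages. With 180,000 connections this walks every process, so it is best run sparingly.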
> So, here's the thing. I got a bigger box, with 30GB of memory. I'm launching the same test, and what I have is the following:
> Basically, the memory "ramp-up" happens in two phases. The second phase is the one that was previously making my VM blow up.
> With this bigger box, this phase continues but then stabilizes.
> Not sure why this happens in this way, but with a bigger box, as you can see, when 22.3 GB of memory are reached, everything is stable.
> This was confusing, but it simply looks like there's some time needed before the memory gets allocated. I'm assuming this has to do with fullsweep_after. With it set to 10, we get to the 22.3GB illustrated; with it set to 5, we get to 19.3GB (3GB less).
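
For reference, fullsweep_after is the number of generational (minor) collections a process may go through before a full sweep is forced. It can be set per process at spawn time, which makes it possible to compare values without restarting the node. The sketch below is an illustration under assumptions: the module and function names are hypothetical, and it is not the configuration used in the test above.

```erlang
-module(fullsweep_demo).
-export([spawn_with_fullsweep/2, fullsweep_of/1]).

%% Spawn Fun with its own fullsweep_after value, overriding the
%% node-wide default for this one process.
spawn_with_fullsweep(Fun, N)
  when is_function(Fun, 0), is_integer(N), N >= 0 ->
    spawn_opt(Fun, [{fullsweep_after, N}]).

%% Read back the value a process is actually running with.
fullsweep_of(Pid) ->
    case erlang:process_info(Pid, garbage_collection) of
        {garbage_collection, Opts} ->
            proplists:get_value(fullsweep_after, Opts);
        undefined ->
            %% The process has already exited.
            undefined
    end.
```

The node-wide default for newly spawned processes can also be changed with erlang:system_flag(fullsweep_after, N), or at startup through the ERL_FULLSWEEP_AFTER environment variable.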
> I guess this is it for now. I can only thank you for your time.
> Any additional remarks?
As I mentioned previously, with your current architecture your long-lived processes generate too much garbage to scale as high as you are attempting. You should be able to see the problem under less load by using http://www.erlang.org/doc/man/instrument.html, which should then lead to source code changes: split the monolithic, long-lived Erlang processes that generate too much garbage into smaller, short-lived ones whose garbage gets collected sooner. The Erlang process message queues are a secondary concern, since bottleneck processes can become overloaded, but that problem can also be discovered and solved with this approach.
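
A minimal sketch of that kind of split, assuming the garbage-heavy work can be expressed as a fun: the long-lived process hands the work to a temporary process and keeps only the result. The module name and function are hypothetical and are not taken from the code under discussion.

```erlang
-module(short_lived).
-export([run/1]).

%% Run Fun in a temporary process. All intermediate garbage lives on
%% the temporary process's heap and is freed when that process exits;
%% only the result is copied back to the caller.
run(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before sending a result.
            {error, Reason}
    end.
```

A connection process could then call short_lived:run(fun() -> decode(Payload) end), where decode/1 stands in for whatever work currently builds up garbage in the long-lived process. The trade-off is the cost of copying the input and the result between heaps, so this pays off mainly when the intermediate data is much larger than the result.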