[erlang-questions] Help debugging binary memory usage
Michael Martin
mmartin4242@REDACTED
Sun Oct 16 22:12:50 CEST 2016
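Could this be a message leak? Check whether your processes receive
messages they never handle, and log any that turn up. See the section
on unhandled messages in chapter 4 of Designing for Scalability with
Erlang/OTP:
<https://www.safaribooksonline.com/library/view/designing-for-scalability/9781449361556/ch04.html>.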
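A minimal sketch of such a catch-all, assuming the workers are plain
processes with a hand-written receive loop (the thread does not say how
they are implemented). The module name worker_sketch and the
{work, From} and stop messages are hypothetical, made up for
illustration only.

```erlang
-module(worker_sketch).
-export([start/0, loop/1]).

%% Spawn a worker; the loop state counts unexpected messages seen so far.
start() ->
    spawn(?MODULE, loop, [0]).

loop(Unexpected) ->
    receive
        {work, From} ->
            %% Hypothetical "real" message: reply with the current count.
            From ! {done, self(), Unexpected},
            loop(Unexpected);
        stop ->
            ok;
        Other ->
            %% Catch-all clause: without it, messages that match no
            %% pattern stay in the mailbox forever, along with any
            %% binaries they reference.
            error_logger:warning_msg("~p got unexpected message: ~p~n",
                                     [self(), Other]),
            loop(Unexpected + 1)
    end.
```

To spot a mailbox that is already growing, something like
erlang:process_info(Pid, message_queue_len) reports how many messages
are queued for a given process.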
On 10/16/2016 03:05 AM, Paul Oliver wrote:
> Hey Luca,
>
> Check out https://github.com/ferd/recon and
> http://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html
>
> Cheers,
> Paul.
>
> On Sun, Oct 16, 2016 at 8:53 PM Luca Spiller <luca@REDACTED> wrote:
>
> Hi everyone,
>
> One of our nodes seems to have a memory leak. After a couple of
> days the memory usage gets so high that the OOM killer kills it,
> and it's restarted. This seems to have been going on for a few
> years; the node works fine the whole time, so nobody noticed. It
> just uses up all the memory on the box.
>
> A bit of background: the node is making hundreds of HTTP requests
> per second. There are a thousand or so worker processes
> responsible for this, each of which makes a request, inspects the
> response headers, and based on these starts other processes. The
> worker then sleeps for X time (seconds to minutes) and does the
> same again. The response body can be any size, but we don't care
> about it in the application (though I'd assume it gets converted
> to a binary by lhttpc). I should also note that some of the
> requests are made over TLS.
>
> https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-system.png
>
> This is the output from Observer. As you can see, it shows that
> binaries are using 2569 MB of RAM. When the node has been
> restarted and running for a few minutes this is usually < 10 MB.
> Most of the worker processes (95%+) which make the requests are
> started shortly after the node starts and hang around forever.
>
> https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-processes.png
>
> This is the process list from Observer, sorted by memory. It
> doesn't appear to show anything interesting. The worker processes
> (XXX:init/1) use roughly the same amount of memory after they've
> been running for a few minutes.
>
> As I understand it, large binaries stick around until the system
> is under 'high memory pressure' before being GCed. In my case the
> node uses up half the swap and all the RAM. Is that not high
> enough? After that the OOM killer jumps in and deals with it forcibly.
>
> So... what can I do to debug this?
>
> Thanks,
>
> Luca Spiller
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
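
On the question of when large binaries are collected: as far as I know,
off-heap (reference-counted) binaries are freed once every process
holding a reference has garbage-collected it, and each process collects
according to its own heap growth, not OS memory pressure. A long-lived
worker that allocates little may therefore keep large binaries alive
for a long time. The sketch below uses only the erlang module to list
the processes referencing the most binary data and to measure how much
a forced GC releases. The module name bin_check and its function names
are made up for illustration, and it assumes that
process_info(Pid, binary) returns {Id, Size, RefCount} tuples, a format
the documentation says may change without notice.

```erlang
-module(bin_check).
-export([top_binary_holders/1, gc_all/0]).

%% Bytes of off-heap binaries referenced by Pid (0 if Pid has exited).
%% Binaries shared between processes are counted once per holder.
binary_bytes(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Bins} ->
            %% Assumed tuple shape; entries of any other shape are skipped.
            lists:sum([Size || {_Id, Size, _RefCount} <- Bins]);
        undefined ->
            0
    end.

%% The N processes referencing the most binary data, largest first,
%% as {Bytes, Pid} tuples.
top_binary_holders(N) ->
    Sizes = [{binary_bytes(Pid), Pid} || Pid <- erlang:processes()],
    lists:sublist(lists:reverse(lists:sort(Sizes)), N).

%% Force a GC of every process and return the change in binary memory,
%% in bytes (positive means memory was released).
gc_all() ->
    Before = erlang:memory(binary),
    [erlang:garbage_collect(Pid) || Pid <- erlang:processes()],
    Before - erlang:memory(binary).
```

If bin_check:gc_all() frees most of the 2569 MB, the binaries were
merely waiting for their holders to collect; if little is freed, some
process still holds live references, and top_binary_holders/1 may help
narrow down which. The recon library mentioned above provides
recon:bin_leak/1, which is built around the same idea.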