[erlang-questions] Help debugging binary memory usage

Michael Martin mmartin4242@REDACTED
Sun Oct 16 22:12:50 CEST 2016


Could this be a message leak? Check for processes that receive messages
they never handle, and log those messages. See the section on unhandled
messages here
<https://www.safaribooksonline.com/library/view/designing-for-scalability/9781449361556/ch04.html>.
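
As a rough illustration of logging unhandled messages, here is a minimal
sketch of a gen_server with a catch-all handle_info/2. The module name
worker_sketch and the helper long_queues/1 are hypothetical, and whether the
workers in question are gen_servers at all is an assumption. In a plain
receive loop the equivalent would be a final catch-all clause.

```erlang
-module(worker_sketch).
-behaviour(gen_server).

-export([start_link/0, long_queues/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Hypothetical helper: list {Pid, QueueLength} for every process whose
%% mailbox currently holds more than Min messages. Processes that exit
%% while we iterate return 'undefined' and are skipped by the pattern.
long_queues(Min) ->
    [{Pid, Len} || Pid <- erlang:processes(),
                   {message_queue_len, Len} <-
                       [erlang:process_info(Pid, message_queue_len)],
                   Len > Min].

init([]) ->
    {ok, no_state}.

%% Catch-all clauses: log anything unexpected instead of crashing or
%% silently dropping it, so stray messages show up in the logs.
handle_call(Request, _From, State) ->
    error_logger:warning_msg("~p ~p: unexpected call: ~p~n",
                             [?MODULE, self(), Request]),
    {reply, {error, unknown_call}, State}.

handle_cast(Msg, State) ->
    error_logger:warning_msg("~p ~p: unexpected cast: ~p~n",
                             [?MODULE, self(), Msg]),
    {noreply, State}.

handle_info(Info, State) ->
    error_logger:warning_msg("~p ~p: unexpected message: ~p~n",
                             [?MODULE, self(), Info]),
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

On the live node, calling worker_sketch:long_queues(1000) from a remote shell
would list any process sitting on more than 1000 queued messages. The
threshold is arbitrary. Long queues would point at a message leak, while
short queues everywhere would suggest looking elsewhere.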


On 10/16/2016 03:05 AM, Paul Oliver wrote:
> Hey Luca,
>
> Check out https://github.com/ferd/recon and 
> http://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html 
>
> Cheers,
> Paul.
>
> On Sun, Oct 16, 2016 at 8:53 PM Luca Spiller <luca@REDACTED> wrote:
>
>     Hi everyone,
>
>     One of our nodes seems to have a memory leak. After a couple of
>     days the memory usage gets so high that the OOM killer kills it,
>     and it's restarted. This seems to have been going on for a few
>     years: the node works fine between restarts, so nobody noticed
>     that it gradually uses up all the memory on the box.
>
>     A bit of background: the node is making hundreds of HTTP requests
>     per second. There are a thousand or so worker processes
>     responsible for this, which make a request, inspect the response
>     headers, and based on these start other processes. The process
>     then sleeps for X time (seconds to minutes) and does the same
>     again. The response body can be any size, and we don't care about
>     it in the application (though I'd assume it gets converted to a
>     binary by lhttpc). I should also note that some of the requests
>     are made over TLS.
>
>     https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-system.png
>
>     This is the output from Observer. As you can see, it shows that
>     binaries are using 2569 MB of RAM. When the node has been
>     restarted and running for a few minutes this is usually < 10 MB.
>     Most of the worker processes (95%+) which make the requests are
>     started shortly after the node starts and hang around forever.
>
>     https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-processes.png
>
>     This is the process list from Observer, sorted by memory. It
>     doesn't appear to show anything interesting. The worker processes
>     (XXX:init/1) use roughly the same amount of memory after they've
>     been running for a few minutes.
>
>     As I understand it, large binaries stick around until the system is
>     under 'high memory pressure' before being GCed. In my case the
>     node uses up half the swap, and all the RAM - is that not high
>     enough? After that the OOM killer jumps in and deals with it forcibly.
>
>     So... what can I do to debug this?
>
>     Thanks,
>
>     Luca Spiller
>     _______________________________________________
>     erlang-questions mailing list
>     erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>     http://erlang.org/mailman/listinfo/erlang-questions
