[erlang-questions] Help debugging binary memory usage
Sun Oct 16 09:44:13 CEST 2016
One of our nodes seems to have a memory leak. After a couple of days the
memory usage gets so high that the OOM killer kills it, and it's restarted.
This seems to have been going on for a few years. The node works fine
between restarts, so nobody noticed; it just uses up all the memory on the
box.
A bit of background: the node makes hundreds of HTTP requests per second.
There are a thousand or so worker processes responsible for this. Each one
makes a request, inspects the response headers, and based on these starts
other processes. The worker then sleeps for some time (seconds to minutes)
and does the same again. The response body can be any size, but the
application doesn't use it (I'd assume lhttpc still converts it to a
binary). I should also note that some of the requests are made over TLS.
This is the output from Observer. As you can see, it shows that binaries
are using 2569 MB of RAM. When the node has been restarted and running for
a few minutes this is usually < 10 MB. Most of the worker processes (95%+)
which make the requests are started shortly after the node starts and hang
around.
This is the process list from Observer, sorted by memory. It doesn't appear
to show anything interesting. The worker processes (XXX:init/1) use roughly
the same amount of memory after they've been running for a few minutes.
As I understand it, large binaries stick around until the system is under
'high memory pressure', and only then get GCed. In my case the node uses up
half the swap and all the RAM. Is that not high enough? After that the OOM
killer jumps in and deals with it forcibly.
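One way to test that understanding is sketched below. It has not been
run on this node. The module name bin_check and its function names are
made up for illustration; the only calls it relies on are
erlang:memory/1, erlang:garbage_collect/1 and erlang:process_info/2. It
compares total binary memory before and after forcing a GC on every
process, and lists the processes that reference the most off-heap binary
data. The per-process numbers are rough, since a binary shared by several
processes is counted once for each of them.

```erlang
-module(bin_check).
-export([binary_before_after_gc/0, top_binary_holders/1]).

%% Returns {BytesBefore, BytesAfter}: total binary memory as reported by
%% erlang:memory(binary), before and after forcing a GC on every process.
%% A large drop would suggest the binaries are held by processes that
%% simply have not been garbage collected yet.
binary_before_after_gc() ->
    Before = erlang:memory(binary),
    _ = [erlang:garbage_collect(P) || P <- erlang:processes()],
    After = erlang:memory(binary),
    {Before, After}.

%% Returns up to N {Pid, Bytes} pairs, largest first, where Bytes is the
%% summed size of the off-heap binaries the process currently references.
top_binary_holders(N) ->
    Sizes = [{P, bin_bytes(P)} || P <- erlang:processes()],
    Sorted = lists:reverse(lists:keysort(2, Sizes)),
    lists:sublist(Sorted, N).

%% Sums the sizes reported by process_info(Pid, binary).
bin_bytes(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Bins} ->
            lists:sum([Size || {_Id, Size, _RefCount} <- Bins]);
        undefined ->
            0  % the process exited before it could be inspected
    end.
```

From a remote shell this would be called as
bin_check:binary_before_after_gc() and bin_check:top_binary_holders(10).
Forcing a GC on every process is itself a noticeable load on a busy node,
so it is probably best tried once rather than in a loop.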
So... what can I do to debug this?