[erlang-questions] Help debugging binary memory usage

Vans S vans_163@REDACTED
Mon Oct 17 15:12:46 CEST 2016


It seems you have been bitten by an issue I discussed recently. This is a common pitfall with Erlang's per-process (share-nothing) heaps. Have a look at the command-line arguments recently added to erl (http://erlang.org/doc/man/erl.html), specifically +hmax, which sets the default maximum heap size of processes.
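The same limit can also be set per process with the max_heap_size option (OTP 19+). A minimal sketch, where the 10-million-word limit is only an illustrative figure:

    %% Sketch: cap a worker's heap so a runaway process is killed (and
    %% reported) instead of taking the whole node down.  OTP 19+ only.
    start_worker(Fun) ->
        spawn_opt(Fun, [{max_heap_size,
                         #{size => 10000000,        %% limit in words (illustrative)
                           kill => true,            %% kill the process when exceeded
                           error_logger => true}}]). %% and log a report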

For more details, check out the README at https://github.com/vans163/stargate, specifically the Websocket Example section.

TL;DR: the most likely cause is a long-lived process that handles large binaries. Large (reference-counted) binaries live off-heap and are only freed once every process referencing them has garbage-collected, so a long-lived process that rarely GCs keeps them alive long after they are needed. One fix is to kill the long-lived process from time to time.
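If killing it is awkward, forcing a full GC on the suspect process has the same effect on its binary references. A rough sketch, with the one-minute interval purely illustrative:

    %% Sketch: periodically force a full GC on a long-lived process so the
    %% refc binaries it references can be released.  Interval is illustrative.
    gc_periodically(Pid) ->
        timer:sleep(60000),
        _ = erlang:garbage_collect(Pid),
        gc_periodically(Pid).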

    On Sunday, October 16, 2016 4:13 PM, Michael Martin <mmartin4242@REDACTED> wrote:
 

  Possible message leak? Check for unhandled messages, and log them. See the section on unhandled messages here.
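For a gen_server-based worker, a catch-all handle_info/2 clause is enough to make stray messages visible instead of letting them pile up in a mailbox. A minimal sketch:

    %% Sketch: log and drop any message the worker does not explicitly handle,
    %% so a mailbox leak shows up in the logs rather than in memory.
    handle_info(Msg, State) ->
        error_logger:warning_msg("~p got unexpected message: ~p~n", [self(), Msg]),
        {noreply, State}.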
  
 On 10/16/2016 03:05 AM, Paul Oliver wrote:
  
 Hey Luca,
 Check out https://github.com/ferd/recon and http://dieswaytoofast.blogspot.com/2012/12/erlang-binaries-and-garbage-collection.html
 Cheers, Paul.
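recon is particularly useful here: recon:bin_leak/1 garbage-collects every process and reports the ones that dropped the most binary references afterwards, which usually points straight at whoever is hoarding the binaries. Run it in the node's shell:

    %% GC all processes, then show the 10 that released the most refc-binary
    %% references -- likely the processes holding on to the leaked binaries.
    recon:bin_leak(10).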
  On Sun, Oct 16, 2016 at 8:53 PM Luca Spiller <luca@REDACTED> wrote:
  
 Hi everyone, 
 One of our nodes seems to have a memory leak. After a couple of days the memory usage gets so high that the OOM killer kills it, and it's restarted. It seems to have been going on for a few years; nobody noticed because the node works fine the whole time - it just uses up all the memory on the box.
 A bit of background: the node is making hundreds of HTTP requests per second. There are a thousand or so worker processes responsible for this, which make a request, inspect the response headers, and based on those start other processes. Each worker then sleeps for X time (seconds to minutes) and does the same again. The response body can be any size, but we don't care about it in the application (though I'd assume it gets converted to a binary by lhttpc). I should also note that some of the requests are made over TLS.
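(One pattern worth checking in a loop like the one described above - the names below are hypothetical, not the actual lhttpc API: if the loop keeps any term derived from the response, even a small sub-binary of a header, it pins the whole response binary until the worker's next GC, which can be a long way off for a process that mostly sleeps. Copying what is kept and hibernating through the sleep drops those references. A sketch under those assumptions:

    %% Hypothetical worker loop; fetch/1, header_value/1, act_on/1 and
    %% sleep_ms/0 stand in for the real code and the lhttpc call.
    worker_loop(Url) ->
        {ok, Headers, _Body} = fetch(Url),          %% large Body binary arrives here
        Kept = binary:copy(header_value(Headers)),  %% copy so the big binary is not pinned
        act_on(Kept),
        erlang:send_after(sleep_ms(), self(), wake),
        %% hibernate does a full-sweep GC (dropping leftover binary references)
        %% and resumes in wake_up/1 when a message arrives.
        erlang:hibernate(?MODULE, wake_up, [Url]).

    wake_up(Url) ->
        receive wake -> ok end,
        worker_loop(Url).
)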
  https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-system.png
  
 This is the output from Observer; as you can see, binaries are using 2569 MB of RAM. When the node has been restarted and running for a few minutes, this is usually < 10 MB. Most of the worker processes (95%+) which make the requests are started shortly after the node starts and hang around forever.
  https://dl.dropboxusercontent.com/u/21557257/20161016-erl/observer-processes.png
  
 This is the process list from Observer, sorted by memory; it doesn't appear to show anything interesting. The worker processes (XXX:init/1) each use roughly the same amount of memory after they've been running for a few minutes.
 As I understand it, large binaries stick around until the system is under 'high memory pressure' before being GCed. In my case the node uses up half the swap and all the RAM - is that not high enough? After that the OOM killer jumps in and deals with it forcibly.
  So... what can I do to debug this?
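(One thing that can be done from the shell with no extra tools is to ask each process which off-heap binaries it currently references and sum their sizes; the biggest totals usually identify the holders. A small sketch, with the top-10 cut arbitrary:

    %% For every process, sum the sizes of the refc binaries it references
    %% and show the 10 biggest holders.  Run from the node's shell.
    Holders = [{P, lists:sum([S || {_, S, _} <- Bins])}
               || P <- erlang:processes(),
                  {binary, Bins} <- [erlang:process_info(P, binary)]],
    lists:sublist(lists:reverse(lists:keysort(2, Holders)), 10).
)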
  
  Thanks, 
 Luca Spiller
_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions


   