[erlang-questions] Garbage Collection, BEAM memory and Erlang memory

Robert Virding rvirding@REDACTED
Thu Jan 22 18:00:37 CET 2015


One thing you can see is that the size of the binary data is growing. This
space contains the large binaries (> 64 bytes) which are sent in messages
between processes. While this means that the messages become (much) smaller
and faster to send, it takes much longer to detect that the binaries are no
longer alive and can be reclaimed. Basically, it takes until all the
processes they have passed through do a full garbage collection. Setting
fullsweep_after to 0 and doing explicit garbage collections speeds up
reclaiming the binaries.

You could be much more selective about which processes you set fullsweep_after
to 0 for and which ones you explicitly garbage collect.
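
Roughly something like this (only a sketch; my_conn_handler and the pid list
are placeholder names for your own processes):

%% Spawn only the binary-heavy connection handlers with fullsweep_after 0,
%% so every collection they do is a full sweep.
start_handler(Socket) ->
    erlang:spawn_opt(my_conn_handler, init, [Socket], [{fullsweep_after, 0}]).

%% And explicitly collect a short, known list of processes (e.g. the router)
%% instead of garbage collecting every process in the node.
collect(Pids) ->
    [erlang:garbage_collect(P) || P <- Pids],
    ok.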

I don't know if this is *the* problem, but it is *a* problem you have.

Robert



On 22 January 2015 at 17:33, Roberto Ostinelli <roberto@REDACTED> wrote:

> Dear List,
> I'm having some trouble pinpointing why a node is crashing due to
> memory issues.
> For info, when it crashes it does not produce a crash dump. However, I've
> monitored it live and I've seen the .beam process eat up all memory until it
> abruptly exits.
>
> The system is a big router that relays data coming from TCP connections
> into other TCP connections.
> I'm using cowboy as the HTTP server that initiates the long-lived TCP
> connections.
>
> I've done all the obvious:
>
>    - Checked the States of my gen_servers and processes.
>    - Checked my processes' mailboxes: the ones with the longest queues have
>    1 item in them (the snippet I use for this check is included below).
>    - My ETS table memory is constant (see below).
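>
> For reference, this is roughly the mailbox check (a plain shell snippet,
> nothing library-specific):
>
> %% Top N processes by message queue length; processes that exit while being
> %% inspected (process_info returns undefined) count as 0.
> TopQueues = fun(N) ->
>   lists:sublist(
>     lists:reverse(lists:keysort(2,
>       [case erlang:process_info(P, message_queue_len) of
>          {message_queue_len, L} -> {P, L};
>          undefined -> {P, 0}
>        end || P <- processes()])), N)
> end.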
>
> I put the system under controlled load, and length(processes()) shows that
> my process count is stable, always around 120,000.
>
> I check the processes that are using most memory with this call:
>
> MostMemory = fun(N) ->
>   lists:sublist(
>     lists:sort(
>       fun({_, _, V1}, {_, _, V2}) -> V1 >= V2 end,
>       [try
>          [{memory, Mem}, {registered_name, RegName}] =
>            erlang:process_info(Pid, [memory, registered_name]),
>          {Pid, RegName, Mem}
>        catch _:_ ->
>          {Pid, undefined, 0}
>        end || Pid <- processes(), Pid =/= self()]
>     ), N)
> end.
>
> Which always returns very similar numbers:
>
> 1> MostMemory(20).
> [{<0.96.0>,[],5180448},
>  {<0.78.0>,tls_connection_sup,4525096},
>  {<0.6.0>,error_logger,743776},
>  {<0.7.0>,application_controller,372592},
>  {<0.77.0>,ssl_manager,284640},
>  {<0.11.0>,kernel_sup,176712},
>  {<0.26.0>,code_server,176272},
>  {<0.33.0>,[],143064},
>  {<0.419.0>,[],142896},
>  {<0.420.0>,[],142896},
>  {<0.421.0>,[],142896},
>  {<0.422.0>,[],142896},
>  {<0.423.0>,[],142896},
>  {<0.424.0>,[],142896},
>  {<0.425.0>,[],142896},
>  {<0.426.0>,[],142896},
>  {<0.427.0>,[],142896},
>  {<0.428.0>,[],142896},
>  {<0.429.0>,[],142896},
>  {<0.430.0>,[],142896}]
>
> See the last processes there with all identical memory? These are the
> processes handling the connections, and they stay stable at the same
> number throughout the whole test.
>
> I get the pid of the .beam process, and I check its reported RES memory
> with top -p beam-pid-here.
> I get my erlang memory with this simple call (I just convert everything to
> GB, thanks to Ferd and his article
> https://blog.heroku.com/archives/2013/11/7/logplex-down-the-rabbit-hole):
>
> [{K,V / math:pow(1024,3)} || {K,V} <- erlang:memory()].
>
> This is what I get (at random time intervals):
>
> - BEAM process RES memory: 2.751 GB
> - Erlang memory:
> [{total,2.11871287971735},
>  {processes,1.6582859307527542},
>  {processes_used,1.6581560596823692},
>  {system,0.4604269489645958},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.29880597442388535},
>  {code,0.009268132038414478},
>  {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.039 GB
> - Erlang memory:
> [{total,2.2570599243044853},
>  {processes,1.7243007272481918},
>  {processes_used,1.7241046279668808},
>  {system,0.5327591970562935},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.37129393219947815},
>  {code,0.009268132038414478},
>  {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.630 GB
> - Erlang memory:
> [{total,2.677028402686119},
>  {processes,2.1421403884887695},
>  {processes_used,2.142106533050537},
>  {system,0.5348880141973495},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.37329262495040894},
>  {code,0.009268132038414478},
>  {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.807 GB
> - Erlang memory:
> [{total,2.9233806803822517},
>  {processes,2.277688652276993},
>  {processes_used,2.277618482708931},
>  {system,0.6456920281052589},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.48407071083784103},
>  {code,0.009268132038414478},
>  {ets,0.004808835685253143}]
>
>
> - BEAM process RES memory: 4.026 GB
> - Erlang memory:
> [{total,2.8762372359633446},
>  {processes,2.100425034761429},
>  {processes_used,2.1003194376826286},
>  {system,0.7758122012019157},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.6143399104475975},
>  {code,0.009268132038414478},
>  {ets,0.004808835685253143}]
>
>
> - BEAM process RES memory: 4.136 GB
> - Erlang memory:
> [{total,2.9030912443995476},
>  {processes,2.028559662401676},
>  {processes_used,2.0283572375774384},
>  {system,0.8745315819978714},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.847004845738411e-4},
>  {binary,0.7129654437303543},
>  {code,0.00929991528391838},
>  {ets,0.004809550940990448}]
>
>
> - BEAM process RES memory: 4.222 GB
> - Erlang memory:
> [{total,2.785604253411293},
>  {processes,1.875294029712677},
>  {processes_used,1.8752291351556778},
>  {system,0.910310223698616},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.847004845738411e-4},
>  {binary,0.7487552836537361},
>  {code,0.00929991528391838},
>  {ets,0.004809550940990448}]
>
>
> As you can see, at the beginning both the BEAM RES memory and the total
> Erlang memory increase, but after a while it becomes clear that the BEAM
> process memory keeps increasing while the memory reported as used by Erlang
> stabilizes and even decreases.
> The Erlang-reported memory never surpasses 3 GB.
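>
> To see how much of that gap is allocator overhead rather than live Erlang
> data, something like this could help (a sketch, assuming the recon library
> is available on the node):
>
> %% Bytes the VM has allocated from the OS vs. bytes actually in use, in GB;
> %% a low usage ratio points at allocator fragmentation rather than a leak.
> {recon_alloc:memory(allocated) / math:pow(1024, 3),
>  recon_alloc:memory(used) / math:pow(1024, 3),
>  recon_alloc:memory(usage)}.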
>
> At this point I tried forcing a Garbage Collection:
>
> [erlang:garbage_collect(Pid) || Pid <- processes()]
>
> After that, we went back to:
>
> - BEAM process RES memory: 3.336 GB
> - Erlang memory:
> [{total,1.9107630401849747},
>  {processes,1.5669479593634605},
>  {processes_used,1.5668926388025284},
>  {system,0.34381508082151413},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.847004845738411e-4},
>  {binary,0.18235664814710617},
>  {code,0.00929991528391838},
>  {ets,0.004809550940990448}]
>
> However, after that I let the system run and it kept showing the same
> behavior (with the BEAM memory still increasing).
>
> What puzzles me is that you can clearly see that:
>
>    - The total memory used by processes is increasing; however, the top
>    processes always use the same amount of memory (and the process count is
>    always stable).
>    - Binary consumption also increases, but in proportion to process
>    memory (and my data is <64K so I don't anticipate it being an issue of
>    Refc-binaries not being garbage collected; see the recon check below).
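>
> To double check the refc-binary side anyway, recon has a helper that forces
> a GC everywhere and reports which processes released the most binary
> references (again assuming recon is available):
>
> %% The processes at the top of this list were sitting on refc binaries
> %% that only went away when they were garbage collected.
> recon:bin_leak(10).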
>
> I already hibernate most of the long-term open connections.
> I also added a periodic garbage collector on the main router, since it
> touches all the binaries that go through it, to ensure that the refc
> binaries it still holds references to get released.
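>
> The periodic collector is roughly equivalent to this standalone sketch (the
> names are arbitrary and the interval is whatever suits the load):
>
> %% A small guard process that periodically garbage collects a target pid
> %% (the router in my case).
> gc_guard(Target, IntervalMs) ->
>     spawn(fun() -> gc_loop(Target, IntervalMs) end).
>
> gc_loop(Target, IntervalMs) ->
>     timer:sleep(IntervalMs),
>     erlang:garbage_collect(Target),
>     gc_loop(Target, IntervalMs).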
>
> So I tried the hard approach and set fullsweep_after to 0 as a system
> flag (passed in via the environment variable: -env ERL_FULLSWEEP_AFTER 0).
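>
> The value the emulator actually picked up can be checked from the shell:
>
> %% Returns {fullsweep_after, N}; with the flag above N is 0.
> erlang:system_info(fullsweep_after).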
>
> After this, I could see notable improvements:
>
> - BEAM process RES memory: 2.049 GB
> - Erlang memory:
> [{total,1.597476489841938},
>  {processes,1.2037805244326591},
>  {processes_used,1.2036690935492516},
>  {system,0.39369596540927887},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.2321353331208229},
>  {code,0.009268132038414478},
>  {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 1.919 GB
> - Erlang memory:
> [{total,1.549286112189293},
>  {processes,1.1740453317761421},
>  {processes_used,1.1739420965313911},
>  {system,0.3752407804131508},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.2134672999382019},
>  {code,0.009268132038414478},
>  {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.004 GB
> - Erlang memory:
> [{total,1.6023957282304764},
>  {processes,1.2192133665084839},
>  {processes_used,1.219102293252945},
>  {system,0.3831823617219925},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.22155668586492538},
>  {code,0.009268132038414478},
>  {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.456 GB
> - Erlang memory:
> [{total,1.7860298827290535},
>  {processes,1.4158401936292648},
>  {processes_used,1.4157484397292137},
>  {system,0.37018968909978867},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.20867645740509033},
>  {code,0.009268132038414478},
>  {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.455 GB
> - Erlang memory:
> [{total,1.8919306173920631},
>  {processes,1.4726912006735802},
>  {processes_used,1.4726523533463478},
>  {system,0.41923941671848297},
>  {atom,4.000673070549965e-4},
>  {atom_used,3.846092149615288e-4},
>  {binary,0.25766071677207947},
>  {code,0.009268132038414478},
>  {ets,0.004821933805942535}]
>
>
> However, the downside to this is obviously that the CPU load increased
> by almost a full point.
>
> I also have a GC "guardian" similar to the one that Fred implemented in
> Heroku's logplex:
> https://github.com/heroku/logplex/blob/master/src/logplex_leak.erl
>
> But this obviously is a guard, not a solution per se.
>
> Can anyone give me some pointers on how I can proceed to identify what is
> going on?
>
> Thank you,
> r.
>

