[erlang-questions] Garbage Collection, BEAM memory and Erlang memory
Robert Virding
rvirding@REDACTED
Thu Jan 22 18:00:37 CET 2015
One thing you can see is that the size of the binary data is growing. This
space contains the large binaries (> 64 bytes), which live outside the
process heaps and are only passed by reference in messages between
processes. While this means that the messages become (much) smaller and
faster to send, it takes much longer to detect that the binaries are no
longer referenced and can be reclaimed. Basically it takes until all the
processes they have passed through do a full garbage collection. Setting
fullsweep_after to 0 and doing explicit garbage collects speeds up
reclaiming the binaries.
You could be much more selective about which processes you set
fullsweep_after to 0 on and which ones you explicitly garbage collect.
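For example (a minimal sketch; connection_loop/0 just stands in for
whatever the handler process actually does):

    %% spawn a handler that always does full-sweep collections
    Pid = spawn_opt(fun() -> connection_loop() end, [{fullsweep_after, 0}]),
    %% and/or force a full collection explicitly, on another process or
    %% on the calling process itself
    erlang:garbage_collect(Pid),
    erlang:garbage_collect().
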
I don't know if this is *the* problem but it is *a* problem you have.
Robert
On 22 January 2015 at 17:33, Roberto Ostinelli <roberto@REDACTED> wrote:
> Dear List,
> I'm having some troubles in pinpointing why a node is crashing due to
> memory issues.
> For info, when it crashes it does not produce a crash dump. However, I've
> monitored it live and I've seen the beam process eat up all memory until it
> abruptly exits.
>
> The system is a big router that relays data coming from TCP connections,
> into other TCP connections.
> I'm using cowboy as the HTTP server that initiates the long-lived TCP
> connections.
>
> I've done all the obvious:
>
> - Checked the States of my gen_servers and processes.
> - Checked my processes' mailboxes (the ones with the longest queue have
> 1 item in the inbox); see the sketch right after this list.
> - My ETS table memory is constant (see below).
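>
> (For reference, the mailbox and ETS checks boil down to something like
> this; the 10-item cut is only for illustration:)
>
> % processes with the longest message queues, biggest first
> lists:sublist(lists:reverse(lists:sort(
>     [case erlang:process_info(P, message_queue_len) of
>          {message_queue_len, L} -> {L, P};
>          undefined -> {0, P}
>      end || P <- processes()])), 10).
>
> % total ETS memory, in words (times erlang:system_info(wordsize) for bytes)
> lists:sum([ets:info(T, memory) || T <- ets:all()]).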
>
> I put the system under controlled load, and I can see with
> length(processes()) that my process count is stable, always around
> 120,000.
>
> I check the processes that are using most memory with this call:
>
> MostMemory = fun(N) ->
> lists:sublist(
> lists:sort(
> fun({_, _, V1}, {_, _, V2}) -> V1 >= V2 end,
> [try
> [{memory, Mem}, {registered_name, RegName}] =
> erlang:process_info(Pid, [memory, registered_name]),
> {Pid, RegName, Mem}
> catch _:_ ->
> {Pid, undefined, 0}
> end || Pid <- processes(), Pid =/= self()]
> ), N)
> end.
>
> Which always returns very similar numbers:
>
> 1> MostMemory(20).
> [{<0.96.0>,[],5180448},
> {<0.78.0>,tls_connection_sup,4525096},
> {<0.6.0>,error_logger,743776},
> {<0.7.0>,application_controller,372592},
> {<0.77.0>,ssl_manager,284640},
> {<0.11.0>,kernel_sup,176712},
> {<0.26.0>,code_server,176272},
> {<0.33.0>,[],143064},
> {<0.419.0>,[],142896},
> {<0.420.0>,[],142896},
> {<0.421.0>,[],142896},
> {<0.422.0>,[],142896},
> {<0.423.0>,[],142896},
> {<0.424.0>,[],142896},
> {<0.425.0>,[],142896},
> {<0.426.0>,[],142896},
> {<0.427.0>,[],142896},
> {<0.428.0>,[],142896},
> {<0.429.0>,[],142896},
> {<0.430.0>,[],142896}]
>
> See the last processes there with all identical memory? These are the
> processes handling the connections, and they stay stable at the same
> identical number throughout the whole test.
>
> I get the pid of the .beam process, and I check its reported RES memory
> with top -p beam-pid-here.
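>
> (The OS pid can also be read from inside the node, which saves hunting
> for it with ps:)
>
> os:getpid().   % the emulator's OS pid, returned as a string
>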
> I get my erlang memory with this simple call (I just convert everything to
> GB, thanks to Ferd and his article
> https://blog.heroku.com/archives/2013/11/7/logplex-down-the-rabbit-hole):
>
> [{K,V / math:pow(1024,3)} || {K,V} <- erlang:memory()].
>
> This is what I get (at random time intervals):
>
> - BEAM process RES memory: 2.751 GB
> - Erlang memory:
> [{total,2.11871287971735},
> {processes,1.6582859307527542},
> {processes_used,1.6581560596823692},
> {system,0.4604269489645958},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.29880597442388535},
> {code,0.009268132038414478},
> {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.039 GB
> - Erlang memory:
> [{total,2.2570599243044853},
> {processes,1.7243007272481918},
> {processes_used,1.7241046279668808},
> {system,0.5327591970562935},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.37129393219947815},
> {code,0.009268132038414478},
> {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.630 GB
> - Erlang memory:
> [{total,2.677028402686119},
> {processes,2.1421403884887695},
> {processes_used,2.142106533050537},
> {system,0.5348880141973495},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.37329262495040894},
> {code,0.009268132038414478},
> {ets,0.004808835685253143}]
>
> - BEAM process RES memory: 3.807 GB
> - Erlang memory:
> [{total,2.9233806803822517},
> {processes,2.277688652276993},
> {processes_used,2.277618482708931},
> {system,0.6456920281052589},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.48407071083784103},
> {code,0.009268132038414478},
> {ets,0.004808835685253143}]
>
>
> - BEAM process RES memory: 4.026 GB
> - Erlang memory:
> [{total,2.8762372359633446},
> {processes,2.100425034761429},
> {processes_used,2.1003194376826286},
> {system,0.7758122012019157},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.6143399104475975},
> {code,0.009268132038414478},
> {ets,0.004808835685253143}]
>
>
> - BEAM process RES memory: 4.136 GB
> - Erlang memory:
> [{total,2.9030912443995476},
> {processes,2.028559662401676},
> {processes_used,2.0283572375774384},
> {system,0.8745315819978714},
> {atom,4.000673070549965e-4},
> {atom_used,3.847004845738411e-4},
> {binary,0.7129654437303543},
> {code,0.00929991528391838},
> {ets,0.004809550940990448}]
>
>
> - BEAM process RES memory: 4.222 GB
> - Erlang memory:
> [{total,2.785604253411293},
> {processes,1.875294029712677},
> {processes_used,1.8752291351556778},
> {system,0.910310223698616},
> {atom,4.000673070549965e-4},
> {atom_used,3.847004845738411e-4},
> {binary,0.7487552836537361},
> {code,0.00929991528391838},
> {ets,0.004809550940990448}]
>
>
> As you can see, at the beginning both the BEAM RES memory and the total
> Erlang memory increase, but after a while it becomes clear that the BEAM
> process memory keeps increasing while the memory reported as used by Erlang
> stabilizes, and even decreases.
> Erlang reported memory never surpasses 3 GB.
>
> At this point I tried forcing a Garbage Collection:
>
> [erlang:garbage_collect(Pid) || Pid <- processes()]
>
> After that, we went back to:
>
> - BEAM process RES memory: 3.336 GB
> - Erlang memory:
> [{total,1.9107630401849747},
> {processes,1.5669479593634605},
> {processes_used,1.5668926388025284},
> {system,0.34381508082151413},
> {atom,4.000673070549965e-4},
> {atom_used,3.847004845738411e-4},
> {binary,0.18235664814710617},
> {code,0.00929991528391838},
> {ets,0.004809550940990448}]
>
> However, after that I let the system go and it kept showing the same
> behavior (with the BEAM memory still increasing).
>
> What puzzles me is that you can clearly see that:
>
> - The total memory used by processes is increasing; however, the top
> processes always use the same amount of memory (and the process count is
> always stable).
> - Binary consumption also increases, but in proportion to process
> memory (and my data is <64K, so I don't anticipate it being an issue of
> Refc-binaries not being garbage collected).
>
> I already hibernate most of the long-term open connections.
> I also added a periodic garbage collector on the main router: since it
> touches all the binaries that go through it, this ensures that the
> Refc-binaries it still holds references to eventually get garbage
> collected.
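>
> (Roughly, assuming a gen_server-style router, that periodic collection
> looks like this; the interval is arbitrary:)
>
> handle_info(force_gc, State) ->
>     erlang:garbage_collect(),                    % full sweep of the router itself
>     erlang:send_after(60000, self(), force_gc),  % re-arm the timer
>     {noreply, State}.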
>
> So I tried the hard approach and set fullsweep_after to 0 as a system
> flag (passed in as an environment variable: -env ERL_FULLSWEEP_AFTER 0).
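>
> (The effective default can be inspected, and changed at runtime for
> processes spawned afterwards, with:)
>
> erlang:system_info(fullsweep_after).     % e.g. {fullsweep_after, 0}
> erlang:system_flag(fullsweep_after, 0).  % returns the previous value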
>
> After this, I could see notable improvements:
>
> - BEAM process RES memory: 2.049 GB
> - Erlang memory:
> [{total,1.597476489841938},
> {processes,1.2037805244326591},
> {processes_used,1.2036690935492516},
> {system,0.39369596540927887},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.2321353331208229},
> {code,0.009268132038414478},
> {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 1.919 GB
> - Erlang memory:
> [{total,1.549286112189293},
> {processes,1.1740453317761421},
> {processes_used,1.1739420965313911},
> {system,0.3752407804131508},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.2134672999382019},
> {code,0.009268132038414478},
> {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.004 GB
> - Erlang memory:
> [{total,1.6023957282304764},
> {processes,1.2192133665084839},
> {processes_used,1.219102293252945},
> {system,0.3831823617219925},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.22155668586492538},
> {code,0.009268132038414478},
> {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.456 GB
> - Erlang memory:
> [{total,1.7860298827290535},
> {processes,1.4158401936292648},
> {processes_used,1.4157484397292137},
> {system,0.37018968909978867},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.20867645740509033},
> {code,0.009268132038414478},
> {ets,0.004821933805942535}]
>
> - BEAM process RES memory: 2.455 GB
> - Erlang memory:
> [{total,1.8919306173920631},
> {processes,1.4726912006735802},
> {processes_used,1.4726523533463478},
> {system,0.41923941671848297},
> {atom,4.000673070549965e-4},
> {atom_used,3.846092149615288e-4},
> {binary,0.25766071677207947},
> {code,0.009268132038414478},
> {ets,0.004821933805942535}]
>
>
> However, the obvious downside to this is that the CPU load increased by
> almost a point.
>
> I also have a GC "guardian" similar to the one that Fred implemented in
> Heroku's logplex:
> https://github.com/heroku/logplex/blob/master/src/logplex_leak.erl
>
> But this obviously is a guard, not a solution per se.
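>
> (The guard amounts to a periodic check along these lines; the threshold
> and the action are simplified here for the sake of the sketch:)
>
> check_memory() ->
>     Threshold = 2 * 1024 * 1024 * 1024,   % 2 GB, illustrative
>     case erlang:memory(total) > Threshold of
>         true  -> [erlang:garbage_collect(P) || P <- processes()];
>         false -> ok
>     end.
> %% wired up with e.g. timer:apply_interval(10000, ?MODULE, check_memory, []).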
>
> Can anyone give me some pointers on how I can proceed to identify what is
> going on?
>
> Thank you,
> r.