[erlang-questions] eheap cannot allocate for which process?
Mon Mar 17 15:57:46 CET 2014
On Mon, Mar 17, 2014 at 08:51:13AM +0000, József Bérces wrote:
> I receive the classic "eheap_alloc: Cannot allocate..." message. It wants to
> allocate ~1GB memory and that fails. That is fine, I am doing something wrong.
> So I took the crash dump and tried to find out which one of my processes is
> the guilty one.
Some months ago I had a similar problem: an application that had been running
happily with ~400MB of RAM (on a host with 2GB of RAM), most of it held by four
"major consumers" ("huge state" FSMs, ~80MB each), started crashing with the
same message:
eheap_alloc: Cannot allocate 729810240 bytes of memory (of type "heap").
After some investigation (and after switching from the stock SASL error_logger
to lager) I found that the "guilty" processes were error_logger and gproc,
and that the problem goes a bit deeper.
Some screens: after an FSM crash, a restart and rebuilding its state I saw:

Pid           Initial Call           Heap      Reds      Msgs
Registered    Current Function                           Stack
<0.5.0>       gen_event:init_it/6    59786060  5199822   0
error_logger  gen_event:fetch_msg/5                      8
<0.47.0>      gproc:init/1           19590700  1650033   0
gproc         gen_server:loop/6                          9
<0.221.0>     ebgp_conn:init/1       19590700  20382184  0

where <0.221.0> is my "fat FSM" after the crash, restart and "state download".
But after a manual call to garbage_collect(pid(0, 5, 0)) the heap usage of
error_logger dropped, and the same memory decrease happened with gproc.
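
The manual check described above can be repeated in a slightly more systematic way. The following is only a minimal sketch: the module name gc_probe and both function names are made up for illustration, and it assumes the process runs on the local node. It relies on erlang:process_info/2 reporting total_heap_size in machine words, so the result is converted to bytes with erlang:system_info(wordsize).

```erlang
-module(gc_probe).
-export([gc_and_report/1, words_to_bytes/1]).

%% Convert a size in machine words (the unit process_info/2 uses for
%% heap sizes) to bytes on the running VM.
words_to_bytes(Words) ->
    Words * erlang:system_info(wordsize).

%% Force a garbage collection of Pid and return {BytesBefore, BytesAfter}
%% for its total heap size, or 'undefined' if the process is not alive.
gc_and_report(Pid) when is_pid(Pid) ->
    case erlang:process_info(Pid, total_heap_size) of
        undefined ->
            undefined;
        {total_heap_size, Before} ->
            %% Returns false if the process died in the meantime.
            _ = erlang:garbage_collect(Pid),
            case erlang:process_info(Pid, total_heap_size) of
                undefined ->
                    undefined;
                {total_heap_size, After} ->
                    {words_to_bytes(Before), words_to_bytes(After)}
            end
    end.
```

From the shell this could be called as gc_probe:gc_and_report(pid(0, 5, 0)). A large gap between the two numbers suggests that the process was mostly holding garbage that simply had not been collected yet.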
How I can explain the VM crash (not 100% sure, I still consider myself a
novice in Erlang): when a process crashes, its state is sent to all processes
monitoring it (gproc in this case) and to error_logger. The state is big
in my case (and in yours too). And there is no shared memory between processes
in Erlang, so a term sent to another process is copied into the receiver's heap.
So it is pretty logical that the state of the failed process was duplicated
(maybe even triplicated, if the copies are made while the heap of the original
process has not been freed yet), and this duplication can cause the eheap
error, especially when more than one "fat" process crashes at the same time.
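
The copying effect is easy to observe in isolation. The sketch below is not taken from the application discussed here; copy_demo, run/1 and hold/1 are hypothetical names. It spawns a process that keeps whatever term it receives, sends it a list of N integers, and compares the receiver's process_info(Pid, memory) value before and after.

```erlang
-module(copy_demo).
-export([run/1]).

%% Send a list of N integers to a fresh process and return the receiver's
%% memory in bytes as {Before, After}.
run(N) when is_integer(N), N > 0 ->
    Big = lists:seq(1, N),
    Receiver = spawn(fun() -> hold(undefined) end),
    {memory, Before} = erlang:process_info(Receiver, memory),
    Receiver ! {keep, self(), Big},
    %% Wait until the receiver has taken the message out of its queue.
    receive {kept, Receiver} -> ok end,
    {memory, After} = erlang:process_info(Receiver, memory),
    Receiver ! stop,
    {Before, After}.

%% Keep the last received term referenced so it stays on the heap.
hold(_Kept) ->
    receive
        {keep, From, Term} ->
            From ! {kept, self()},
            hold(Term);
        stop ->
            ok
    end.
```

Calling copy_demo:run(1000000) should show the receiver growing by roughly the size of the list, while the sender still holds its own copy. With a state of tens of megabytes and several receivers, the same thing happens on a larger scale.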
Lesson learned: while the "let it crash" approach is generally good, it is
not so good with "fat" processes, especially heavily linked/monitored ones.
PS: error_logger and gproc are of course not guilty. They are just
efficient enough that their garbage collector had not been triggered yet.
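
On a node that is still running (this does not help with a crash dump), the processes currently holding the most memory can be listed with something like the sketch below. The names top_mem, top/1 and label/1 are hypothetical, and the memory figure is whatever erlang:process_info(Pid, memory) reports, in bytes.

```erlang
-module(top_mem).
-export([top/1]).

%% Return up to N tuples {Bytes, Pid, Label} for the local processes with
%% the largest process_info(Pid, memory) values, largest first.
top(N) when is_integer(N), N >= 0 ->
    Infos = [{Mem, Pid, label(Pid)}
             || Pid <- erlang:processes(),
                %% Processes that exit while we iterate yield 'undefined'
                %% and are skipped by this pattern.
                {memory, Mem} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:sort(Infos)), N).

%% Registered name if there is one, otherwise the initial call.
label(Pid) ->
    case erlang:process_info(Pid, registered_name) of
        {registered_name, Name} ->
            Name;
        _ ->
            case erlang:process_info(Pid, initial_call) of
                {initial_call, MFA} -> MFA;
                undefined -> unknown
            end
    end.
```

Running top_mem:top(10) right after a "fat" process has crashed and restarted should show whether processes such as error_logger have grown along with it.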
> Unfortunately, I cannot tell it from the crash dump.
> The memory section says:
> total: 15447352528
> processes: 15140232809
> processes_used: 15140005610
> system: 307119719
> atom: 512601
> atom_used: 496586
> binary: 148574400
> code: 21228007
> ets: 119988984
> I have 16GB RAM, so the processes use almost all. There are 4010 processes. 1
> garbing, 31 scheduled, 3978 waiting. If I sum stack+heap of all the processes
> then I get ~700MB. That is very far from 16GB. Here are the top 10 stack+heap
> Pid           State                   Reductions     Stack+heap   MsgQ Length
> <0.21060.67>  Garbing (limited info)  1,508,838,180  145,962,050  1
> <0.25689.27>  Waiting                 86,670,344     145,962,050  0
> <0.10003.68>  Waiting                 1,363,039      38,263,080   0
> <0.15943.66>  Waiting                 1,882,465,380  30,610,465   0
> <0.15879.68>  Waiting                 471,549        30,610,465   0
> <0.31854.67>  Waiting                 154,500,777    24,488,375   0
> <0.16221.68>  Waiting                 262,114        24,488,375   0
> <0.16628.68>  Waiting                 117,268        24,488,375   0
> <0.15878.68>  Waiting                 453,490        19,590,700   0
> <0.16235.68>  Waiting                 181,968        19,590,700   0
> Any ideas how to tell which process needs ~1GB memory?
In theory, there is no difference between theory and practice.
But, in practice, there is.