[erlang-questions] eheap cannot allocate for which process?
Fred Hebert
mononcqc@REDACTED
Mon Mar 17 16:48:46 CET 2014
For this issue, see the format_status callback that OTP behaviours provide:
http://www.erlang.org/doc/man/gen_server.html#Module:format_status-2
It lets you control how the process state is formatted and so reduce the
size of the messages logged.
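As a minimal sketch of that callback (the module name fat_server, the state record, and the summary/1 helper are all hypothetical, made up for illustration), a gen_server can return a short summary in place of its full state. This uses the format_status/2 form linked above; newer OTP releases deprecate it in favour of format_status/1, so treat it as an illustration rather than a drop-in. It only affects how the state is reported, not other parts of a crash report.

```erlang
-module(fat_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, format_status/2]).

%% Hypothetical state: a peer name plus a potentially huge list of routes.
-record(state, {peer, routes = []}).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #state{peer = undefined}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Abnormal termination: the returned term is logged instead of the state.
format_status(terminate, [_PDict, State]) ->
    summary(State);
%% sys:get_status/1: wrap the summary in the conventional data shape.
format_status(normal, [_PDict, State]) ->
    [{data, [{"State", summary(State)}]}].

%% Keep only small, cheap-to-copy facts about the state.
summary(#state{peer = Peer, routes = Routes}) ->
    [{peer, Peer}, {route_count, length(Routes)}].
```

With this in place, a crash report would carry the peer and a route count rather than the whole route list.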
More generally, however, if you need to dig into a crash dump, I
recommend using the scripts I added to recon:
https://github.com/ferd/recon/tree/master/script
One of them runs a quick diagnostic over the crash dump and outputs the
info I've always found useful while debugging, and the awk script outputs
the functions that were running in processes whose mailboxes were huge.
If the node is still running when you see problems appearing, I'd
suggest looking into recon as a whole (docs: http://ferd.github.io/recon/)
to see whether the issues are related to the total memory size and how
it's allocated (see recon_alloc), binary memory "leaks" (see
recon:bin_leak/1), and so on.
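For a node without recon loaded, a rough stdlib-only sketch of the "who holds the memory" question could look like the following. The module name mem_top and the function top/1 are made up for this example; recon's own functions are more thorough, and this is not how recon is implemented.

```erlang
-module(mem_top).
-export([top/1]).

%% Return up to N {Pid, Bytes} tuples for the live processes that
%% currently use the most memory, largest first.
top(N) when is_integer(N), N > 0 ->
    %% process_info/2 returns undefined for a process that died in the
    %% meantime; the pattern in the generator silently skips those.
    Infos = [{Pid, Bytes} || Pid <- erlang:processes(),
                             {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:keysort(2, Infos)), N).
```

Calling mem_top:top(10) from a remote shell gives a quick first list of suspects to inspect further with process_info/2.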
Binary memory leaks wouldn't necessarily surprise me if you find that
GCs tend to solve the problem, but that's always system-specific.
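To check the binary hypothesis by hand, one approach is to count the refc binaries each process references before and after a forced GC. The sketch below is only an approximation of that idea using process_info/2; the module name bin_delta is hypothetical, and forcing a GC on every process is heavy, so it is something to try with care on a production node.

```erlang
-module(bin_delta).
-export([top/1]).

%% Return up to N {Pid, Delta} tuples, where Delta is the change in the
%% number of referenced refc binaries after a forced garbage collection.
%% The most negative deltas (most binaries released) come first.
top(N) when is_integer(N), N > 0 ->
    Deltas = lists:filtermap(fun delta/1, erlang:processes()),
    lists:sublist(lists:keysort(2, Deltas), N).

delta(Pid) ->
    case erlang:process_info(Pid, binary) of
        {binary, Before} ->
            erlang:garbage_collect(Pid),
            case erlang:process_info(Pid, binary) of
                {binary, After} ->
                    {true, {Pid, length(After) - length(Before)}};
                undefined ->
                    false   % process died during the measurement
            end;
        undefined ->
            false           % process was already dead
    end.
```

Processes near the top of the result with large negative deltas were holding on to binaries that a GC could free.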
Regards,
Fred.
On 03/17, Alexandre Snarskii wrote:
> On Mon, Mar 17, 2014 at 08:51:13AM +0000, József Bérces wrote:
> > I receive the classic "eheap_alloc: Cannot allocate..." message. It wants to
> > allocate ~1GB memory and that fails. That is fine, I am doing something wrong.
> > So I took the crash dump and tried to find out which one of my processes is
> > the guilty one.
>
> Some months ago I had a similar problem: an application running happily
> with ~400MB RAM (on a 2GB RAM host), mostly consisting of four "major
> consumers" ("huge state" FSMs, ~80MB each), started crashing with the
> same eheap_alloc: Cannot allocate 729810240 bytes of memory (of type "heap").
> message. After some investigation (and switching from the stock SASL error_logger
> to lager) I found that the "guilty" processes were error_logger and gproc
> and that the problem goes a bit deeper.
>
> Some screens: after the FSM crashed, restarted, and rebuilt its state, I saw:
>
> Pid            Initial Call             Heap       Reds       Msgs
> Registered     Current Function         Stack
> <0.5.0>        gen_event:init_it/6      59786060   5199822    0
> error_logger   gen_event:fetch_msg/5    8
> <0.47.0>       gproc:init/1             19590700   1650033    0
> gproc          gen_server:loop/6        9
> <0.221.0>      ebgp_conn:init/1         19590700   20382184   0
>                gen_fsm:loop/7           10
>
> where 0.221.0 is my "fat FSM" after crash, restart and "state download".
> process_info(pid(0,5,0)) shows
>
> {total_heap_size,107614910},
> {heap_size,59786060},
> {stack_size,8},
> {reductions,5199822},
> {garbage_collection,[{min_bin_vheap_size,46368},
> {min_heap_size,233},
> {fullsweep_after,65535},
> {minor_gcs,1}]},
>
> but after manual call to garbage_collect(pid(0, 5, 0)) heap usage
> decreased significantly:
>
> {total_heap_size,233},
> {heap_size,233},
> {stack_size,8},
> {reductions,5199822},
>
> and the same memory decrease happened with gproc.
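The manual check described above can be wrapped in a small helper. This is a sketch with a made-up module name (gc_probe); it relies on process_info/2 reporting heap sizes in words, so it multiplies by the VM word size to get bytes. It crashes with a badmatch if the pid is dead, which is fine for shell use.

```erlang
-module(gc_probe).
-export([heap_before_after/1]).

%% Return {BytesBefore, BytesAfter}: the total heap size of Pid before
%% and after a forced garbage collection.
heap_before_after(Pid) ->
    %% Heap sizes from process_info/2 are in words, not bytes.
    WordSize = erlang:system_info(wordsize),
    {total_heap_size, Before} = erlang:process_info(Pid, total_heap_size),
    true = erlang:garbage_collect(Pid),
    {total_heap_size, After} = erlang:process_info(Pid, total_heap_size),
    {Before * WordSize, After * WordSize}.
```

For example, gc_probe:heap_before_after(whereis(error_logger)) would show how much of that process's heap was garbage waiting to be collected, assuming error_logger is registered on the node.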
>
> How I explain the VM crash (not 100% sure, I still consider myself a
> novice in Erlang): when a process crashes, its state is sent to all processes
> monitoring it (gproc in this case) and to error_logger. The state is big
> in my case (and in yours too), and there is no shared memory in Erlang.
> So it's pretty logical that the state of the failed process was duplicated (maybe
> even triplicated, if the copy happens before the original process heap is
> freed), and this duplication can cause the eheap error,
> especially when more than one "fat" process crashes at the same time.
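The copying part of this explanation can be illustrated in isolation. The sketch below (module name down_copy is hypothetical) only shows that a term placed in an exit reason arrives as a copy in the monitoring process's mailbox; whether a given crash actually carries the full state in its reason depends on how the process failed.

```erlang
-module(down_copy).
-export([demo/1]).

%% Spawn a process that builds a list of Size elements and exits with it
%% in the exit reason. The monitor receives its own copy of that list in
%% the 'DOWN' message; return the length of the received copy.
demo(Size) when is_integer(Size), Size >= 0 ->
    {Pid, Ref} = spawn_monitor(fun() ->
        State = lists:seq(1, Size),
        exit({crashed, State})
    end),
    receive
        {'DOWN', Ref, process, Pid, {crashed, Copy}} ->
            length(Copy)
    after 5000 ->
        timeout
    end.
```

Every additional monitor or linked process receives its own copy in the same way, which is why a large term in an exit reason multiplies memory use at crash time.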
>
> Lesson learned: while the "let it crash" approach is generally good, it is
> not so good with "fat" processes, especially with heavily linked/monitored
> "fat" processes.
>
> PS: error_logger and gproc are of course not guilty. They are just
> efficient enough that their garbage collector had not been called yet.
>
> > Unfortunately, I cannot tell it from the crash dump.
> >
> > The memory section says:
> >
> > =memory
> > total: 15447352528
> > processes: 15140232809
> > processes_used: 15140005610
> > system: 307119719
> > atom: 512601
> > atom_used: 496586
> > binary: 148574400
> > code: 21228007
> > ets: 119988984
> >
> > I have 16GB RAM, so the processes use almost all. There are 4010 processes. 1
> > garbing, 31 scheduled, 3978 waiting. If I sum stack+heap of all the processes
> > then I get ~700MB. That is very far from 16GB. Here are the top 10 stack+heap
> > processes:
> >
> > Pid           State                   Reductions     Stack+heap   MsgQ Length
> > <0.21060.67>  Garbing (limited info)  1,508,838,180  145,962,050  1
> > <0.25689.27>  Waiting                 86,670,344     145,962,050  0
> > <0.10003.68>  Waiting                 1,363,039      38,263,080   0
> > <0.15943.66>  Waiting                 1,882,465,380  30,610,465   0
> > <0.15879.68>  Waiting                 471,549        30,610,465   0
> > <0.31854.67>  Waiting                 154,500,777    24,488,375   0
> > <0.16221.68>  Waiting                 262,114        24,488,375   0
> > <0.16628.68>  Waiting                 117,268        24,488,375   0
> > <0.15878.68>  Waiting                 453,490        19,590,700   0
> > <0.16235.68>  Waiting                 181,968        19,590,700   0
> >
> > Any ideas how to tell which process needs ~1GB memory?
> >
> > Thanks,
> > Jozsef
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
>
>
> --
> In theory, there is no difference between theory and practice.
> But, in practice, there is.