[erlang-questions] Huge erl_crash.dump (2 gigs) - looking for advice

Robert Raschke rtrlists@REDACTED
Thu Jul 3 15:43:53 CEST 2014


Hi David,

one thing to be aware of when using sasl and error_logger is that process
crashes get logged by sasl with the complete state of the process that just
died, and error_logger tries to pretty-print that state; if the crashed
process's state is large, this can take huge amounts of memory.

I would recommend monitoring processes with high memory usage and figuring
out their usage patterns. In general, it is a good idea to limit the amount
of state a process holds, but how far you can take that is obviously
application-dependent.
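For the post-mortem side of that advice, here is a rough Python sketch (not an existing tool; top_processes is a hypothetical name). It ranks the =proc:<pid> sections of an erl_crash.dump by their Memory: field and also reports Name: and Message queue length:. It assumes only the "Key: value" layout visible in the =proc:<0.6.0> section quoted below.

```python
import sys


def top_processes(lines, n=10):
    """Hypothetical helper: return up to n processes, largest 'Memory:' first.

    Only '=proc:<pid>' sections are read; '=proc_heap:', '=proc_stack:' and
    all other sections are skipped. Assumes the 'Key: value' layout seen in
    the =proc:<0.6.0> excerpt.
    """
    procs = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("="):
            # Section header: start a new record only for =proc:<pid>.
            if line.startswith("=proc:"):
                current = {"pid": line[len("=proc:"):], "name": "",
                           "memory": 0, "queue": 0}
                procs.append(current)
            else:
                current = None
        elif current is not None:
            key, _, value = line.partition(": ")
            if key == "Name":
                current["name"] = value
            elif key == "Memory":
                current["memory"] = int(value)  # bytes
            elif key == "Message queue length":
                current["queue"] = int(value)
    procs.sort(key=lambda p: p["memory"], reverse=True)
    return procs[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # latin-1 decodes any byte sequence, so odd bytes in the dump cannot fail.
    with open(sys.argv[1], encoding="latin-1") as dump:
        for p in top_processes(dump):
            print(f'{p["memory"]:>14}  {p["pid"]:<12} '
                  f'{p["name"] or "-":<24} queue={p["queue"]}')
```

The scan streams the file line by line, so even a 2 GB dump only costs one small record per process.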

Hope this helps point in the right direction,
Robby



On 3 July 2014 14:22, David Welton <davidnwelton@REDACTED> wrote:

> The kind people in #erlang have given me some suggestions, but I'm
> going to write here to appeal to a wider audience.  I've got a huge
> erl_crash.dump, larger than 2 gigs, and I'm trying to figure out
> anything I can about the crash, which came out of the blue.
>
> Slogan: eheap_alloc: Cannot allocate 1080371408 bytes of memory (of
> type "heap_frag").
> System version: Erlang R16B03-1 (erts-5.10.4) [source] [64-bit]
> [smp:4:4] [async-threads:10] [kernel-poll:true]
> Compiled: Sun Mar 16 05:25:57 2014
> Taints: crypto
>
> Digging further, I found this:
>
> =proc:<0.6.0>
> State: Running
> Name: error_logger
> Spawned as: proc_lib:init_p/5
> Last scheduled in for: gen_event:handle_msg/5
> Spawned by: <0.2.0>
> Started: Wed Jul  2 11:45:41 2014
> Message queue length: 1
> Number of heap fragments: 0
> Heap fragment data: 0
> Link list: [<0.0.0>, <0.98.0>, <0.32.0>]
> Reductions: 19480050
> Stack+heap: 137319567
> OldHeap: 28690
> Heap unused: 2273119
> OldHeap unused: 2581
> Memory: 1098789880
> Program counter: 0x00007f1e51f3ed70 (gen_event:handle_msg/5 + 8)
> CP: 0x0000000000000000 (invalid)
>
> So that's what actually blew things up.  But how did it get all that
> memory?
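Part of the answer can be read straight off the =proc section. This assumes Stack+heap and OldHeap are counted in words (8 bytes on this 64-bit emulator), which the numbers themselves bear out. On that basis, the heaps account for nearly all of Memory, and the allocation that failed in the slogan was about the same size again:

```python
# Figures copied from the =proc:<0.6.0> section and the slogan above.
WORD = 8                        # bytes per word on this [64-bit] emulator

stack_heap_words = 137_319_567  # "Stack+heap:" (assumed to be in words)
old_heap_words = 28_690         # "OldHeap:" (assumed to be in words)
memory_bytes = 1_098_789_880    # "Memory:" (bytes)
failed_alloc = 1_080_371_408    # "Cannot allocate ... bytes" in the slogan

heap_bytes = (stack_heap_words + old_heap_words) * WORD
print(heap_bytes)                      # 1098786056
print(memory_bytes - heap_bytes)       # 3824 bytes left for everything else
print(round(heap_bytes / 2**30, 2))    # 1.02 (GiB)
print(round(failed_alloc / 2**30, 2))  # 1.01 (GiB)
```

So error_logger's heap had already grown to roughly 1 GiB of live data. The crash came when it asked for a heap fragment of about another 1 GiB.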
>
> =proc_dictionary:<0.6.0>
> H7F1E4D062F68
> H7F1E4D062F80
> =proc_stack:<0.6.0>
> 0x00007f1dccc223a0:SReturn addr 0x5204B398 (gen_server:do_cast/2 + 128)
> y0:H7F1DCBACA990
> y1:AF:lager_crash_log
> y2:SCatch 0x5204D188 (gen_server:do_send/2 + 112)
> 0x00007f1dccc223c0:SReturn addr 0x4F2936C8
> (error_logger_lager_h:log_event/2 + 10328)
> 0x00007f1dccc223c8:SReturn addr 0x51F422F0 (gen_event:server_update/4 +
> 272)
> y0:N
> y1:N
> y2:N
> y3:N
> y4:A8:emulator
> y5:H7F1DCBACA8C8
> y6:H7F1DCBACA888
> y7:H7F1DCBACA930
> 0x00007f1dccc22410:SReturn addr 0x51F41ED0 (gen_event:server_notify/4 +
> 136)
> y0:AC:error_logger
> y1:H7F1DCBACA8F8
> y2:H7F1D8B478038
> y3:H7F1D8B478068
> y4:A14:error_logger_lager_h
> y5:SCatch 0x51F422F0 (gen_event:server_update/4 + 272)
> 0x00007f1dccc22448:SReturn addr 0x51F3EE68 (gen_event:handle_msg/5 + 256)
> y0:AC:error_logger
> y1:AC:handle_event
> y2:H7F1DCBACA8F8
> y3:N
> 0x00007f1dccc22470:SReturn addr 0x51F359B0 (proc_lib:init_p_do_apply/3 +
> 56)
> y0:N
> y1:AC:error_logger
> y2:P<0.2.0>
> 0x00007f1dccc22490:SReturn addr 0x842688 (<terminate process normally>)
> y0:SCatch 0x51F359D0 (proc_lib:init_p_do_apply/3 + 88)
>
> The stack trace seems to indicate that it's trying to log something;
> perhaps someone sent it a very large message?  But I wonder where it
> came from in the first place...
>
> I tried using crashdump_viewer, but it chokes when I click on the
> process and it tries to load up the enormous =proc_heap section:
>
> =proc_heap:<0.6.0>
> 7F1DCBACA990:t2:A9:$gen_cast,H7F1DCBACA978
> 7F1DCBACA978:t2:A3:log,H7F1DCBACA8F8
> 7F1DCBACA8F8:t3:A5:error,A6:noproc,H7F1DCBACA8D8
> 7F1DCBACA8D8:t3:A8:emulator,H7F1DCBACA8C8,H7F1DCBACA888
> 7F1DCBACA888:lH7F1DCBACA878|N
> 7F1DCBACA878:lI39|H7F1DCBACA868
> 7F1DCBACA868:lI103|H7F1DCBACA858
> 7F1DCBACA858:lI115|H7F1DCBACA848
> 7F1DCBACA848:lI100|H7F1DCBACA838
> 7F1DCBACA838:lI95|H7F1DCBACA828
> 7F1DCBACA828:lI119|H7F1DCBACA818
> 7F1DCBACA818:lI101|H7F1DCBACA808
> 7F1DCBACA808:lI98|H7F1DCBACA7F8
> 7F1DCBACA7F8:lI64|H7F1DCBACA7E8
> 7F1DCBACA7E8:lI108|H7F1DCBACA7D8
> 7F1DCBACA7D8:lI111|H7F1DCBACA7C8
> 7F1DCBACA7C8:lI99|H7F1DCBACA7B8
> 7F1DCBACA7B8:lI97|H7F1DCBACA7A8
> ... and on and on for tens of millions of lines ...
>
> davidw@REDACTED:~$ grep -n '^=proc_heap' erl_crash.dump
> 15835:=proc_heap:<0.0.0>
> 16133:=proc_heap:<0.3.0>
> 17424:=proc_heap:<0.6.0>
> 67540816:=proc_heap:<0.7.0>
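grep finds the headers, but a single streaming pass can also count how many lines every section spans, which makes the oversized one stand out. Going by the line numbers above, the gap after =proc_heap:<0.6.0> is about 67.5 million lines. Below is a Python sketch; section_spans is a hypothetical name, and the only assumption is that section headers are exactly the lines starting with "=", as in the excerpts.

```python
import sys
from collections import Counter


def section_spans(lines):
    """Hypothetical helper: map each '=...' header line of a crash dump to
    the number of lines after it, up to the next header. It streams, so the
    2 GB file is never held in memory.
    """
    spans = Counter()
    header = None
    for raw in lines:
        if raw.startswith("="):
            header = raw.rstrip("\n")
            spans[header] += 0  # record the section even if it turns out empty
        elif header is not None:
            spans[header] += 1
    return spans


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="latin-1") as dump:
        for header, count in section_spans(dump).most_common(10):
            print(f"{count:>12}  {header}")
```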
>
>
> Incidentally, what *are* all those lines like
>
> 7F1DCBACA7F8:lI64|H7F1DCBACA7E8
>
> anyway?
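Going by the lines in this dump, each line of a =proc_heap section is one heap cell written as ADDRESS:TERM:

- t<arity>: is a tuple whose elements follow, separated by commas.
- l<head>|<tail> is a list (cons) cell.
- A<len>:<name> is an atom whose length is given in hex (A14:error_logger_lager_h is 20 characters).
- I<n> is a small integer.
- H<addr> points at another cell.
- N is [].
- P<...> is a pid, as in the stack's y2:P<0.2.0>.

So 7F1DCBACA7F8:lI64|H7F1DCBACA7E8 is a list cell holding 64 (the character @) whose tail is the cell at 7F1DCBACA7E8. Read this way, the first lines of =proc_heap:<0.6.0> decode to {'$gen_cast', {log, {error, noproc, {emulator, F, [S]}}}}. Here F is the cell at 7F1DCBACA8C8 (not shown) and S is the string starting at 7F1DCBACA878. A Python sketch of a line decoder follows; parse_element and parse_heap_line are hypothetical names, and reading the tuple arity as hex (like the atom length) is an assumption.

```python
# Hypothetical helpers (not part of OTP) for the heap-cell lines of an
# erl_crash.dump. Only the tags that appear in this dump are handled:
#   t<arity>:e1,e2,...  tuple      l<head>|<tail>  list (cons) cell
#   A<len>:<name>       atom       I<n>            small integer
#   H<addr>             pointer    N               [] (nil)
#   P<...>              pid
# Atom lengths are hex (A14:error_logger_lager_h is 20 chars); reading the
# tuple arity as hex too is an assumption.


def parse_element(s, i):
    """Decode one tagged element starting at s[i]; return (value, next_index)."""
    tag = s[i]
    if tag == "A":
        # The explicit length lets atom names contain ',' '|' or ':'.
        colon = s.index(":", i)
        start = colon + 1
        end = start + int(s[i + 1:colon], 16)
        return ("atom", s[start:end]), end
    if tag == "N":
        return ("nil",), i + 1
    # I, H and P tokens run up to the next ',' or '|' (or end of line).
    end = i + 1
    while end < len(s) and s[end] not in ",|":
        end += 1
    body = s[i + 1:end]
    if tag == "I":
        return ("int", int(body)), end
    if tag == "H":
        return ("ref", body), end
    if tag == "P":
        return ("pid", body), end
    raise ValueError(f"tag {tag!r} not handled in {s!r}")


def parse_heap_line(line):
    """Split 'ADDR:TERM' into (ADDR, decoded cell)."""
    addr, _, rest = line.rstrip("\n").partition(":")
    if rest.startswith("t"):
        colon = rest.index(":")
        elems, i = [], colon + 1
        for _ in range(int(rest[1:colon], 16)):
            value, i = parse_element(rest, i)
            elems.append(value)
            i += 1  # step over the ',' between elements
        return addr, ("tuple", elems)
    if rest.startswith("l"):
        head, i = parse_element(rest, 1)
        tail, _ = parse_element(rest, i + 1)  # i is at the '|'
        return addr, ("cons", head, tail)
    value, _ = parse_element(rest, 0)
    return addr, value
```

The same decoder reads the stack's y-register values, for example y1:AF:lager_crash_log. That matches the stack: gen_server:do_cast/2 holds the $gen_cast tuple in y0 and lager_crash_log in y1.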
>
> Is there any way to hack something up that will process those 67
> million lines to tell me something useful about what's going on?
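One way to hack something up without loading 67 million lines is to stream only the <0.6.0> heap and pull out long strings. The cons cells of small integers in the excerpt are exactly what a string looks like. This rough Python sketch relies on an assumption that holds in the excerpt but is not guaranteed in general: each cell's tail points at the address on the following line. charlist_runs is a hypothetical name, and its printable-character test is a heuristic.

```python
import sys


def charlist_runs(lines, section="=proc_heap:<0.6.0>", min_len=40):
    """Hypothetical helper: stream one heap section and yield
    (first_address, text) for each long run of cons cells whose heads are
    printable small integers.

    Assumes, as in the excerpt, that a string's cells sit on consecutive
    lines with each tail (H<addr>) naming the next line's address; a run
    simply ends wherever that does not hold.
    """
    inside = False
    run_start, expected, chars = None, None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("="):
            # New section: report any pending run, then reset.
            if len(chars) >= min_len:
                yield run_start, "".join(chars)
            run_start, expected, chars = None, None, []
            inside = line == section
            continue
        if not inside:
            continue
        addr, _, rest = line.partition(":")
        code, tail = None, ""
        if rest.startswith("lI"):
            head, sep, tail = rest[2:].partition("|")
            # Heuristic: treat newline and printable ASCII as text.
            if sep and head.isdigit() and (head == "10" or 32 <= int(head) <= 126):
                code = int(head)
        if code is not None and addr == expected:
            chars.append(chr(code))  # the run continues on this line
        else:
            if len(chars) >= min_len:
                yield run_start, "".join(chars)
            run_start, chars = (addr, [chr(code)]) if code is not None else (None, [])
        # Address the next character must have for the run to continue.
        expected = tail[1:] if code is not None and tail.startswith("H") else None
    if len(chars) >= min_len:
        yield run_start, "".join(chars)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="latin-1") as dump:
        for addr, text in charlist_runs(dump):
            print(addr, len(text), repr(text[:200]))
```

Applied to the cells quoted above, the first thirteen characters of the string at 7F1DCBACA878 come out as 'gsd_web@loca, which reads like the beginning of a quoted node name. The rest of that run should show what error_logger was being asked to log.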
>
> Any other ideas about how to extract something meaningful about who
> plopped this massive message into the logger?
>
> Thank you
> --
> David N. Welton
>
> http://www.welton.it/davidw/
>
> http://www.dedasys.com/
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>

