[erlang-questions] output to standard_error crashes sometimes on R18
Lukas Larsson
garazdawi@REDACTED
Fri Aug 7 18:58:50 CEST 2015
Hello,
Many thanks for the detailed analysis. I've gotten similar reports from a couple of places.
I'll take a look at it when I get back from summer vacation.
Lukas
> On 6 aug 2015, at 12:41, Andreas Schultz <aschultz@REDACTED> wrote:
>
> Hi,
>
> Since I upgraded to 18.0.2, my Erlang application on Linux sometimes crashes
> with eagain in standard_error. The crash is triggered by something like this:
>
> io:format(standard_error, "started ~s~n", [application]).
>
> The actual crash looks like this:
>
> ** Generic server standard_error_sup terminating
> ** Last message in was {'EXIT',<0.28.0>,eagain}
> ** When Server state == {state,standard_error,undefined,<0.28.0>,{local,standard_error_sup}}
> ** Reason for termination ==
> ** eagain
> 2015-08-06 09:14:41 =CRASH REPORT====
> crasher:
> initial call: supervisor_bridge:standard_error/1
> pid: <0.27.0>
> registered_name: standard_error_sup
> exception exit: {eagain,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,826}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
> ancestors: [kernel_sup,<0.10.0>]
> messages: []
> links: [<0.11.0>]
> dictionary: []
> trap_exit: true
> status: running
> heap_size: 376
> stack_size: 27
> reductions: 155
> neighbours:
> 2015-08-06 09:14:41 =SUPERVISOR REPORT====
> Supervisor: {local,kernel_sup}
> Context: child_terminated
> Reason: eagain
> Offender: [{pid,<0.27.0>},{id,standard_error},{mfargs,{standard_error,start_link,[]}},{restart_type,temporary},{shutdown,2000},{child_type,supervisor}]
>
> I have straced erl and it does get an EAGAIN on fd 2:
>
> 14209 09:14:41.170429 writev(2, [{"started gen_listener_tcp\n", 25}], 1 <unfinished ...>
> 14209 09:14:41.170446 <... writev resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
>
> So what's going on here?
>
> After some searching I found a nice explanation about duped fd's and file status flags
> here: http://stackoverflow.com/a/9677130 (a quick test program verifies that this indeed
> happens on Linux).
>
> So according to that, setting any of the stdio file descriptors to non-blocking would
> set all of them to non-blocking. And sure enough, strace shows this:
>
> 14170 09:14:39.787004 fcntl(1, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
>
> (pid 14170 is the main erl process, pid 14209 one of the threads in erl)
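
The flag sharing described above can be reproduced outside of erl. The following is a minimal sketch, not the quick test program mentioned in the message: it assumes a POSIX system and uses a pipe as a stand-in for the terminal, with dup() imitating a shell that hands the same open file description to a process as both stdout and stderr. The names fake_stdout and fake_stderr are illustrative only.

```python
import fcntl
import os


def nonblocking(fd):
    """Return True if O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# Assumption: a pipe stands in for the terminal. dup() creates a second
# descriptor that refers to the same open file description.
read_end, fake_stdout = os.pipe()
fake_stderr = os.dup(fake_stdout)

before = (nonblocking(fake_stdout), nonblocking(fake_stderr))

# Set O_NONBLOCK on the "stdout" descriptor only, in the spirit of the
# fcntl(1, F_SETFL, ...) call seen in the strace output.
flags = fcntl.fcntl(fake_stdout, fcntl.F_GETFL)
fcntl.fcntl(fake_stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK)

after = (nonblocking(fake_stdout), nonblocking(fake_stderr))

print("before:", before)
print("after: ", after)
```

File status flags such as O_NONBLOCK belong to the open file description rather than to the individual descriptor, so the change made through one descriptor is visible through the other.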
>
> I stopped digging at that point. Clearly, the standard_error
> process or the underlying port driver should handle the EAGAIN gracefully, but
> fails to do so.
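
As an illustration of what handling EAGAIN gracefully can look like in general, here is a hedged sketch of a write loop that waits for the descriptor to become writable and then retries. It is not the Erlang port driver code and not a proposed fix for OTP; write_all, slow_reader, and the pipe standing in for fd 2 are all assumptions made for the example.

```python
import os
import select
import threading
import time


def write_all(fd, data, timeout=5.0):
    """Write all of data to fd, waiting and retrying when a write would block."""
    sent = 0
    retries = 0
    while sent < len(data):
        try:
            sent += os.write(fd, data[sent:])
        except BlockingIOError:
            # EAGAIN: the descriptor is non-blocking and has no room right now.
            # Wait until it is writable again instead of treating this as fatal.
            retries += 1
            _, writable, _ = select.select([], [fd], [], timeout)
            if not writable:
                raise TimeoutError("descriptor did not become writable")
    return sent, retries


# Assumption: a non-blocking pipe stands in for a non-blocking fd 2.
read_end, write_end = os.pipe()
os.set_blocking(write_end, False)

# Fill the pipe buffer so that the next write is guaranteed to hit EAGAIN.
filled = 0
for chunk in (b"x" * 4096, b"x"):
    try:
        while True:
            filled += os.write(write_end, chunk)
    except BlockingIOError:
        pass

message = b"started gen_listener_tcp\n"
received = bytearray()


def slow_reader():
    """Drain the pipe after a short delay, like a slow consumer of stderr."""
    time.sleep(0.1)
    while len(received) < filled + len(message):
        received.extend(os.read(read_end, 65536))


reader = threading.Thread(target=slow_reader)
reader.start()
sent, retries = write_all(write_end, message)
reader.join()

print("bytes sent:", sent, "retries after EAGAIN:", retries)
```

The first write attempt fails with EAGAIN because the buffer is full; once the reader drains it, select() reports the descriptor as writable and the retry goes through.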
>
> Andreas