[erlang-questions] output to standard_error crashes sometimes on R18
Andreas Schultz
aschultz@REDACTED
Thu Aug 6 12:41:22 CEST 2015
Hi,
Since I upgraded to 18.0.2, my Erlang application Linux sometimes crash
with egain in standard error. The crash is triggered by something like this:
io:format(standard_error, "started ~s~n", [application]).
The actual crash looks like this:
** Generic server standard_error_sup terminating
** Last message in was {'EXIT',<0.28.0>,eagain}
** When Server state == {state,standard_error,undefined,<0.28.0>,{local,standard_error_sup}}
** Reason for termination ==
** eagain
2015-08-06 09:14:41 =CRASH REPORT====
crasher:
initial call: supervisor_bridge:standard_error/1
pid: <0.27.0>
registered_name: standard_error_sup
exception exit: {eagain,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,826}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
ancestors: [kernel_sup,<0.10.0>]
messages: []
links: [<0.11.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 376
stack_size: 27
reductions: 155
neighbours:
2015-08-06 09:14:41 =SUPERVISOR REPORT====
Supervisor: {local,kernel_sup}
Context: child_terminated
Reason: eagain
Offender: [{pid,<0.27.0>},{id,standard_error},{mfargs,{standard_error,start_link,[]}},{restart_type,temporary},{shutdown,2000},{child_type,supervisor}]
I have straced erl and it does get an EAGAIN on fd 2:
14209 09:14:41.170429 writev(2, [{"started gen_listener_tcp\n", 25}], 1 <unfinished ...>
14209 09:14:41.170446 <... writev resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
So whats going on here???
After some searching I found a nice explanation about duped fd's and file status flags
here: http://stackoverflow.com/a/9677130 (a quick test program verifies that this indeed
happens on Linux).
So according to that, setting any of the stdio file descriptors to non-blocking, would
set all of them to non-blocking. And sure enough, strace shows this:
14170 09:14:39.787004 fcntl(1, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
(pid 14170 is the main erl process, pid 14209 one of the threads in erl)
I have stopped digging through this at that point. Clearly, the standard_error
process or the underlying port driver should handle the EAGAIN gracefully, but
fail to do so.
Andreas
More information about the erlang-questions
mailing list