[erlang-questions] output to standard_error crashes sometimes on R18

Steve Vinoski vinoski@REDACTED
Mon Aug 10 14:48:38 CEST 2015


This fixes it:

https://github.com/erlang/otp/pull/807

--steve


On Fri, Aug 7, 2015 at 12:58 PM, Lukas Larsson <garazdawi@REDACTED> wrote:

> Hello,
>
> Many thanks for the detailed analysis. I've gotten similar reports
> from a couple of places.
>
> I'll take a look at it when I get back from summer vacation.
>
> Lukas
>
>
> > On 6 aug 2015, at 12:41, Andreas Schultz <aschultz@REDACTED> wrote:
> >
> > Hi,
> >
> > Since I upgraded to 18.0.2, my Erlang application on Linux sometimes crashes
> > with eagain on standard error. The crash is triggered by something like this:
> >
> >   io:format(standard_error, "started ~s~n", [application]).
> >
> > The actual crash looks like this:
> >
> > ** Generic server standard_error_sup terminating
> > ** Last message in was {'EXIT',<0.28.0>,eagain}
> > ** When Server state == {state,standard_error,undefined,<0.28.0>,{local,standard_error_sup}}
> > ** Reason for termination ==
> > ** eagain
> > 2015-08-06 09:14:41 =CRASH REPORT====
> >  crasher:
> >    initial call: supervisor_bridge:standard_error/1
> >    pid: <0.27.0>
> >    registered_name: standard_error_sup
> >    exception exit: {eagain,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,826}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
> >    ancestors: [kernel_sup,<0.10.0>]
> >    messages: []
> >    links: [<0.11.0>]
> >    dictionary: []
> >    trap_exit: true
> >    status: running
> >    heap_size: 376
> >    stack_size: 27
> >    reductions: 155
> >  neighbours:
> > 2015-08-06 09:14:41 =SUPERVISOR REPORT====
> >     Supervisor: {local,kernel_sup}
> >     Context:    child_terminated
> >     Reason:     eagain
> >     Offender:   [{pid,<0.27.0>},{id,standard_error},{mfargs,{standard_error,start_link,[]}},{restart_type,temporary},{shutdown,2000},{child_type,supervisor}]
> >
> > I have straced erl and it does get an EAGAIN on fd 2:
> >
> > 14209 09:14:41.170429 writev(2, [{"started gen_listener_tcp\n", 25}], 1 <unfinished ...>
> > 14209 09:14:41.170446 <... writev resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> >
> > So what's going on here?
> >
> > After some searching I found a nice explanation of dup'ed fds and file
> > status flags here: http://stackoverflow.com/a/9677130 (a quick test program
> > verifies that this indeed happens on Linux).
> >
> > So according to that, setting any of the stdio file descriptors to
> > non-blocking would set all of them to non-blocking. And sure enough,
> > strace shows this:
> >
> >  14170 09:14:39.787004 fcntl(1, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
> >
> > (pid 14170 is the main erl process, pid 14209 one of the threads in erl)
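
The shared-flag behaviour described above can be reproduced with a few lines. This is only a minimal sketch in Python, not the test program mentioned in the original message (which was not posted). It assumes a pipe as a stand-in for the terminal and uses os.dup() to mimic fds 1 and 2 referring to the same open file description; the names out_fd and err_fd are made up for illustration.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# The write end of a pipe stands in for the terminal. dup() yields a second
# descriptor that refers to the same open file description, much like fds 1
# and 2 when both are inherited from the same terminal.
read_end, out_fd = os.pipe()
err_fd = os.dup(out_fd)

before = (is_nonblocking(out_fd), is_nonblocking(err_fd))

# Set O_NONBLOCK on one descriptor only.
flags = fcntl.fcntl(out_fd, fcntl.F_GETFL)
fcntl.fcntl(out_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The duplicate reports the flag too: file status flags belong to the shared
# open file description, not to the individual descriptor.
after = (is_nonblocking(out_fd), is_nonblocking(err_fd))

print("before:", before)
print("after: ", after)
```

If the two descriptors had been opened separately instead of duplicated, each would have its own open file description and the flag would not carry over.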
> >
> > I stopped digging at that point. Clearly, the standard_error process or
> > the underlying port driver should handle the EAGAIN gracefully, but it
> > fails to do so.
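
For illustration, handling EAGAIN gracefully on a non-blocking descriptor usually means waiting until the descriptor is writable and then retrying the rest of the write. The sketch below shows that general pattern in Python; it is an assumption about what "graceful" could look like, not a description of what the linked pull request or the Erlang fd driver actually does, and write_all is a hypothetical helper name.

```python
import os
import select


def write_all(fd, data):
    """Write all of data to fd, waiting and retrying whenever the write
    would block. Returns the number of times EAGAIN was encountered."""
    view = memoryview(data)
    retries = 0
    while view:
        try:
            written = os.write(fd, view)
            view = view[written:]  # drop the part that was accepted
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: the descriptor is non-blocking and cannot
            # take more data right now. Block until it is writable again
            # instead of treating this as a fatal error.
            retries += 1
            select.select([], [fd], [])
    return retries
```

A call such as write_all(2, b"started gen_listener_tcp\n") would then retry instead of failing when fd 2 has been switched to non-blocking behind the caller's back.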
> >
> > Andreas
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions