[erlang-questions] Strange interaction between Docker and Erlang's ports (exit_status lost)

Lukas Larsson garazdawi@REDACTED
Thu Dec 17 17:51:22 CET 2015


Hello,

I did some digging into this. It appears that some extra process (I
don't know which one) sends a SIGCHLD to beam.smp. The signal is caught,
but it does not match any pid that the spawn driver is interested in. The
spawn driver is built around the assumption that no extra SIGCHLD arrives,
so after receiving the extra one it stops looking for more, thinking it
has gotten everything it should. As a result, the exit_status of the ls
command is never received. I changed the spawn driver to no longer assume
that it is interested in every SIGCHLD, but then it starts spinning over
waitpid, so we really have to figure out what that extra process is in
order to do anything about it.
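
For readers unfamiliar with the pattern, here is a minimal Python sketch of
a SIGCHLD-style reaping loop that tolerates children it does not track. It
is not the erts spawn driver code; the function name reap_children and the
set of "interesting" pids are hypothetical stand-ins for whatever the
driver keeps internally. The idea is to keep calling waitpid with WNOHANG
until nothing more has exited, and to discard any pid that is not tracked
instead of treating it as the expected child.

```python
import os


def reap_children(interesting):
    """Reap every child that has already exited, without blocking.

    `interesting` is a set of pids the caller tracks (hypothetical stand-in
    for the pids a spawn driver cares about). Returns {pid: raw_status} for
    tracked pids; any other exited child is reaped and discarded.
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if pid in interesting:
            reaped[pid] = status
        # An untracked pid falls through here: it is reaped and ignored.
    return reaped
```

Called once per SIGCHLD notification, the loop ends as soon as waitpid
reports that no further child has exited, so it does not busy-wait on its
own.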

For some reason strace is badly broken in the Docker images I built. If
someone who is better at working with Docker wants to pick this up and have
a look, I've forked André's repo and added some things:
https://github.com/garazdawi/docker-erlang-bug
The "fix", with trace output, is here:
https://github.com/garazdawi/otp/tree/lukas/erts/docker-rogue-process-fix-kinda

The output I get is:
child sleep
Signal chld waiter
23: About to execute exec inet_gethost 4
child died 23 0
child died 24 10
25: About to execute exec ls
25: ready_input read 67
child died 25 0
25: report_exit_status 0 -> 0x7f23ab9c0c20
25: report_exit_status 0 -> 0x7f23ab9c0c20
25: ready_input read 0
25: port_inp_failure 0
Dockerfile
README.md
erlang-OTP-18.2.tar.gz
test.beam
test.erl

SUCCESS

The question is: what is this child 24 that dies with status 10? It seems
to come along with inet_gethost, but I don't understand why it should
generate extra SIGCHLDs.
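
It is not clear from the trace whether the "10" is a raw wait status or an
already-decoded exit code, and the two readings differ. The sketch below
(Python, assuming the usual Linux wait-status encoding; the helper name
describe_wait_status is made up for illustration) shows how a raw status
from waitpid would be classified. Under that encoding a raw value of 10
means "killed by signal 10" (SIGUSR1 on x86-64 Linux), whereas an exit code
of 10 would appear as the raw value 10 << 8.

```python
import os


def describe_wait_status(raw):
    """Classify a raw wait status as returned by waitpid(2)."""
    if os.WIFEXITED(raw):
        return ("exited", os.WEXITSTATUS(raw))
    if os.WIFSIGNALED(raw):
        return ("signaled", os.WTERMSIG(raw))
    return ("other", raw)


# Compare the two possible readings of "status 10" from the trace.
for raw in (0, 10, 10 << 8):
    print(raw, describe_wait_status(raw))
```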

Lukas

On Thu, Dec 17, 2015 at 12:47 PM, André Cruz <andre@REDACTED> wrote:

> On 17 Dec 2015, at 10:31, Alexey Lebedeff <binarin@REDACTED> wrote:
> >
> > Ah, docker at its best )
> >
> > $ for iter in $(seq 1 100); do echo -n "$iter " 1>&2 ; docker run --rm edevil/docker-erlang-bug bash -c "sleep 1; erl -noshell -s test run -s init stop" 2>/dev/null; done | sort | uniq -c
> >     100 SUCCESS
> >
> > but
> >
> > $ for iter in $(seq 1 100); do echo -n "$iter " 1>&2 ; docker run --rm edevil/docker-erlang-bug erl -noshell -s test run -s init stop 2>/dev/null; done | sort | uniq -c
> >      12 FAILED
> >      88 SUCCESS
> >
> > So you should either use the bash/sleep trick or try to find a bug in
> > docker. Honestly, I just gave up ) Especially given that it's not very
> > convenient to use Erlang distribution inside docker containers without
> > something like weavedns.
>
> There are some subtle changes that somehow mitigate the problem, for
> example:
>
> $ docker run edevil/docker-erlang-bug bash -c "erl -noshell -s test run -s init stop 1>&1"
> SUCCESS
>
> Notice the strange stdout redirect. Without it:
>
> $ docker run edevil/docker-erlang-bug bash -c "erl -noshell -s test run -s init stop"
> FAILED
>
> It seems to me that the Erlang port is not aware that the external command
> has completed. Can we be sure this is a Docker problem and not some
> incorrect assumption by the Beam VM about its environment? This recent
> e-mail
> http://erlang.org/pipermail/erlang-questions/2015-October/086590.html
> talks about launched processes being in another process session. Could
> this be related?
>
> André