[erlang-questions] Strange interaction between Docker and Erlangs ports (exit_status lost)

Alexey Lebedeff binarin@REDACTED
Thu Dec 17 20:59:36 CET 2015


Hi,

You're on the right track here. This behaviour is caused by the kernel
reparenting orphaned processes to pid 1 (init) of the current pid namespace.
If the orphan is already a zombie, the kernel even sends SIGCHLD to that
pid 1 process when reparenting.

So I think it's possible to reproduce this even without docker, just with
a NIF that does fork(2).
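
The reparenting itself can be observed without Docker by letting an ordinary
process play the role of pid 1. What follows is a minimal sketch in Python
rather than a NIF. It assumes Linux 3.4 or later, where
prctl(PR_SET_CHILD_SUBREAPER) lets a process adopt orphaned descendants in
the same way pid 1 does for its namespace. The function name adopt_orphan
and the exit code 3 are made up for illustration.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>, Linux >= 3.4


def adopt_orphan():
    """Return (exit code of our own child, exit code of the adopted orphan)."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Ask the kernel to reparent orphaned descendants to this process.
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_CHILD_SUBREAPER) failed")

    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Direct child: fork a grandchild, report its pid, then exit at once.
        os.close(read_fd)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(write_fd)
            time.sleep(0.2)  # outlive the direct child and become an orphan
            os._exit(3)
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)

    os.close(write_fd)
    orphan = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    _, child_status = os.waitpid(child, 0)
    # We never forked `orphan` ourselves; waiting on it works only because
    # the kernel reparented it to us when its original parent exited.
    _, orphan_status = os.waitpid(orphan, 0)

    libc.prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0)  # restore the default
    return os.WEXITSTATUS(child_status), os.WEXITSTATUS(orphan_status)
```

Without the prctl call the second waitpid would normally fail with ECHILD,
because the orphan would go to the real pid 1 instead (unless the calling
process is itself pid 1, as it can be inside a container).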

So the assumption about no extra SIGCHLD is wrong and needs to be fixed.
Are you willing to do this? Or I could give it a try.
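
To make the idea concrete, here is a rough sketch of a reaping loop that does
not assume every exited child is one it spawned. It is written in Python for
brevity and is not the erts spawn driver code; the names reap_children,
tracked and demo are hypothetical. The loop drains all exited children with
waitpid(-1, WNOHANG), ignores pids it does not know about, and stops as soon
as waitpid returns 0 or fails with ECHILD, so it should not spin.

```python
import os


def reap_children(tracked):
    """Reap every exited child; return {pid: exit_code} for tracked pids only.

    `tracked` is the set of pids the caller spawned and cares about.
    """
    statuses = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: there are no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        if pid in tracked:
            statuses[pid] = os.WEXITSTATUS(status)
        # Any other pid (for example an adopted orphan) is reaped and ignored.
    return statuses


def demo():
    """Fork one tracked and one stray child, then reap both in one pass."""
    wanted = os.fork()
    if wanted == 0:
        os._exit(0)
    stray = os.fork()
    if stray == 0:
        os._exit(10)
    # Block until both have exited, but leave them waitable (WNOWAIT).
    os.waitid(os.P_PID, wanted, os.WEXITED | os.WNOWAIT)
    os.waitid(os.P_PID, stray, os.WEXITED | os.WNOWAIT)
    return wanted, stray, reap_children({wanted})
```

A SIGCHLD handler would call reap_children once per signal. Since several
exits can be folded into a single SIGCHLD, draining until nothing is left is
what keeps a tracked exit status from being lost behind a stray one.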

Best,
Alexey
On 17 Dec 2015 at 19:51, "Lukas Larsson" <garazdawi@REDACTED> wrote:

> Hello,
>
> I did some digging into this. It appears that some extra process (I
> don't know which one) sends a SIGCHLD to beam.smp, which is caught but does
> not match any pid that the spawn driver is interested in. The spawn driver
> is built around the assumption that no extra SIGCHLD arrives, so after
> receiving the extra SIGCHLD it does not go looking for more, as it thinks it
> has gotten everything it should. As a result, the exit_status of the ls
> command is never received. I changed the spawn driver to no longer assume
> that it is interested in every SIGCHLD, but then it starts spinning like
> crazy over waitpid, so we really have to figure out what that extra process
> is in order to do anything about it.
>
> For some reason strace in the docker images I built is very broken.
> If someone who is better at working with docker wants to pick it up and
> have a look, I've forked André's repo and added some things to it:
> https://github.com/garazdawi/docker-erlang-bug and the "fix" with trace
> output is here:
> https://github.com/garazdawi/otp/tree/lukas/erts/docker-rogue-process-fix-kinda
>
> The output I get is:
> child sleep
> Signal chld waiter
> 23: About to execute exec inet_gethost 4
> child died 23 0
> child died 24 10
> 25: About to execute exec ls
> 25: ready_input read 67
> child died 25 0
> 25: report_exit_status 0 -> 0x7f23ab9c0c2025: report_exit_status 0 ->
> 0x7f23ab9c0c2025: ready_input read 0
> 25: port_inp_failure 0
> Dockerfile
> README.md
> erlang-OTP-18.2.tar.gz
> test.beam
> test.erl
>
> SUCCESS
>
> The question is: what is this child 24 that dies with status 10? It seems
> to be sticking together with inet_gethost, but I don't understand why it
> should generate extra SIGCHLDs.
>
> Lukas
>
> On Thu, Dec 17, 2015 at 12:47 PM, André Cruz <andre@REDACTED> wrote:
>
>> On 17 Dec 2015, at 10:31, Alexey Lebedeff <binarin@REDACTED> wrote:
>> >
>> > Ah, docker at its best )
>> >
>> > $ for iter in $(seq 1 100); do echo -n "$iter " 1>&2 ; docker run --rm
>> edevil/docker-erlang-bug bash -c "sleep 1; erl -noshell -s test run -s init
>> stop" 2>/dev/null; done | sort | uniq -c
>> >     100 SUCCESS
>> >
>> > but
>> >
>> > for iter in $(seq 1 100); do echo -n "$iter " 1>&2 ; docker run --rm
>> edevil/docker-erlang-bug erl -noshell -s test run -s init stop 2>/dev/null;
>> done | sort | uniq -c
>> >      12 FAILED
>> >      88 SUCCESS
>> >
>> > So you should either use the bash/sleep trick or try to find a bug in docker.
>> Honestly, I just gave up ) Especially given that it's not very convenient
>> to use Erlang distribution inside docker containers without something like
>> weavedns.
>>
>> There are some subtle changes that somehow mitigate the problem, for
>> example:
>>
>> $ docker run edevil/docker-erlang-bug bash -c "erl -noshell -s test run
>> -s init stop 1>&1"
>> SUCCESS
>>
>> Notice the strange stdout redirect. Without it:
>>
>> $ docker run edevil/docker-erlang-bug bash -c "erl -noshell -s test run
>> -s init stop"
>> FAILED
>>
>> It seems to me that the Erlang port is not aware that the external
>> command has completed. Can we be sure this is a Docker problem and not some
>> incorrect assumption by the BEAM VM about its environment? This recent
>> e-mail
>> http://erlang.org/pipermail/erlang-questions/2015-October/086590.html
>> talks about launched processes being in another process session. Can this
>> be related?
>>
>> André
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
>