Implications of setting SIGCHLD in relation to NIFs

José Valim jose.valim@REDACTED
Mon Nov 16 18:54:55 CET 2020


Hi everyone,

I am working on Tensorflow bindings and, at some point, Tensorflow forks a
child process to invoke a separate program. Unfortunately, when running
inside the Erlang VM, Tensorflow fails when calling waitpid, at exactly
these lines:
<https://github.com/tensorflow/tensorflow/blob/7b637feb1d145d606a7b69481fd4943f3086d5a2/tensorflow/core/platform/default/subprocess.cc#L314-L323>

After some debugging, we found that the root cause is that the Erlang VM
sets SIGCHLD to SIG_IGN. According to the waitpid docs
<https://www.mkssoftware.com/docs/man3/waitpid.3.asp>:

> If the calling process sets SIGCHLD to SIG_IGN, and the process has no
> unwaited for children that were transformed into zombie processes, the
> calling thread blocks until all of the children of the process terminate,
> at which time waitpid() returns -1 with errno set to ECHILD.

Calling os:set_signal(sigchld, default) fixes the issue. However, it leaves
me wondering:

1. Is it safe to set sigchld back to default? Or is the VM expecting it to
be ignored? Are there any implications we should be aware of?

2. If it is safe to set it back to default, why is it being ignored in
the first place?

Thank you,

José Valim
https://dashbit.co/