Fun with open_port

Gerd Flaig gerd@REDACTED
Sun Mar 21 16:38:59 CET 2004


Shawn Pearce <spearce@REDACTED> writes:

> Short is, I'm not seeing what you are, and I shouldn't either, nobody
> should.  Compiling and loading a module should have no bearing on
> the exit_status message being delivered by a port program, and the
> port program should not be creating zombies...  perhaps something is
> wrong with your operating system?

I ran the emulator under strace. SIGCHLD is delivered to the emulator
in the failure case as well, but with one difference: no waitpid()
call follows it. Maybe I have triggered an obscure bug that only
shows up with certain combinations of glibc and gcc.

Under strace, q() simply hangs after 'Process 30049 suspended' is printed.

% ldd /usr/lib/erlang/erts-5.3/bin/erlexec 
        libc.so.6 => /lib/libc.so.6 (0x4001c000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
% strace -f -o /tmp/erlbug.trace erl
Process 30046 attached
Process 30047 attached
Process 30048 attached
Process 30047 detached
Process 30046 suspended
Process 30046 resumed
Process 30048 detached
Process 30046 detached
Process 30049 attached
Process 30050 attached
Erlang (BEAM) emulator version 5.3 [source] [hipe] [threads:0]

Eshell V5.3  (abort with ^G)
1> dreck:r().
Process 30051 attached
                      Process 30051 detached
                                            {#Port<0.30>,{exit_status,0}}
2> c(dreck).
{ok,dreck}
3> dreck:r().
Process 30052 attached
                      Process 30052 detached
                                            timeout
4> q().
ok
5> Process 30050 detached
Process 30049 suspended

30051 _exit(0)                          = ?
30045 <... poll resumed> [{fd=3, events=POLLIN}, {fd=0, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLHUP}], 3, 4987) = 1
30045 --- SIGCHLD (Child exited) ---
30045 sigreturn()                       = ? (mask now [RTMIN])
30045 times({tms_utime=56, tms_stime=7, tms_cutime=3, tms_cstime=0}) = 346574446
30045 gettimeofday({1079882310, 628455}, NULL) = 0
30045 read(7, "", 65536)                = 0
30045 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
30045 waitpid(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG) = 30051
30045 waitpid(-1, 0xbffff990, WNOHANG)  = -1 ECHILD (No child processes)
30045 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0

As you can see, there are two calls to waitpid, the first of which
gives us the exit status of /bin/true (pid 30051).
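
For comparison, here is a minimal sketch in C of the reaping pattern the
trace above shows: block SIGCHLD, call waitpid(-1, ..., WNOHANG) until it
stops returning a pid, then unblock SIGCHLD. This is not the emulator's
source. The names reap_exited_children and spawn_and_reap are hypothetical,
and the waitid(..., WNOWAIT) call is only an assumption used to make the
sketch deterministic; it does not appear in the trace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already exited.
 * Returns the number of children reaped and stores the exit code of the
 * last normally exited child in *last_exit_code (must not be NULL). */
int reap_exited_children(int *last_exit_code)
{
    sigset_t chld;
    int status, rc, reaped = 0;
    pid_t pid;

    /* Same bracket as in the trace: rt_sigprocmask(SIG_BLOCK, [CHLD]). */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    rc = sigprocmask(SIG_BLOCK, &chld, NULL);
    assert(rc == 0);

    /* Loop until waitpid returns 0 (children still running) or
     * -1 (ECHILD: no children left), as in the second waitpid line. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFEXITED(status))
            *last_exit_code = WEXITSTATUS(status);
        reaped++;
    }

    rc = sigprocmask(SIG_UNBLOCK, &chld, NULL);
    assert(rc == 0);
    return reaped;
}

/* Hypothetical driver: fork a child that exits with `code`, wait until it
 * has terminated without reaping it, then run the non-blocking reaper.
 * Returns the number of children reaped, or -1 on error. */
int spawn_and_reap(int code, int *exit_code)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: behaves like /bin/true for code 0 */

    /* WNOWAIT leaves the child as a zombie so waitpid() can still reap it. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    return reap_exited_children(exit_code);
}
```

With a single child, spawn_and_reap(0, &code) reaps exactly one process and
leaves code at 0, which corresponds to the first waitpid line; calling
reap_exited_children with no children outstanding reaps nothing, which
corresponds to the ECHILD line.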

30045 <... poll resumed> [{fd=3, events=POLLIN}, {fd=0, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLHUP}], 3, 4991) = 1
30045 --- SIGCHLD (Child exited) ---
30045 sigreturn()                       = ? (mask now [RTMIN])
30045 times({tms_utime=101, tms_stime=10, tms_cutime=4, tms_cstime=1}) = 346575193
30045 gettimeofday({1079882318, 92096}, NULL) = 0
30045 read(7, "", 65536)                = 0
30045 times({tms_utime=101, tms_stime=10, tms_cutime=4, tms_cstime=1}) = 346575193
30045 gettimeofday({1079882318, 92740}, NULL) = 0
30045 poll([{fd=3, events=POLLIN}, {fd=0, events=POLLIN}, {fd=7, events=POLLIN|POLLOUT, revents=POLLOUT|POLLHUP}], 3, 4820) = 1

The second time, after recompiling with c(dreck), there is no call to
waitpid at all. I've put the whole strace output (97159 bytes bzipped)
up at http://nxdomain.org/erlbug.trace.bz2

ii  libc6                        2.3.2.ds1-11                 GNU C Library: Shared libraries and Timezone data
ii  erlang                       9.2-5                        A real-time, concurrent and distributed functional language (self-compiled)


        Goodbyte, Gerd.
-- 
Gerd Flaig                     Technik                gerd@REDACTED
Bei Schlund + Partner AG       Brauerstraße 48      D-76135 Karlsruhe
 Physics is like sex: sure, it may give some practical results,
 but that's not why we do it. -- Richard Feynman


