[erlang-questions] Sending signals to non-erlang processes

Sat Oct 17 02:17:42 CEST 2015

On Fri, Oct 16, 2015 at 01:18:40PM +0200, Jesper Louis Andersen wrote:
> On Fri, Oct 16, 2015 at 11:30 AM, Nicolas Martyanoff <khaelin@REDACTED>
> wrote:
> 
> > I also cannot find a way to actually stop the spawned application,
> > port_close() does not do it. The "UNIX way" is to send SIGTERM, wait for a
> > bit, then send SIGKILL if the application did not stop. But I cannot find an
> > erlang function to send a signal to an external process. I made a temporary
> > fix using os:cmd("kill ..."), but it feels like a hack.
> >
> 
> Two options: misbehaving programs can be handled through Aleynikov's
> https://github.com/saleyn/erlexec which wraps[0] them in a C++ helper
> process which understands how to gracefully communicate to the Erlang world.
> 
> Port programs normally communicate through a set of file descriptors, so
> the program you spawn should detect and terminate if there are errors when
> reading on the fd. I've been down this rabbit hole before, but I'm afraid I
> forgot, again, how it all works. Perhaps this is good to document in a "How
> to write behaving port programs" document and make it part of the OTP
> documentation.

A well-behaved port program is like a process started from inetd.
When the port is closed:

* the read from stdin will return 0 bytes (EOF) or an error

* writing to stdout will result in a SIGPIPE being sent to the port
  process or return an error (EPIPE)

In both cases, it is up to the port process to exit if there is an error
condition.
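
As a rough illustration of both rules, here is a minimal sketch of a port
loop that echoes its input. The function name port_loop is made up for this
example; it stops on EOF (or a read error) on stdin and on a failed write to
stdout:

```shell
#!/bin/sh
# Hypothetical well-behaved port loop: echo every input line back.
port_loop() {
    while IFS= read -r line; do
        # A failed write (for example EPIPE after the port was closed)
        # ends the loop with an error status.
        printf '%s\n' "$line" || return 1
    done
    # read returned non-zero: EOF or an error on stdin, so stop.
    return 0
}
```

A script used as the port program would call port_loop as its last command
and exit with its status.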

A simple way of testing the behaviour of closing stdin:

    % Start a port running the cat command
    1> Port = open_port({spawn, "/bin/cat"}, []).
    #Port<0.21242>

    % Send some data and get a response
    2> port_command(Port, "test\n", []).
    true
    3> flush().
    Shell got {#Port<0.21242>,{data,"test\n"}}
    ok

    % Get the PID of the command
    4> erlang:port_info(Port).
    [{name,"/bin/cat"},
     {links,[<0.60.0>]},
     {id,12197},
     {connected,<0.60.0>},
     {input,5},
     {output,5},
     {os_pid,7584}]

    % Close the port
    5> port_close(Port).
    true

    % Confirm the port process has exited
    6> os:cmd("kill -0 7584").
    "/bin/sh: 1: kill: No such process\n\n"

Similarly, we can test the behaviour when writing to stdout:

    % Write to stdout
    1> Port = open_port({spawn, "/usr/bin/yes"}, []), Info = erlang:port_info(Port), port_close(Port), Info.
    [{name,"/usr/bin/yes"},
     {links,[<0.60.0>]},
     {id,12197},
     {connected,<0.60.0>},
     {input,0},
     {output,0},
     {os_pid,7845}]

    2> os:cmd("kill -0 7845").
    "/bin/sh: 1: kill: No such process\n\n"

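The checks above read the error text printed by kill. The exit status of
kill -0 carries the same information and is easier to test. A small sketch
(pid_alive is a name invented for this example):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a signal could be sent to the PID.
# kill -0 performs the existence and permission checks without
# delivering any signal.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}
```

Note that kill -0 also fails when the process exists but belongs to another
user, so a failure here means "gone or not signalable". From the Erlang shell
the same check could be run as
os:cmd("kill -0 7845 2>/dev/null && echo alive || echo gone").
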
Here is an example of a badly behaving port:

    #!/bin/bash

    while :; do
        echo test
        sleep 1
    done

Observing the behaviour under strace:

    % erlang shell
    1> Port = open_port({spawn, "bad.sh"}, []).
    #Port<0.21242>
    2> flush().
    Shell got {#Port<0.21242>,{data,"test\n"}}
    Shell got {#Port<0.21242>,{data,"test\n"}}
    Shell got {#Port<0.21242>,{data,"test\n"}}
    Shell got {#Port<0.21242>,{data,"test\n"}}
    ok
    3> port_close(Port).
    true

    # strace
    $ strace -p 8323
    write(1, "test\n", 5)                   = 5
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb6f133c8) = 8420
    wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8420
    --- SIGCHLD (Child exited) @ 0 (0) ---
    sigreturn()                             = ? (mask now [QUIT ILL TRAP BUS SEGV USR2 CHLD STOP TSTP])
    write(1, "test\n", 5)                   = -1 EPIPE (Broken pipe)
    --- SIGPIPE (Broken pipe) @ 0 (0) ---
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb6f133c8) = 8421
    wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8421
    --- SIGCHLD (Child exited) @ 0 (0) ---
    sigreturn()                             = ? (mask now [QUIT ILL TRAP BUS SEGV USR2 CHLD STOP TSTP])
    write(1, "test\n", 5)                   = -1 EPIPE (Broken pipe)
    --- SIGPIPE (Broken pipe) @ 0 (0) ---

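The strace output shows only the shell itself (child processes are not
followed): echo is a builtin, so the shell is the one writing to stdout, and
the forked child is the sleep, which exits with status 0. Once the port has
been closed, each write fails with EPIPE and raises SIGPIPE, but the shell
carries on with the loop instead of exiting.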

And the same shell script modified to exit when stdin has been closed:

    #!/bin/bash

    while read l; do
        echo "test"
        sleep 1
    done
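
For a script that only produces output and never reads stdin, another option
is to check the result of the write. This is a sketch under the assumption
that SIGPIPE does not kill the shell (as in the trace above), so the failed
echo returns a non-zero status; emit_until_closed and its optional delay
argument are invented for this example:

```shell
#!/bin/sh
# Hypothetical variant of the loop that stops when stdout is closed.
# $1: delay between writes in seconds (defaults to 1).
emit_until_closed() {
    while :; do
        # echo fails once the reading end of stdout has gone away.
        echo test || return 1
        sleep "${1:-1}"
    done
}
```

If SIGPIPE is left at its default action instead, the shell is terminated by
the signal on the first failed write, so the loop ends either way.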

So:

* a port process must be written to exit if stdin or stdout is closed

* badly behaving port processes may not exit. They may be unkillable
  for many reasons: masking signals, running commands in a subprocess or
  being setuid and running as a different user.

If we're running port processes, why not run another port process to
clean up? For example:

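    #!/bin/sh

    # Usage from Erlang. Each request is "<pid> <signal>" and must end with a
    # newline, because read waits for the end of the line:
    # Port = open_port({spawn, "kill.sh"}, []), port_command(Port, "1234 9\n", []).

    while read l; do
        set -- $l           # split the line into $1 (pid) and $2 (signal)
        kill -s "$2" "$1"
    done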

Which is equivalent to running:

    os:cmd("kill -9 1234").
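
The SIGTERM, wait, then SIGKILL sequence from the quoted question could be
wrapped the same way. This is only a sketch: stop_pid and its grace period
argument are invented for this example, and the PID would be the os_pid
reported by erlang:port_info/1:

```shell
#!/bin/sh
# Hypothetical helper: stop_pid PID [GRACE_SECONDS]
# Sends SIGTERM, polls once a second, and falls back to SIGKILL if the
# process is still there after the grace period (default 5 seconds).
stop_pid() {
    pid=$1
    grace=${2:-5}
    # If SIGTERM cannot be sent, the process is most likely gone already.
    kill -s TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 fails once the PID no longer exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -s KILL "$pid" 2>/dev/null
    return 0
}
```

As with kill.sh, this does not help against a process running as a
different user, where kill itself is refused.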