[erlang-bugs] Strange thing in lib/kernel/src/group.erl

Mon May 6 15:34:21 CEST 2013

On 05/06, Stefan Zegenhagen wrote:
> 
> Unfortunately, this is only half the truth. I/O requests will usually
> not cause the shell to not listen to 'EXIT' requests from the
> group_leader() because I/O requests are implemented as message exchange
> between those two. Additionally, the io.erl module (which does the I/O
> requests), is terribly careful to not miss any exit signal sent by the
> group_leader() / I/O channel. It does the following:
>  - create a monitor for the group_leader() (or the supplied I/O channel)
>  - send an {io_request, *} message to the I/O channel
>  - listen for
>    * an {io_reply, *}
>    * the 'DOWN' message from the process monitor
>    * any 'EXIT' message from the I/O channel
> 
> If any matching 'DOWN' or 'EXIT' message is received, the corresponding
> opposite is fetched from the message queue as well and {error,
> terminated} is returned to the caller. This is already bad by itself
> because it drops the (possibly important) reason of the error.
> 
> In conclusion, by using the io module for input/output a shell can never
> get stuck in a state where it is unkillable by doing an I/O request.
> But it is true that an I/O request blocks the shell for calls/messages
> from *OTHER* processes than the I/O channel.

The key point here is 'usually'. In practice, with the 'io' module,
things are going to be safe. I think most, if not all, functions of the
'file' module also use the io protocol, writing to files through the
'io' module directly, and are generally safe for that reason.
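
For reference, the request sequence Stefan describes could be sketched
roughly like this. It is a simplified illustration, not the actual
io.erl code; the module and function names are made up, and only the
{io_request, From, ReplyAs, Request} / {io_reply, ReplyAs, Reply}
message shapes come from the documented protocol.

```erlang
-module(io_req_sketch).
-export([request/2]).

%% Hypothetical, simplified client side of one I/O request.
%% Not the real io.erl implementation.
request(IoServer, Request) when is_pid(IoServer) ->
    %% Monitor first, so a dead I/O server cannot leave us waiting.
    Mref = erlang:monitor(process, IoServer),
    IoServer ! {io_request, self(), Mref, Request},
    receive
        {io_reply, Mref, Reply} ->
            erlang:demonitor(Mref, [flush]),
            Reply;
        {'DOWN', Mref, process, IoServer, _Reason} ->
            %% If we are linked and trapping exits, also drop the
            %% matching 'EXIT' message, if it is already queued.
            receive {'EXIT', IoServer, _} -> ok after 0 -> ok end,
            %% Note that the real exit reason is discarded here.
            {error, terminated};
        {'EXIT', IoServer, _Reason} ->
            %% Remove the monitor and any 'DOWN' already delivered.
            erlang:demonitor(Mref, [flush]),
            {error, terminated}
    end.
```

Note that the selective receive only matches messages from the I/O
server, which is why requests from other processes stay queued until
the reply (or the 'DOWN') arrives.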

However, I'm looking at it only from within the group.erl implementation
and the documented protocol at
http://erlang.org/doc/apps/stdlib/io_protocol.html (in Erlang, if it's
not documented, it doesn't exist). If you're basing yourself only on the
protocol, you can't assume the other side will monitor you, although
it's probably what any reasonable Erlang programmer would do.

I'm guessing that if the shell had documentation and a notice warning
about this usage, there would be no argument that could be made against
it.

> 
> I can see that it might be wanted to get rid of the shell for sure. One
> might imagine a case where the shell is trapping exits but "refuses to
> die" in response to a trappable exit signal. But then, it is not clear
> to me, why the same measure (e.g. exit_shell(kill)) is not taken in the
> case where the group.erl's server process is *NOT* executing an I/O
> request right now and the shell might truly be blocked by activities
> that prevent it from reacting to the exit signal.
> 

It is indeed not very clear. My guess would be that you can make
assumptions about your own part of the communication and protocol, but
not about the other side's.

A simpler explanation is probably that some time back, there was a
problem with either implementation and it was simpler to fix with a kill
than by adding other ways of handling it in code (say, before monitoring
was added to the language, but while trap_exit was available).

If this is the case, then there would be no reason to keep things the
way they are right now IMO, and it would be possible to go with the
other exit.

> 
> But back to the original issue: there are several, distinct reasons why
> we might need to forcibly terminate a shell session *AND* to do an
> appropriate logging IFF a user is currently logged on (for
> security/auditing reasons), e.g.:
>  - the serial cable is being unplugged while a user is logged on
>  - someone tries to interfere with the system by sending huge amounts
>    of binary data over the serial port (possible denial-of-service)
>  - ...
> 
> Our user_drv.erl replacement exits with an appropriate reason in those
> cases and our shell implementation needs to know the exit reason to do
> the right thing depending on the situation. This is currently impossible
> and I was wondering whether anything could be done about it.

That is definitely a nice use case and I would be personally more open
to allowing that than leaving the 'kill' here. I am however not in the
OTP team, and do not know everything that has to do with the shell, so
this is only my personal opinion.

A possible workaround if things do not come to fruition would be to add
layers of indirection -- a process that monitors the shell and the
group.erl process and reports the most useful message. Ideally this
would not need to be written, although it might still be needed if you
deal with older implementations after the fix.
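
As a rough idea of what such an indirection layer could look like,
here is a minimal sketch. Everything in it is hypothetical: the module
name, the {session_down, Reason} message, the 500 ms grace period, and
the rule that 'killed' and 'normal' count as uninformative reasons.

```erlang
-module(exit_watcher).
-export([start/3]).

%% Hypothetical watcher: monitors the shell and the group process and
%% sends {session_down, Reason} to Reporter with the most useful reason.
start(ShellPid, GroupPid, Reporter) ->
    Parent = self(),
    Watcher = spawn(fun() ->
        Refs = [erlang:monitor(process, P) || P <- [ShellPid, GroupPid]],
        %% Tell the caller the monitors are in place before it goes on.
        Parent ! {watching, self()},
        Reasons = collect(Refs, [], infinity),
        Reporter ! {session_down, pick(Reasons)}
    end),
    receive {watching, Watcher} -> Watcher end.

%% Wait indefinitely for the first 'DOWN', then only a short grace
%% period (assumed: 500 ms) for the second one.
collect([], Acc, _Timeout) ->
    Acc;
collect(Refs, Acc, Timeout) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} ->
            collect(lists:delete(Ref, Refs), [Reason | Acc], 500)
    after Timeout ->
        Acc
    end.

%% Prefer any reason that says more than 'killed' or 'normal'.
pick(Reasons) ->
    case [R || R <- Reasons, R =/= killed, R =/= normal] of
        [Best | _] -> Best;
        [] -> hd(Reasons)
    end.
```

With something like this, the shell itself still only sees 'killed',
but whoever does the logging gets the reason the driver exited with.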

> 
> Whether this works would certainly depend on the timing. The shell
> process should be given enough time to have a chance to process the
> first exit signal before being forcibly killed by the second one. Can
> this be guaranteed?
> 

The two-kill approach should work well in the event where the other
process is not trapping exits. In that case, the order of signals should
be guaranteed, and the first one will kill the process cleanly.

If the process is trapping exits, though, then the first (non-kill)
signal will be converted to a message, and the process is very unlikely
to have the time to handle that message before being killed by the
second signal.
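
The trapping case can be tried out with something like the sketch
below. The module name and the cable_unplugged reason are made up for
illustration. Whether the target manages to report the first reason
before dying is a race; the only thing that is certain is that the
process ends up with reason 'killed'.

```erlang
-module(two_signal_demo).
-export([run/0]).

%% Sends a trappable exit signal followed by a kill to a process that
%% traps exits, and returns the reason it finally died with.
run() ->
    Parent = self(),
    Pid = spawn(fun() ->
        process_flag(trap_exit, true),
        Parent ! ready,
        receive
            {'EXIT', _From, Reason} ->
                %% May or may not get this far before the kill lands.
                Parent ! {saw_reason, Reason},
                receive after infinity -> ok end
        end
    end),
    %% Make sure trap_exit is set before any signal is sent.
    receive ready -> ok end,
    Mref = erlang:monitor(process, Pid),
    exit(Pid, cable_unplugged),   % converted to a message by trap_exit
    exit(Pid, kill),              % cannot be trapped
    receive
        {'DOWN', Mref, process, Pid, Final} -> Final
    end.
```

Calling two_signal_demo:run() returns killed; a {saw_reason, _}
message may or may not be left in the caller's mailbox, which is
exactly the timing problem.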

The cleanest solution is obviously to be able to just exit/2 with the
right reason.

I don't know if the OTP team has managed to transfer all the changelogs
relating to the shells when they moved over to git, but I'd be
interested to find out whether the exit(Pid,kill) in there is older than
monitors -- if so, it would mean that it was probably a workaround for
the io module which is no longer necessary today (because the io module
can now monitor without altering links or trapping exits).

Regards,
Fred.