[erlang-questions] "New" vs. "old" console behavior: bug or feature?

Tue Apr 23 21:36:54 CEST 2013

Hi, all.  I can't figure out if this message should be sent to the
erlang-bugs list or the erlang-questions list ... so I'll go for the
more general audience.

Summary: Depending on whether Erlang is started with or without a
tty/pseudo-tty, you can get a different console shell ("new" or "old",
respectively) without realizing it.(*) If you don't know that you're
using the old shell, and a process tries to send output to the 'user'
registered process(**), e.g.
io:format(user, "Some message with ~p extra\n", [Extra]), then the
io:format() call may not return for seconds, minutes, hours, or ever.

My question: Is the kind of indefinite blocking on I/O described below a
             bug or a feature?

I have a test case that can reproduce this behavior.  An automated
version (using Expect) can be found at:

    https://gist.github.com/slfritchie/ad8e5cf1603cbe326be7

The basic steps for reproducing the hang are:

    SSH session #1                      SSH session #2
    --------------                      --------------
    Start an Erlang daemon
    using "run_erl".

    Attach to the daemon's console
    using "to_erl".

                                        Start another Erlang VM
                                        and connect to the first
                                        VM via "-remsh".

    At the console, type the
    following and press ENTER:
        {term1, 

                                        Run this command:
                                            io:format(user, "Hey!\n", []).

The io:format/3 call in session #2 behaves differently depending on
whether session #1's "run_erl" command runs with or without a
tty/pseudo-tty.

    A. With a tty/pty: The io:format() call returns immediately.
    B. Without a tty/pty: The io:format() call will hang indefinitely.
       It remains blocked until the Erlang term parser in session #1
       has returned, for example after you finish the term with
       "term2}." and press ENTER.

The same effect can be seen by forcing the use of the old shell, without
using SSH, by simply running "erl -oldshell" for session #1 (in an Xterm
or other terminal window, or at the machine's hardware console) instead
of using SSH + "run_erl" + "to_erl".
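
For anyone who wants to keep a caller from blocking on the hang
described above, here is a minimal sketch of one possible workaround.
It is not taken from Riak or Lager; the module name user_io_timeout and
the function format/3 are hypothetical. It does the io:format(user, ...)
call in a helper process and gives up after a timeout. It does not
unblock the console itself, and a request that timed out is probably
still queued at the 'user' process, so the text may appear later.

```erlang
-module(user_io_timeout).
-export([format/3]).

%% Hypothetical helper, not part of OTP: write to 'user' from a
%% short-lived process so the caller waits at most TimeoutMs.
format(Fmt, Args, TimeoutMs) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
        %% This is the call that can block under the old shell.
        Result = (catch io:format(user, Fmt, Args)),
        Parent ! {Ref, Result}
    end),
    receive
        {Ref, ok} ->
            erlang:demonitor(MRef, [flush]),
            ok;
        {Ref, Other} ->
            erlang:demonitor(MRef, [flush]),
            {error, Other};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
        %% Give up: kill the helper, wait for it to die, then drop
        %% any reply it managed to send just before dying.
        exit(Pid, kill),
        receive {'DOWN', MRef, process, Pid, _} -> ok end,
        receive {Ref, _} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

Usage would look like user_io_timeout:format("Hey!~n", [], 5000), which
returns ok, {error, timeout}, or {error, Reason}.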

Riak was the application that triggered this bug hunt (in conjunction
with the Lager app)(***).  Finding it has taken much longer than anyone
guessed.  The reason is that the necessary precondition, starting Erlang
via 'run_erl' via SSH without an associated tty/pseudo-tty, is not
common.  (Riak's packaging uses "sudo", which refuses to run if there
isn't a tty/pty available.)

All attempts to duplicate the behavior failed because we didn't
understand that the root cause of the bad behavior was the old console
being silently chosen at VM startup when no tty/pty is available.

-Scott

(*) See
https://github.com/erlang/otp/blob/maint/lib/kernel/src/user_drv.erl#L103
for how the choice is made.

(**) From the 'io' man page:

       There is always a process registered under the name of user. This
       can be used for sending output to the user.

... where "output to the user" really means "output to the Erlang
virtual machine console."
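
To make the distinction concrete, here is a small sketch (the module
name user_vs_group_leader is made up for illustration). io:format/2
writes to the calling process's group leader, which for a "-remsh"
session is the remote shell, while io:format/3 with 'user' writes to
the VM console. Only the second call is exposed to the hang described
above.

```erlang
-module(user_vs_group_leader).
-export([demo/0]).

%% Illustrative only: show the two different output destinations.
demo() ->
    UserPid = whereis(user),      % the registered console process
    GroupLeader = group_leader(), % where io:format/2 output goes
    ok = io:format("via group leader ~p~n", [GroupLeader]),
    %% Under the old shell this call can block while the console is
    %% in the middle of reading a term.
    ok = io:format(user, "via user ~p~n", [UserPid]),
    {UserPid, GroupLeader}.
```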

(***) For source code of Riak and Lager, respectively, see:
    https://github.com/basho/riak
    https://github.com/basho/lager