[erlang-questions] "New" vs. "old" console behavior: bug or feature?

Wed Apr 24 20:05:38 CEST 2013

Strange because both user.erl and group.erl "should" be able to handle output requests in the middle of getting input. But it is a little difficult to see in group as there is all this tricky search code. :-)

Robert

----- Original Message -----
> From: "Fred Hebert" <mononcqc@REDACTED>
> To: "Scott Lystig Fritchie" <fritchie@REDACTED>
> Cc: erlang-questions@REDACTED
> Sent: Wednesday, 24 April, 2013 9:46:02 AM
> Subject: Re: [erlang-questions] "New" vs. "old" console behavior: bug or feature?
> 
> Hi Scott,
> 
> The IO world of Erlang is a fun crazy thing :)
> 
> I've spent time trying to document how the shell works back at
> http://ferd.ca/repl-a-bit-more-and-less-than-that.html. I'll do a quick
> roundup of things just to be clear on everything.
> 
> Before going into the difference between 'new' and 'old' shells, there
> is a 'user' process, which you mentioned, part of the IO system. The
> 'user' process acts as a default top-level group leader for all the
> output coming from a process. All group leaders are inherited from the
> process' parent. They can also be modified, so that you may have
> different group leaders across a VM: they can be local processes,
> middle-men (like an application master), or remote processes (this is
> how the output of RPC calls gets printed on the calling node).
> 
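A minimal sketch of the inheritance described above, using only the
group_leader/0 BIF. The module and function names (gl_demo, inherited/0)
are made up for illustration; this is not code from kernel.

```erlang
-module(gl_demo).
-export([inherited/0]).

%% Spawn a child process and check whether it reports the same
%% group leader as its parent.
inherited() ->
    Parent = self(),
    ParentGL = group_leader(),
    spawn(fun() -> Parent ! {child_gl, group_leader()} end),
    receive
        {child_gl, ChildGL} -> ChildGL =:= ParentGL
    after 1000 ->
        timeout
    end.
```

Calling group_leader(NewLeader, Pid) afterwards is the way to point a
process at a different leader, which is the "can also be modified" part.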
> By default, every OTP app will put its application master as a group
> leader for all sub-processes. This group leader will redirect output,
> but it also overloads the feature to kill rogue processes on shutdown
> (it makes a list of all processes, inspects their group leader, and if
> it's the current app's pid, kills said process). Other tools like eunit
> and Common Test can inject themselves above test cases and pick what
> to print or not. By sending IO directly to 'user', we bypass that
> hierarchy and go straight to the node's main IO process. Other special
> cases can be used, such as 'standard_error', which will redirect output
> to the error channel.
> 
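To make the three destinations concrete, here is a small sketch. The
module name io_targets is hypothetical; the io:format/1,3 calls and the
'user' and 'standard_error' device names are the standard ones.

```erlang
-module(io_targets).
-export([demo/0]).

%% Send one line to each of the three destinations discussed above.
demo() ->
    %% Goes through the caller's own group leader (app master, eunit, ...).
    ok = io:format("via my group leader~n"),
    %% Bypasses the group leader hierarchy: the node's 'user' process.
    ok = io:format(user, "straight to 'user'~n", []),
    %% Goes to the node's error channel.
    ok = io:format(standard_error, "to standard_error~n", []),
    ok.
```

Note that the second call is the one that can hang under the old shell,
as discussed in the rest of this thread.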
> That being said, there are two default implementations of a process
> that registers itself as 'user' on a node: the new (current) shell, and
> the 'old' shell. The choice of which one to pick is determined at boot
> time by the user_sup.erl module (part of kernel) through system flags:
> 
> - If the node is a slave node, the 'user' module will point to a remote
>   process.
> - If the node is started with no special flag, the new shell is started
>   through 'user_drv'. This 'user' proc will act as a middle-man between
>   input and output with a tty program and the different Erlang groups
>   (see group.erl in kernel) to allow multiple jobs and concurrent shells
>   without messed up output. Evaluation is handled by shell.erl (stdlib).
> - If the node is started with the -oldshell flag, the process in charge
>   is 'user.erl', which uses special IO devices ({fd,0,1} for IO) to deal
>   with the input and output channels for the node directly. It will send
>   the evaluation to shell.erl also.
> - If the node is started with -noshell, the 'user.erl' module is still
>   booted, but will not evaluate any input nor forward it.
> - If the node is started in -noinput mode, the 'user.erl' module is
>   still booted, but it will not forward any input, only output from the
>   node. It's a superset of -noshell and a bit safer because it opens the
>   IO port in a way that only has the 'out' channel open.
> - There is an undocumented -nouser flag. Such a flag makes sure that
>   neither user.erl nor user_drv.erl are started. The node will crash
>   unless you specifically decide to start a process that registers
>   itself as 'user' and decides to handle IO for your node. This is what
>   you should use were you planning to provide your own Erlang shell and
>   boot it as 'erl -nouser -s custom_shell'.
> - If it's not possible to boot the tty used by 'user_drv', it should
>   fall back to 'user.erl' as an IO leader.
> 
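One rough way to check which of these setups a running node ended up
with is to look at the registered names. This is only a heuristic
sketch: it assumes the new shell's driver process is registered as
'user_drv' (true for the OTP releases I know of, but treat it as an
assumption), and the module name which_shell and the returned atoms are
invented here.

```erlang
-module(which_shell).
-export([guess/0]).

%% Heuristic only: infer the console setup from registered names.
guess() ->
    case {whereis(user_drv), whereis(user)} of
        {Drv, _} when is_pid(Drv) ->
            new_shell;                  % user_drv is running
        {undefined, User} when is_pid(User) ->
            old_shell_or_noshell;       % 'user' exists without user_drv
        {undefined, undefined} ->
            no_user                     % e.g. started with -nouser
    end.
```

Running which_shell:guess() over a remote shell on a daemonized node
would be one way to notice a silent fallback to the old shell.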
> Alright. That covers most of it for the basics.
> 
> To figure out why it blocks, we need to figure out the evaluation. The
> evaluation itself happens in a shell.erl process, which does an io
> request to the 'user' process (technically, its own group_leader, so
> that anyone may use the evaluator where they want. It just happens to
> be the 'user' process in this case).
> 
>  Input --> user.erl <---> shell.erl
> 
> The shell does an io-request to user, which asks to read characters.
> The user.erl process forwards that data to the shell. The shell
> attempts to evaluate it, and if there's not enough data, it asks for
> more. user.erl then blocks until it can get more data to respond to the
> io request.
> 
> When output is sent to 'user' it's sent as an additional io request, as
> a message. This message will not be read until the shell can answer the
> previous request. This is where you block.
> 
>  Input --> user.erl <---> shell.erl
>             ^----> other proc
> 
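The blocking pattern can be reproduced with a toy model. To be clear,
this is NOT the real user.erl and does not speak the real io protocol:
the module name, the message shapes ({read, ...}, {write, ...},
{input, ...}) and the timeouts are all invented for illustration. It
only shows how a server that waits for input inside one request leaves
later requests sitting in its mailbox.

```erlang
-module(blocking_io_model).
-export([demo/0]).

%% Toy server that handles one request at a time.
server() ->
    receive
        {read, From} ->
            %% Selective receive: only input ends this wait, so any
            %% {write, ...} request stays queued in the mailbox meanwhile.
            receive {input, Data} -> From ! {read_reply, Data} end,
            server();
        {write, From, _Chars} ->
            From ! write_done,
            server()
    end.

demo() ->
    S = spawn(fun server/0),
    S ! {read, self()},                 % the "shell" asks for input
    S ! {write, self(), "Hey!\n"},      % another process wants to print
    %% No reply to the write arrives while the read is pending.
    Blocked = receive write_done -> false after 200 -> true end,
    S ! {input, "term2}.\n"},           % the user finishes typing
    receive {read_reply, _} -> ok end,
    %% Now the queued write gets served.
    Unblocked = receive write_done -> true after 1000 -> false end,
    exit(S, kill),
    {Blocked, Unblocked}.
```

In this model the writer stays stuck for exactly as long as the pending
read takes, which mirrors the io:format(user, ...) hang in the report.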
> The new shell does things differently by using a 'group.erl' process
> for each IO group. Now each group.erl process has the same potential to
> block, with the exception that user_drv.erl will start one very
> specific 'group.erl' process to be 'user', and will not return it as a
> potential shell.erl input source (it would be 0 in '^G -> j', and it is
> not possible to select it). user_drv will also consider it to be a
> special group that can *always* output to tty, whereas other groups
> will have their output dropped by default when they're not the
> currently active one (hence you do not get other shells' output by
> default when you switch tasks). This means that while you could block
> things by finding the specific 'group.erl' you're currently sending IO
> requests to by default, it's unlikely to happen by accident, and 'user'
> is now a safe process to send IO requests to.
> 
> I hope this explains things. I would find it difficult to call it a bug
> given a solution exists to the problem already, but I do see why the
> fallback to the old shell when no tty is available could be
> problematic. I'm guessing it would be possible to make a 'raw shell',
> which does tasks similar to user_drv, but using a user.erl-like adapter
> instead of a tty program to communicate with and starting it with
> 'erl -nouser -s rawshell' or something, or eventually making it the
> default user_drv falls back to instead of 'user:start()'. I'm guessing
> this would be a very low priority for the OTP team, though.
> 
> I hope this lengthy response answers your questions!
> 
> Regards,
> Fred.
> 
> On 04/23, Scott Lystig Fritchie wrote:
> > Hi, all.  I can't figure out if this message should be sent to the
> > erlang-bugs list or the erlang-questions list ... so I'll go for the
> > more general audience.
> > 
> > Summary: Starting Erlang with or without a tty/pseudo-tty can get you
> > a different console shell ("new" and "old", respectively) without you
> > realizing it.(*) If you don't know that you're using the old shell,
> > and if a process tries to send output to the 'user' registered
> > process(**), e.g. io:format(user, "Some message with ~p extra\n",
> > [Extra]), then it is possible that the io:format() call will not
> > return for seconds/minutes/hours/ever.
> > 
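For callers that cannot afford to hang, one possible mitigation is to do
the write in a throwaway process and give up after a timeout. This is a
sketch, not something proposed in the thread: the module name user_out,
the function format_with_timeout/3 and the return values are all
hypothetical, and the message is lost if the timeout fires.

```erlang
-module(user_out).
-export([format_with_timeout/3]).

%% Write to 'user' from a helper process so the caller is never
%% blocked for longer than Timeout milliseconds.
format_with_timeout(Fmt, Args, Timeout) ->
    {Pid, Ref} = spawn_monitor(fun() -> io:format(user, Fmt, Args) end),
    receive
        {'DOWN', Ref, process, Pid, normal} -> ok;
        {'DOWN', Ref, process, Pid, Reason} -> {error, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        exit(Pid, kill),                % drop the stuck write
        {error, timeout}
    end.
```

For example, user_out:format_with_timeout("Hey!~n", [], 500) would return
{error, timeout} instead of blocking while the old shell waits for input.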
> > My question: Is the kind of indefinite blocking on I/O described
> >              below a bug or a feature?
> > 
> > I have a test case that can reproduce this behavior.  An automated
> > version (using Expect) can be found at:
> > 
> >     https://gist.github.com/slfritchie/ad8e5cf1603cbe326be7
> > 
> > The basics of reproducing the hang are:
> > 
> >     SSH session #1                      SSH session #2
> >     --------------                      --------------
> >     Start an Erlang daemon
> >     using "run_erl".
> > 
> >     Attach to the daemon's console
> >     using "to_erl".
> > 
> >                                         Start another Erlang VM
> >                                         and connect to the first
> >                                         VM via "-remsh".
> > 
> >     At the console, type the
> >     following and press ENTER:
> >         {term1,
> > 
> >                                         Run this command:
> >                                             io:format(user, "Hey!\n", []).
> > 
> > The io:format/3 call in session #2 will behave differently if session
> > #1's "run_erl" command runs with a tty/pseudo-tty or without.
> > 
> >     A. With a tty/pty: The io:format() call returns immediately.
> >     B. Without a tty/pty: The io:format() call will hang indefinitely.
> >        It will remain blocked until the Erlang term parser in session
> >        #1 has returned.  For example, finishing the term with "term2}."
> >        and then pressing ENTER.
> > 
> > The same effect can be seen by forcing the use of the old shell,
> > without using SSH, by simply running "erl -oldshell" for session #1
> > (in an Xterm or other terminal window, or at the machine's hardware
> > console) instead of using SSH + "run_erl" + "to_erl".
> > 
> > Riak was the application that triggered this bug hunt (in conjunction
> > with the Lager app)(***).  Finding it has taken much longer than
> > anyone guessed.  The reason is that the necessary precondition,
> > starting Erlang via 'run_erl' via SSH without an associated
> > tty/pseudo-tty, is not common.  (Riak's packaging uses "sudo", which
> > refuses to run if there isn't a tty/pty available.)
> > 
> > All attempts to duplicate the behavior failed because we didn't
> > understand that the root cause of the bad behavior was the old console
> > being silently chosen at VM startup when no tty/pty is available.
> > 
> > -Scott
> > 
> > (*) See
> > https://github.com/erlang/otp/blob/maint/lib/kernel/src/user_drv.erl#L103
> > for how the choice is made.
> > 
> > (**) From the 'io' man page:
> > 
> >        There is always a process registered under the name of user.
> >        This can be used for sending output to the user.
> > 
> > ... where "output to the user" really means "output to the Erlang
> > virtual machine console."
> > 
> > (***) For source code of Riak and Lager, respectively, see:
> >     https://github.com/basho/riak
> >     https://github.com/basho/lager
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions