[erlang-questions] is this a shell bug?

Tue Apr 24 21:24:04 CEST 2012

On Mon, Apr 23, 2012 at 3:56 AM, Attila Rajmund Nohl
<attila.r.nohl@REDACTED> wrote:
> 2012/4/23 吴磊 <mjollnir.ray@REDACTED>:
>> Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4]
>> [async-threads:0] [hipe] [kernel-poll:false]
>>
>> Eshell V5.9.1  (abort with ^G)
>> 1> self().
>> <0.32.0>
>> 2> process_flag(trap_exit, true).
>> false
>> 3>  spawn_link(fun() -> ok end).
>> <0.36.0>
>> 4> process_info(self(), messages).
>> {messages,[{'EXIT',<0.36.0>,normal}]}
>> 5> receive X -> X end.
>> {'EXIT',<0.36.0>,normal}
>> 6>  spawn_link(fun() -> ok end).
>> <0.40.0>
>> 7> process_info(self(), messages).
>> {messages,[{'EXIT',<0.40.0>,normal}]}
>> 8> receive X -> X end.
>>
>> The shell hangs here. What happened?
>
> X was already bound, so you were waiting for {'EXIT',<0.36.0>,normal},
> but got {'EXIT',<0.40.0>,normal}

That much is true, but there's something else interesting here. Look
more carefully at the output from the second shell in the original
message, repeated below:

^G
User switch command
 --> s
 --> c
Eshell V5.9.1  (abort with ^G)
1> process_info(pid(0,32,0), current_function).
{current_function,{erl_eval,receive_clauses,6}}
2> process_info(pid(0,32,0), messages).
{messages,[]}

The second call to process_info shows the first shell's message queue
to be empty, when in fact it still holds the unmatched
{'EXIT',<0.40.0>,normal} tuple. Could this be the problem to which
the original author was actually referring?

Here's a variant of the same. First, set X and then spawn a new
process, same as in the original posting:

Eshell V5.9.1  (abort with ^G)
1> self().
<0.31.0>
2> process_flag(trap_exit, true).
false
3> spawn_link(fun() -> ok end).
<0.35.0>
4> process_info(self(), messages).
{messages,[{'EXIT',<0.35.0>,normal}]}
5> receive X -> X end.
{'EXIT',<0.35.0>,normal}
6> spawn_link(fun() -> ok end).
<0.39.0>

At this point we should have an EXIT tuple in this shell's message
queue. Next, start a new shell, and use it to verify that the first
shell's message queue is not empty:

^G
User switch command
 --> s
 --> c
Eshell V5.9.1  (abort with ^G)
1> process_info(pid(0,31,0), messages).
{messages,[{'EXIT',<0.39.0>,normal}]}

Just what we expect to see. Now get the pid of the second shell, then
switch back to the first shell. There, start a receive like the one in
the original message, but add a 30-second timeout after which the
first shell sends a 'done' message to the second shell:

2> self().
<0.43.0>

^G
User switch command
 --> c 1
7> receive X -> X after 30000 -> pid(0,43,0) ! done end.

Now quickly switch back to the second shell, check the first shell's
message queue, and then wait for the 'done' message:

^G
User switch command
 --> c 2
3> process_info(pid(0,31,0), messages).
{messages,[]}
4> receive done -> process_info(pid(0,31,0), messages) end.
{messages,[{'EXIT',<0.39.0>,normal}]}

Check that out: while the first shell is blocked in receive X, its
message queue looks empty from the second shell, even though the same
check made before the receive started showed the EXIT tuple for the
second spawned process. Once the first shell's receive times out and
the second shell gets the 'done' message, the first shell's queue
again contains that same EXIT tuple.
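
The "quickly switch back" step depends on timing. As a rough sketch,
a small helper could sample another process's queue several times so
that the second shell records it both during and after the receive.
The module name queue_watch and the function sample/3 are made up for
illustration; they are not part of OTP:

```erlang
-module(queue_watch).
-export([sample/3]).

%% Hypothetical helper: take Count samples of Pid's message queue,
%% sleeping IntervalMs milliseconds after each one.
%% Returns the samples oldest first. A sample is 'undefined' if Pid
%% was no longer alive when it was taken.
sample(Pid, Count, IntervalMs)
  when is_pid(Pid), is_integer(Count), Count >= 0 ->
    [begin
         Sample = process_info(Pid, messages),
         timer:sleep(IntervalMs),
         Sample
     end || _ <- lists:seq(1, Count)].
```

From the second shell, something like
queue_watch:sample(pid(0,31,0), 4, 10000) would take four samples
about ten seconds apart, which should span the 30-second timeout used
above.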

In section 8.6 of his book, Joe describes a "save queue" used during
selective receive, and this behavior matches his description. But if
you write a program that does the same thing, it acts differently:
unlike in the shell, process_info never reports an empty message queue
while the receive is waiting. So this behavior appears to be specific
to the shell, whose receive runs in erl_eval:receive_clauses/6, as the
current_function output above shows. I don't know the details of why
it acts this way. When I chatted about it with Scott Fritchie, he
guessed it might have something to do with the other messaging the
shell has to do, but neither of us knows for sure. Can someone explain
why the shell seems to use an actual "save queue" during a receive?
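
For comparison, here is a minimal sketch of what "a program that does
the same thing" could look like as a compiled module. The module name
queue_probe and its functions are hypothetical, and the sleep and
timeout values are arbitrary. The waiter process repeats the shell
steps (bind X to the first EXIT tuple, spawn a second linked process,
then receive X), while run/0 plays the role of the second shell:

```erlang
-module(queue_probe).
-export([run/0]).

%% Returns what process_info/2 reports for a compiled process that is
%% blocked in `receive X -> X ...` with X already bound.
run() ->
    Parent = self(),
    Waiter = spawn(fun() -> waiter(Parent) end),
    receive {waiting, Waiter} -> ok end,
    timer:sleep(200),                  % let the waiter enter its receive
    During = process_info(Waiter, messages),
    exit(Waiter, kill),                % clean up the blocked waiter
    During.

waiter(Parent) ->
    process_flag(trap_exit, true),
    _First = spawn_link(fun() -> ok end),
    X = receive Msg -> Msg end,        % binds X to the first EXIT tuple
    _Second = spawn_link(fun() -> ok end),
    wait_until_queued(),               % second EXIT tuple has arrived
    Parent ! {waiting, self()},
    %% X is bound, so the second EXIT tuple cannot match here.
    receive X -> X after 5000 -> timeout end.

%% Poll our own queue length without removing any message.
wait_until_queued() ->
    case process_info(self(), message_queue_len) of
        {message_queue_len, 0} ->
            timer:sleep(10),
            wait_until_queued();
        {message_queue_len, _} ->
            ok
    end.
```

If the observation above holds, queue_probe:run() should return a
messages list that still contains the second EXIT tuple, rather than
the empty list seen when the same receive is typed into the shell.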

--steve