[erlang-questions] Issues with stdin on ports

Thu Aug 1 02:18:59 CEST 2013

On 1/08/2013, at 4:55 AM, Per Hedeland wrote:
> 
> True, but not relevant to the original request. However since the issue
> has been mentioned in some other comments: open_port(spawn) obviously
> doesn't use popen(), and it carefully records the pid of the process
> that it forks.

Yes, but that may well be the wrong process.

>> (4) To send attention signals of some sort, you need to use
>>   pty(4)s.
> 
> I assume that by "attention signals" (not a term I have seen in this
> context) you mean "signals generated by sending a specific character on
> the communication channel". If so, true, for this and many other things
> you need a pty - but irrelevant.

The term (an old mainframe term) was not meant to be precise.

>> Per Hedeland wrote: "Erlang open_port/2 *by default* creates
>> a bi-directional, deadlock-free communication channel to the
>> external process."  That's still not quite right.  The thing
>> that governs whether a deadlock is possible or not is the
>> *protocol* the communicating processes use.  And it is still
>> the case that an Erlang process may be deadlocked this way;
>> it's just that the whole Erlang system won't be tied up if it
>> happens.
> 
> No, the statement is quite correct, see above.

No, it really really does depend on the protocol.
It doesn't matter how the communication is implemented:
if both processes start by trying to read, that's a
deadlock.

You appear to be saying that an Erlang process (but not
the Erlang node containing it) can be blocked forever,
but somehow that's not a deadlock.  I don't understand.
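
To make the "both start by reading" case concrete, here is a minimal
sketch using two plain Erlang processes and no port at all.  The module
name read_first, the function run/1 and the timeout are hypothetical,
invented for this illustration; the timeout is there only so that the
stuck state can be observed instead of hanging the shell.

```erlang
-module(read_first).
-export([run/1]).

%% Hypothetical demo: both peers follow "read first, then reply".
%% Nobody ever sends the first message, so without the 'after' clause
%% both peers would wait forever.  The timeout only makes that visible.
run(TimeoutMs) ->
    Parent = self(),
    Peer = fun() ->
               receive
                   {msg, From} ->
                       From ! {msg, self()},
                       Parent ! {self(), got_message}
               after TimeoutMs ->
                   Parent ! {self(), stuck}
               end
           end,
    A = spawn(Peer),
    B = spawn(Peer),
    %% Collect one result per peer, in spawn order.
    [receive {Pid, Result} -> Result end || Pid <- [A, B]].
```

Calling read_first:run(100) should report both peers as stuck.  The rest
of the node keeps running the whole time, which is the point: the node
is fine, the two processes are not.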
> 
> But you're still re-iterating issues that, if they were genuine, would
> be with the open_port(spawn) functionality *that already exists*.

They are genuine, and they do apply to the current functionality,
as I demonstrate below.

> No-one
> is suggesting that *it* be implemented, since it already is. And you can
> take advantage of that fact to refute your own assertions by simply
> using it, instead of posting them here. Here's a little something to get
> you started:
> 
> 1> P = open_port({spawn, "/bin/sh"}, []).
> #Port<0.504>
> 2> P ! {self(), {command, "echo " ++ lists:duplicate(100000, $a) ++ "\n"}}, ok.
> ok
> 3> Rec = fun (F, Port, Acc) -> receive {Port, {data, Data}} -> F(F, Port, Acc ++ Data) after 0 -> Acc end end.
> #Fun<erl_eval.18.82930912>
> 4> Got = Rec(Rec, P, []), ok.
> ok
> 5> length(Got).
> 100001

This is *NOT* the situation I was talking about.
I was talking about *cyclic* scenarios involving
*two* processes.

        +----+                              +----+
   +--->| P1 | ---> (port) ---> (pipe) ---> | P2 |---+
   |    +----+       / ^                    +----+   |
   +-----------------    \----- (pipe) <-------------+

where the specially written port process is just a part
of the communication machinery; how many concurrent
activities there are inside it is pretty much irrelevant.
_Functionally_, it's as if we had

        +----+                              +----+
   +--->| P1 | ---------------> (pipe) ---> | P2 |---+
   |    +----+                              +----+   |
   +--------------------------- (pipe) <-------------+

The issue is that we have two processes connected by
bounded buffers.  The fact that one of them is an Erlang
process and one of them a Unix process doesn't affect the
logic.  The fact that the bounded buffers are UNIX pipes
rather than some sort of internal data structure is also
irrelevant to the logic.

In general, this won't work UNLESS the two processes
communicate through an appropriate protocol.  And your
example is a perfect instance of such a protocol, the
case I labelled (1).  Let me repeat what I wrote:
> 
> (1) the child process first reads all the data without writing
>   and some time after receiving EOF writes all its results
>   without reading,

FIRST, P1 sends "echo " ++ lists:duplicate(100000, $a) ++ "\n"
to P2, which reads *all* of it before writing anything.
THEN P2 sends output back to P1.

I should have been explicit that this complete-send-then-
complete-receive can be embedded in a cycle:

	loop
	    P1 sends a complete request to P2 without reading
	    anything.  This request is unambiguously terminated.
	    P2 reads the complete request, processes it, and
	    sends a complete response without reading anything.
	    This response is unambiguously terminated.
	    P1 reads the complete response without writing
	    anything.
	end loop

The unambiguous termination could be closing the connection.
In this case, it's the "\n" that ends the command to the
shell and the "\n" that ends the output of "echo".
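
As a rough sketch of that loop (not a definitive implementation), the
following runs the cycle against /bin/cat with a line-oriented port.
The module name lockstep, the functions run/1 and round_trip/2, the
1024-byte line limit and the 5 second timeout are all assumptions made
for the illustration.  It assumes a Unix system where /bin/cat exists,
and that each request is a single short line with no embedded newline.

```erlang
-module(lockstep).
-export([run/1]).

%% Sketch only: assumes /bin/cat is available and that every request
%% is one line shorter than 1024 bytes.
run(Requests) ->
    Port = open_port({spawn, "/bin/cat"}, [{line, 1024}, binary]),
    Replies = [round_trip(Port, R) || R <- Requests],
    port_close(Port),
    Replies.

%% One cycle of the loop: send a complete newline-terminated request,
%% then read one complete newline-terminated response before sending
%% anything else.
round_trip(Port, Request) ->
    port_command(Port, [Request, $\n]),
    receive
        {Port, {data, {eol, Line}}} -> Line
    after 5000 ->
        error(timeout)
    end.
```

For example, lockstep:run([<<"one">>, <<"two">>]) should hand back the
same two lines.  At most one short line is ever in flight in either
direction, so neither pipe can fill up.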

Now let's change your example ever so slightly.

1> P = open_port({spawn, "/bin/cat"}, []).
2> P ! {self(), {command, lists:duplicate(100000, $a) ++ "\n"}}.
3> Rec = fun (F, Port, Acc) ->
             receive {Port, {data, Data}} -> F(F, Port, Acc ++ Data) end
         end.
4> Got = Rec(Rec, P, []), ok.

I didn't get as far as step 5, because this deadlocked.

The big difference here is that cat reads a chunk and echoes
it back (but P1 isn't interested in reading yet), reads another
chunk, tries to echo it back, and is blocked because the pipe
back to P1 is full.  Then P1 tries to send some more, and gets
blocked, because the pipe to cat is full too.

My assertions are not refuted, but proven.

Your example works because it does what I said works.
My example deadlocks because it does what I said deadlocks.

Cyclic communications through bounded buffers *do* require
careful design, and it *is* easy to get a deadlock that way
using natural-looking code; the hard work that went into
Erlang's port processes is no panacea.
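
One possible piece of such careful design, sketched here under stated
assumptions, is to bound the amount of unanswered data in flight: send
a chunk to cat, read back exactly that many bytes, and only then send
the next chunk.  The module name chunked_echo, its functions, the
timeout and the choice of chunk size are hypothetical; the sketch
assumes /bin/cat exists and that the chunk size is smaller than the
capacity of a pipe on the system in question.

```erlang
-module(chunked_echo).
-export([run/2]).

%% Sketch only: echo Data through /bin/cat, never leaving more than
%% ChunkSize bytes unanswered.  ChunkSize is assumed to be smaller
%% than the pipe capacity.
run(Data, ChunkSize) when is_binary(Data), ChunkSize > 0 ->
    Port = open_port({spawn, "/bin/cat"}, [binary, stream]),
    Echo = send_chunks(Port, Data, ChunkSize, []),
    port_close(Port),
    iolist_to_binary(Echo).

%% Send one chunk, then wait for its complete echo before continuing.
send_chunks(_Port, <<>>, _Size, Acc) ->
    lists:reverse(Acc);
send_chunks(Port, Data, Size, Acc) ->
    N = min(Size, byte_size(Data)),
    <<Chunk:N/binary, Rest/binary>> = Data,
    port_command(Port, Chunk),
    Reply = collect(Port, N, []),
    send_chunks(Port, Rest, Size, [Reply | Acc]).

%% Read port data until exactly Need bytes have come back.
collect(_Port, 0, Acc) ->
    lists:reverse(Acc);
collect(Port, Need, Acc) ->
    receive
        {Port, {data, Bin}} ->
            collect(Port, Need - byte_size(Bin), [Bin | Acc])
    after 5000 ->
        error(timeout)
    end.
```

With this, chunked_echo:run(binary:copy(<<"a">>, 100000), 4096) should
return the same 100000 bytes.  It is just the complete-send-then-
complete-receive cycle again, applied per chunk rather than per message.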