[erlang-questions] Issues with stdin on ports

Mon Jul 29 16:13:15 CEST 2013

"Richard A. O'Keefe" <ok@REDACTED> wrote:
>
>On 29/07/2013, at 4:14 PM, Anthony Grimes wrote:
>
>> and communicating with external processes in Erlang. They seem to have
>> at least one particular fatal flaw which prevents them from being very
>> useful to me, and that is that there is no way to close stdin (and send
>> EOF) and then also read from the process's stdout. For example, I cannot
>> use a port to start the 'cat' program which listens on stdin for data
>> and waits for EOF and then echoes that data back to you. I can do the
>> first part, which is send it data on stdin, but the only way for me to
>> close it is to call port_close and close the entire process.

FWIW, I definitely agree that this is a missing piece of functionality.
I'm not sure how useful/important it is in the grand scheme of things,
but personally I could have used it on a couple of occasions. As you
mentioned, it's basically the equivalent of TCP shutdown() that is
needed, although shutdown() is perhaps a bit over-engineered - I've
never seen anyone use SHUT_RD or SHUT_RDWR...
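
For anyone who hasn't used the half-close, here is a minimal sketch of
what shutdown(SHUT_WR) gives you, in Python since its socket module
wraps the same call. To keep it self-contained it uses a Unix-domain
socket pair instead of a TCP connection, and plays both ends in one
thread, so it only works for payloads that fit in the socket buffer.
The function name half_close_roundtrip is made up for illustration.

```python
import socket


def half_close_roundtrip(payload: bytes) -> bytes:
    """Send payload, close only our write side, then read the peer's reply."""
    ours, theirs = socket.socketpair()
    with ours, theirs:
        ours.sendall(payload)
        # Half-close: the peer will see EOF, but we can still read from it.
        ours.shutdown(socket.SHUT_WR)

        # Peer side: read until EOF, then answer on the still-open direction.
        received = b""
        while chunk := theirs.recv(4096):
            received += chunk
        theirs.sendall(received.upper())
        theirs.shutdown(socket.SHUT_WR)

        # Our side: the read direction was never closed, so the reply arrives.
        reply = b""
        while chunk := ours.recv(4096):
            reply += chunk
        return reply
```

This is the operation that has no counterpart on an Erlang port today.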

Also the "opposite" functionality is already available for ports via the
'eof' option - i.e. you get informed that the other end has closed its
write side, but can still write data in the other direction.

>Note that "only send data to a command" and "only receive data from a
>command" are the traditional ways for a UNIX program to communicate
>with another over a pipe.

Well, it's basically the definition of the traditional pipeline concept
of the Unix shells, and pipes are obviously what you need to implement
it - but that doesn't preclude other uses of pipes. The zsh shell even
allows you to set up "bidirectional pipes" on the command line.

> popen(<command>, "r") reads the output of
>the command and popen(<command>, "w") writes to the input of the command.

popen() is effectively a convenience function to abstract away the
somewhat non-trivial application of pipe(), fork(), close(), and
execve() that is required to set things up correctly for two particular
and common usages of pipes in application code. (It is not used by
common shells to implement pipelines though.)
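
To illustrate what popen(<command>, "r") abstracts away, here is a rough
sketch of the same sequence using Python's thin wrappers around the
system calls. It is not how any particular libc implements popen(); the
name popen_read is hypothetical, dup2() is added because the child needs
the pipe on its stdout, and execvp() is used instead of execve() so that
the command is looked up in PATH.

```python
import os


def popen_read(argv):
    """Run argv and return everything it writes to stdout."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make the pipe's write end our stdout, then exec the command.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if the exec failed.
            os._exit(127)
    # Parent: close our copy of the write end, otherwise the read below
    # never sees EOF even after the child has exited.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as out:
        data = out.read()
    os.waitpid(pid, 0)
    return data
```

The close() calls are the "somewhat non-trivial" part: forgetting the
one in the parent is the classic way to hang forever waiting for EOF.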

>There isn't even any standard _term_ for talking about connecting to both
>stdin and stdout of a command in UNIX, and that's because it's an
>incredibly easy way to deadlock.

There is no need to have a term for it, since all you need is two pipes,
one for each direction - and it's probably uncommon enough to not
warrant its own convenience function. And you can indeed easily deadlock
if you don't think about what you're doing, but I really doubt that this
is the reason for any absence of terminology or functions.

But anyway, I don't see how any of this is relevant to the question at
hand. Opening a bi-directional connection between two processes by means
of a pair of pipes is exactly what erlang:open_port/2 *already does*
when you use 'spawn' (or 'spawn_executable' these days) to start an
external process. And it has been doing this since day one, and I can't
recall anyone complaining how this is hopelessly dangerous due to the
risk of deadlock (the risk is of course reduced due to the fact that the
VM does non-blocking I/O).
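
As a sketch of why interleaved I/O removes most of the deadlock risk:
the Python snippet below pushes 1 MB through 'cat' over a pair of pipes.
Writing all of it before reading anything could block once both pipe
buffers fill up; subprocess's communicate() avoids that by writing and
reading as each pipe becomes ready, which is roughly the effect the VM's
non-blocking I/O has for ports. The helper name cat_large is an
assumption for illustration, and a POSIX system with 'cat' in PATH is
assumed.

```python
import subprocess


def cat_large(data: bytes) -> bytes:
    """Feed data to 'cat' and collect its output without deadlocking."""
    proc = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # communicate() interleaves writes to stdin with reads from stdout,
    # closes stdin when the input is exhausted, and waits for the child.
    out, _ = proc.communicate(input=data)
    return out
```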

>> This issue prevents Erlang users from doing any even slightly more than
>> trivial communication with external processes without having some kind
>> of middleman program that handles the creation of the actual process you
>> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>
>Just like it prevents C users from doing the same thing.

No, there is nothing that prevents C users from doing the same thing.
And even if they have to go to some effort to do it, it just means
having to write a bit more C - whereas the Erlang user can't write a bit
more Erlang to do just the small addition of "close the write side of
one of the pipes", even though the pipe pair is already there...
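
For comparison, this is the whole 'cat' scenario from the original post
in a language that does expose the two pipe ends separately (Python
here, but the C version is the same sequence of calls). It is a minimal
sketch: cat_roundtrip is a made-up name, 'cat' is assumed to be in PATH,
and the input is assumed small enough to fit in the pipe buffer, since
nothing is read until after the write side is closed.

```python
import subprocess


def cat_roundtrip(data: bytes) -> bytes:
    """Write data to 'cat', close only its stdin, then read its stdout."""
    proc = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    proc.stdin.write(data)
    # Close just the write side: 'cat' sees EOF on stdin, while the pipe
    # from its stdout back to us stays open.
    proc.stdin.close()
    out = proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    return out
```

The proc.stdin.close() line is the one step that port_close/1 cannot
express, since it tears down both directions at once.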

>Unix anonymous pipes are simply the wrong tool for the job in _any_
>programming language.
>
>The historic way to do "slightly more than trivial communication with
>external processes" has been to set the external processes up as C nodes
>or to use sockets.

Using (TCP) sockets instead of pipes doesn't really change the "risk of
deadlock". In the case of the Erlang VM (i.e. open_port vs gen_tcp), it
may actually increase it, due to the existence of the passive and
{active, once} modes for sockets - another piece of functionality that
is "missing" from ports.

--Per Hedeland