[erlang-questions] Issues with stdin on ports

Tue Jul 30 08:23:25 CEST 2013

Also, fwiw, here are links to other people who have run into the same
issue over the years:

http://erlang.org/pipermail/erlang-questions/2010-November/054330.html
http://erlang.org/pipermail/erlang-questions/2010-October/053944.html
http://erlang.org/pipermail/erlang-questions/2009-March/042123.html
http://stackoverflow.com/questions/8792376/erlang-ports-interfacing-with-a-wc-like-program

And here are hacks that have been written to get around this limitation:

https://github.com/mattsta/erlang-stdinout-pool#why-is-this-special
I assume this does as well: https://github.com/saleyn/erlexec
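
For concreteness, here is a rough C sketch of what I understand these
workarounds have to do on the Erlang user's behalf: set up two pipes,
fork, exec the command, write to its stdin, close only the write side
(so the child sees EOF), and then keep reading its stdout. The function
name pump_through_cat and the choice of 'cat' as the child are mine,
purely for illustration; this is not code taken from either project
above. It also writes everything before reading anything, so it is only
safe for inputs smaller than the pipe buffer, which is exactly the
deadlock risk discussed below.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): send `len` bytes to `cat`,
 * close its stdin so it sees EOF, then collect up to `cap` bytes of output.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t pump_through_cat(const char *input, size_t len, char *out, size_t cap)
{
    int to_child[2];   /* parent writes [1], child reads [0]  */
    int from_child[2]; /* child writes [1], parent reads [0]  */

    if (pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, drop the originals. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: close the ends that belong to the child. */
    close(to_child[0]);
    close(from_child[1]);

    /* Write all input first. This can deadlock for large inputs, because
     * the child may block writing output that nobody is reading yet. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(to_child[1], input + off, len - off);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        off += (size_t)n;
    }

    /* The half-close: the child's read() on stdin now returns EOF,
     * while the pipe carrying its stdout stays open for us to read. */
    close(to_child[1]);

    size_t got = 0;
    while (got < cap) {
        ssize_t n = read(from_child[0], out + got, cap - got);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (n == 0)
            break; /* child closed its stdout */
        got += (size_t)n;
    }
    close(from_child[0]);

    int status;
    waitpid(pid, &status, 0); /* reap the child */

    return off == len ? (ssize_t)got : -1;
}
```

Calling pump_through_cat("hello\n", 6, buf, sizeof buf) should hand the
same six bytes back in buf. The single close(to_child[1]) in the parent
is the step that has no counterpart in the port API; port_close tears
down both directions at once.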

Cheers!

-Anthony

> Richard A. O'Keefe <mailto:ok@REDACTED>
> July 29, 2013 10:55 PM
> On 30/07/2013, at 2:13 AM, Per Hedeland wrote:
>> FWIW, I definitely agree that this is a missing piece of functionality.
>
> Oh it's present in several languages.  You could argue that it's missing.
> But is it missing like a car radio in a car that lacks one,
> or is it missing like an ejector seat in a car that lacks one?
>
>>> Note that "only send data to a command" and "only receive data from a
>>> command" are the traditional ways for a UNIX program to communicate
>>> with another over a pipe.
>> Well, it's basically the definition of the traditional pipeline concept
>> of the Unix shells, and pipes are obviously what you need to implement
>> it - but that doesn't preclude other uses of pipes. The zsh shell even
>> allows you to set up "bidirectional pipes" on the commandline.
>
> There are reasons why I don't use zsh.  That's one of them.
>
>>> popen(<command>, "r") reads the output of
>>> the command and popen(<command>, "w") writes to the input of the command.
>> popen() is effectively a convenience function to abstract away the
>> somewhat non-trivial application of pipe(), fork(), close(), and
>> execve() that is required to set things up correctly for two particular
>> and common usages of pipes in application code. (It is not used by
>> common shells to implement pipelines though.)
>
> Having implemented popen() for two other high level languages, I know that.
> The fact that common shells do not use it is irrelevant.
>
>>> There isn't even any standard _term_ for talking about connecting to both
>>> stdin and stdout of a command in UNIX, and that's because it's an
>>> incredibly easy way to deadlock.
>> There is no need to have a term for it, since all you need is two pipes,
>> one for each direction
>
> Non-sequitur.  If it's a thing you need to do often, it's a thing you
> need to be able to talk about.  When I've thought of doing it, I've
> used the word David Bacon used in his SETL2 system, "pump".  And then
> I've used the phrase "looming disaster" and done something else.
>
>> - and it's probably uncommon enough to not
>> warrant its own convenience function. And you can indeed easily deadlock
>> if you don't think about what you're doing, but I really doubt that this
>> is the reason for any absence of terminology or functions.
>
> It's not that you can easily deadlock,
> it's that it's hard *NOT* to deadlock.
>
>>  (the risk is of course reduced due to the fact that the
>> VM does non-blocking I/O).
>
> And *that* is the thing that saves Erlang.  Of course, avoiding the
> coding complexity of dealing with nonblocking I/O is one of the reasons
> for using a multithreading language like Erlang.
>
>>> Just like it prevents C users from doing the same thing.
>> No, there is nothing that prevents C users from doing the same thing.
>
> You may have misunderstood me.
> *THE POPEN INTERFACE* prevents C users doing this.
> Yes, all the other functions are there, and yes, if you desperately
> want to program it, you can.  But it is enough work that nobody
> ever does this lightly.
>
> For that matter, it's not beyond the abilities of, say, the glibc
> authors, to extend their implementation of popen to support "r+"
> or "w+" modes, if there were much demand for it.  (Oddly enough,
> Mac OS X 10.7.5 _does_ support "r+" mode, but the Linux I checked
> does not.  Weird.)  I have _never_ been able to understand the
> differences between Mac OS X and POSIX.
>
> Per Hedeland <mailto:per@REDACTED>
> July 29, 2013 7:13 AM
> "Richard A. O'Keefe" <ok@REDACTED> wrote:
>> On 29/07/2013, at 4:14 PM, Anthony Grimes wrote:
>>
>>> and communicating with external processes in Erlang. They seem to have
>>> at least one particular fatal flaw which prevents them from being very
>>> useful to me, and that is that there is no way to close stdin (and send
>>> EOF) and then also read from the process's stdout. For example, I cannot
>>> use a port to start the 'cat' program which listens on stdin for data
>>> and waits for EOF and then echoes that data back to you. I can do the
>>> first part, which is send it data on stdin, but the only way for me to
>>> close it is to call port_close and close the entire process.
>
> FWIW, I definitely agree that this is a missing piece of functionality.
> I'm not sure how useful/important it is in the grand scheme of things,
> but personally I could have used it on a couple of occasions. As you
> mentioned, it's basically the equivalent of TCP shutdown() that is
> needed, although shutdown() is perhaps a bit over-engineered - I've
> never seen anyone use SHUT_RD or SHUT_RDWR...
>
> Also the "opposite" functionality is already available for ports via the
> 'eof' option - i.e. you get informed that the other end has closed its
> write side, but can still write data in the other direction.
>
>> Note that "only send data to a command" and "only receive data from a
>> command" are the traditional ways for a UNIX program to communicate
>> with another over a pipe.
>
> Well, it's basically the definition of the traditional pipeline concept
> of the Unix shells, and pipes are obviously what you need to implement
> it - but that doesn't preclude other uses of pipes. The zsh shell even
> allows you to set up "bidirectional pipes" on the commandline.
>
>> popen(<command>, "r") reads the output of
>> the command and popen(<command>, "w") writes to the input of the command.
>
> popen() is effectively a convenience function to abstract away the
> somewhat non-trivial application of pipe(), fork(), close(), and
> execve() that is required to set things up correctly for two particular
> and common usages of pipes in application code. (It is not used by
> common shells to implement pipelines though.)
>
>> There isn't even any standard _term_ for talking about connecting to both
>> stdin and stdout of a command in UNIX, and that's because it's an
>> incredibly easy way to deadlock.
>
> There is no need to have a term for it, since all you need is two pipes,
> one for each direction - and it's probably uncommon enough to not
> warrant its own convenience function. And you can indeed easily deadlock
> if you don't think about what you're doing, but I really doubt that this
> is the reason for any absence of terminology or functions.
>
> But anyway, I don't see how any of this is relevant to the question at
> hand. Opening a bi-directional connection between two processes by means
> of a pair of pipes is exactly what erlang:open_port/2 *already does*
> when you use 'spawn' (or 'spawn_executable' these days) to start an
> external process. And it has been doing this since day one, and I can't
> recall anyone complaining how this is hopelessly dangerous due to the
> risk of deadlock (the risk is of course reduced due to the fact that the
> VM does non-blocking I/O).
>
>>> This issue prevents Erlang users from doing any even slightly more than
>>> trivial communication with external processes without having some kind
>>> of middleman program that handles the creation of the actual process you
>>> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>> Just like it prevents C users from doing the same thing.
>
> No, there is nothing that prevents C users from doing the same thing.
> And even if they have to go to some effort to do it, it just means
> having to write a bit more C - whereas the Erlang user can't write a bit
> more Erlang to do just the small addition of "close the write side of
> one of the pipes", even though the pipe pair is already there...
>
>> Unix anonymous pipes are simply the wrong tool for the job in _any_
>> programming language.
>>
>> The historic way to do "slightly more than trivial communication with
>> external processes" has been to set the external processes up as C nodes
>> or to use sockets.
>
> Using (TCP) sockets instead of pipes doesn't really change the "risk of
> deadlock". In the case of the Erlang VM (i.e. open_port vs gen_tcp), it
> may actually increase it, due to the existence of the passive and
> {active, once} modes for sockets - another piece of functionality that
> is "missing" from ports.
>
> --Per Hedeland