[erlang-questions] Issues with stdin on ports

Tue Jul 30 08:02:37 CEST 2013

I apologize for getting terminology wrong, I guess. I've never heard
these terms before in reference to working with external processes. I'm
still not really getting it though.

The problem here is that Erlang ports let me read from stdout and write
to stdin, but do not let me flush stdin and tell the program that I'm done
writing, which is important for certain programs that wait for this EOF
to start producing output. What am I saying that's wrong? I'm not saying
these are the same things. In fact, I'm not trying to make any
assertions about how unix processes and pipes work, and if it sounds
like that then I apologize and retract any assertions I've made.
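
To make the request concrete, here is a minimal sketch in Python (not
Erlang) of the "write to stdin, close it, then read stdout" pattern
described above. It assumes a Unix-like system with 'cat' on the PATH,
and run_filter is a hypothetical helper name used only for illustration.

```python
import subprocess


def run_filter(argv, data):
    """Send `data` to the child's stdin, signal EOF, and return its stdout.

    `run_filter` is an illustrative name, not part of any library.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes the input, closes the child's stdin so the child
    # sees EOF, and drains stdout while writing, so neither side ends up
    # blocked on a full pipe.
    out, _ = proc.communicate(data)
    return out
```

Calling run_filter(["cat"], b"Hi!") should hand back the same bytes once
'cat' has seen EOF on its input. This half-close step is what a port
does not seem to offer short of port_close.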

I'm simply asking how I'm supposed to deal with programs that I have
*no* control over that wait for an EOF on their input to start producing
data. I just want to send a ^D. I have examples of programs that do that
and a thousand ways to do it in other languages, and I'm being told it
is disastrous to do. If the solution is to just keep doing what
everybody has been doing in Erlang for years, which is to write giant hack
middleman programs to do it, then I'll concede defeat. I've got to say
though, I'm pretty blown away by your response to this.

Anyways, thanks for the responses everyone!

> Richard A. O'Keefe <mailto:ok@REDACTED>
> July 29, 2013 10:44 PM
> On 29/07/2013, at 8:20 PM, Anthony Grimes wrote:
>
>> Yeah, re-reading your post a couple of times, I think we might be on the wrong page or something. Here is a low level example of what I'd like to be able to do in Erlang. This is a Clojure repl session where I interact with the 'cat' program via the built in Java Process library:
>>
>> user=> (def proc (.exec (Runtime/getRuntime) (into-array String ["cat"])))
>> #'user/proc
>> user=> (def stdin (.getOutputStream proc))
>> #'user/stdin
>> user=> (def stdout (.getInputStream proc))
>
> I have some trouble reading Clojure.  I don't know what the dots are.
> Hazarding a guess,
>
> 	This is *PRECISELY* the "Hello, deadlock!"
> 	kind of buggy stuff that the C interface was designed
> 	to *not* let you write.
>
>> Lots of unix programs work like this.
>> We have cat in this example, but grep, wc, and various others work like that as well. It is this easy or easier to do the same thing in every other language I can think of.
>
> Actually, NO.  You are talking about "filters" here,
> and filters are designed to be connected into ***ACYCLIC*** networks.
>
>> If it's fundamentally a bad thing, I'm surprised these programs work like that in the first place and that these languages support this.
>
> The programs do NOT work the way you think they do.
> A filter reads from its standard input.
> It writes to its standard output.
> If it could have emotions, it would view the prospect
> of those two being the *same* thing with shuddering dread.
> (Except of course, when the thing is the terminal.  The
> user is assumed to be capable of infinite buffering.)
>
> Erlang is perfectly happy to be connected to an ACYCLIC network
> of pipe-linked processes too.
>
>> It seems to be an entirely commonplace, basic feature of any remotely high level programming language.
>
> Actually, no.  The ability to connect to the standard input *AND* the standard
> output of the *same* process is *not* a commonplace feature of high level
> programming languages (some do, some don't) because unless you code with
> extreme (and to a certain extent, non-portable) care, you end up in deadlock land.
>
> Only if one of the programs is absolutely guaranteed to write a tiny
> amount of information -- at most one PIPE_BUF worth, do you have
> any shadow of a trace of a right to expect it to work.
>
> If you don't believe me, believe the Java documentation,
> where the page for java.lang.Process says
>
> 	All [the new process's] standard io (i.e. stdin, stdout, stderr)
> 	operations will be redirected to the parent process through
> 	three streams (getOutputStream(), getInputStream(),
> 	getErrorStream()).  The parent process uses these streams to feed
> 	input to and get output from the subprocess.
> 	Because some native platforms only provide limited buffer size
> 	for standard input and output streams, failure to promptly
> 	write the input stream or read the output stream of the
> 	subprocess may cause the subprocess to block, and even deadlock.
>
> The POSIX guarantee for PIPE_BUF is just 512 bytes.
> That is, should the parent process write 513 bytes to the child,
> and the child write 513 bytes to the parent,
> hello deadlock!
>
> Like I said, connecting to *both* ends of a command through pipes
> is something to anticipate with shuddering dread.  It is *not* a
> standard feature to be used lightly.
>
> I can't find anything about external processes in the Haskell 2010
> report.  System.Process
> http://www.haskell.org/ghc/docs/7.4-latest/html/libraries/process-1.1.0.1/System-Process.html
> isn't mentioned in Haskell 2010.  I am actually pretty shocked that
> the documentation doesn't mention the deadlock problem.
>
>
>
> Anthony Grimes <mailto:i@REDACTED>
> July 29, 2013 1:20 AM
> Yeah, re-reading your post a couple of times, I think we might be on
> the wrong page or something. Here is a low level example of what I'd
> like to be able to do in Erlang. This is a Clojure repl session where
> I interact with the 'cat' program via the built in Java Process library:
>
> user=> (def proc (.exec (Runtime/getRuntime) (into-array String ["cat"])))
> #'user/proc
> user=> (def stdin (.getOutputStream proc))
> #'user/stdin
> user=> (def stdout (.getInputStream proc))
> #'user/stdout
> user=> (.write stdin (.getBytes "Hi!"))
> nil
> user=> (.close stdin)
> nil
> user=> (let [arr (byte-array 3)] (.read stdout arr) (String. arr))
> "Hi!"
>
> Lots of unix programs work like this. We have cat in this example, but
> grep, wc, and various others work like that as well. It is this easy
> or easier to do the same thing in every other language I can think of.
> If it's fundamentally a bad thing, I'm surprised these programs work
> like that in the first place and that these languages support this. It
> seems to be an entirely commonplace, basic feature of any remotely
> high level programming language.
>
> Perhaps this example and clarification will clear things up!
>
> -Anthony
> Richard A. O'Keefe <mailto:ok@REDACTED>
> July 29, 2013 12:15 AM
> On 29/07/2013, at 4:14 PM, Anthony Grimes wrote:
>
>> and communicating with external processes in Erlang. They seem to have
>> at least one particular fatal flaw which prevents them from being very
>> useful to me, and that is that there is no way to close stdin (and send
>> EOF) and then also read from the process's stdout. For example, I cannot
>> use a port to start the 'cat' program which listens on stdin for data
>> and waits for EOF and then echoes that data back to you. I can do the
>> first part, which is send it data on stdin, but the only way for me to
>> close it is to call port_close and close the entire process.
>
> Note that "only send data to a command" and "only receive data from a
> command" are the traditional ways for a UNIX program to communicate
> with another over a pipe.  popen(<command>, "r") reads the output of
> the command and popen(<command>, "w") writes to the input of the command.
> There isn't even any standard _term_ for talking about connecting to both
> stdin and stdout of a command in UNIX, and that's because it's an
> incredibly easy way to deadlock.
>
>> This issue prevents Erlang users from doing any even slightly more than
>> trivial communication with external processes without having some kind
>> of middleman program that handles the creation of the actual process you
>> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>
> Just like it prevents C users from doing the same thing.
> Unless they fake something up using named pipes or UNIX-domain sockets.
> (Or message queues.  I do wish Mac OS X implemented rather more of POSIX...)
>
> Unix anonymous pipes are simply the wrong tool for the job in _any_
> programming language.
>
> The historic way to do "slightly more than trivial communication with
> external processes" has been to set the external processes up as C nodes
> or to use sockets.
>
>
>
>
> Anthony Grimes <mailto:i@REDACTED>
> July 28, 2013 9:14 PM
> Howdy folks.
>
> I unfortunately have not been able to use Erlang for most of what I've
> been doing lately because of a long standing issue with Erlang ports
> that I'd like to start a discussion about here.
>
> As far as I am aware, ports are generally the only option for creating
> and communicating with external processes in Erlang. They seem to have
> at least one particular fatal flaw which prevents them from being very
> useful to me, and that is that there is no way to close stdin (and send
> EOF) and then also read from the process's stdout. For example, I cannot
> use a port to start the 'cat' program which listens on stdin for data
> and waits for EOF and then echoes that data back to you. I can do the
> first part, which is send it data on stdin, but the only way for me to
> close it is to call port_close and close the entire process.
>
> This issue prevents Erlang users from doing any even slightly more than
> trivial communication with external processes without having some kind
> of middleman program that handles the creation of the actual process you
> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>
> I could totally be wrong, but it seems like we need something other than
> just port_close. Something like
> http://www.erlang.org/doc/man/gen_tcp.html#shutdown-2 which lets you say
> "Hey, I want to close the stdin of this process but still read from its
> stdout." or something similar. I could be totally off track on what a
> good solution would be.
>
> So I'm wondering if people are aware of this problem, and I'd like to
> make sure that people think it is an actual problem that should be
> fixed. I'm also curious what people think a good solution to the problem
> would be. I'm not sure I have the time/particular skill set to fix it
> given that the port code is some pretty obscure (to me) C code, but
> starting conversation seems like a good way to begin.