[erlang-questions] benchmarks game harsh criticism (was Learning Erlang from the scratch)

Sat Nov 24 12:56:43 CET 2007

"Ulf Wiger (TN/EAB)" <ulf.wiger@REDACTED> wrote:
>
>When playing with the benchmark dealing with line-oriented
>input, I experimented with the line-oriented socket option.

I think you mean port option.

>It was fast - so fast, in fact, that the Erlang program couldn't
>keep up, even though it ran in the tightest loop possible.
>To solve this, we wouldn't have to make the system unsafe. We'd
>need to implement flow control on port input, much like that
>which already exists in the inet driver. Erlang would be better
>for it - not worse.
>
>I think this is a good finding.

I'm afraid I'll have to challenge it though (I think I already did so,
but I guess I didn't make my point very well). It says nothing at all
about how fast the port I/O is. It only shows that the VM's design
choices regarding the relative frequency of polling for I/O vs
scheduling processes are such that if input is always available, you
really can't get much done in your Erlang code. I.e. it's only about
the relative amount of processing allocated to doing I/O and running
Erlang, not about absolute speed.

And using a "raw" port, a.k.a. one of the builtin fd/spawn drivers, for
reading from a disk file is rather "silly". It's nice because it allows
(or can allow) for the "everything is a file" concept, and works
independently of whether the I/O channel refers to an actual file or to
a pipe/FIFO/socket, but it means that you keep calling poll() to get an
answer that will always be the same when the input actually is a file -
surely not optimal.
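
To illustrate the poll() point at the OS level, here is a minimal sketch
in Python (not Erlang, and not what the VM's drivers actually do). It
polls a regular file a few times with a zero timeout. The function name
poll_regular_file is made up for this example, and it assumes a
Unix-like system where select.poll is available. POSIX specifies that
regular files always poll as ready for reading, so every round should
report "readable" no matter how much data is left.

```python
import os
import select
import tempfile


def poll_regular_file(path, rounds=3):
    """Poll a regular file for readability `rounds` times.

    Returns a list of booleans, one per round, telling whether the
    kernel reported the descriptor as readable (POLLIN).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        results = []
        for _ in range(rounds):
            # Timeout 0: return immediately instead of waiting.
            events = poller.poll(0)
            results.append(any(ev & select.POLLIN for _, ev in events))
        return results
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical demo file, created only for this example.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"line 1\nline 2\n")
    try:
        print(poll_regular_file(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Since the answer never changes for a regular file, each of those calls
is pure overhead compared to simply issuing the read.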

I think what is needed to get good file I/O performance in Erlang is
something very like C stdio, *and* having things arranged such that
using this functionality is obvious/transparent to the user. I.e.
file:open() followed by (e.g.) io:get_line() should just result in
"passively" reading data via buffering, and not involve intermediary
Erlang processes and protocols designed for interactive use. And the
Erlang VM can do this better than C stdio via read-ahead, i.e. get more
data "in parallel" before the buffer is empty and forces the
application process to block waiting for the file system.
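
As a rough sketch of what I mean by buffering plus read-ahead, here is
the idea in Python rather than Erlang. It is only an illustration under
my own assumptions, not a proposal for how the VM should implement it:
the name read_lines_with_readahead and the chunk_size and depth
parameters are invented for the example. A background thread reads big
chunks ahead of the consumer, and the consumer just splits lines out of
whatever has already arrived.

```python
import queue
import threading


def read_lines_with_readahead(path, chunk_size=64 * 1024, depth=2):
    """Yield lines (bytes, including the trailing b"\\n") from `path`.

    A background thread reads up to `depth` chunks ahead, so the
    consumer usually finds data waiting instead of blocking on the
    file system.
    """
    chunks = queue.Queue(maxsize=depth)

    def producer():
        try:
            # buffering=0: one read() call per chunk, no extra layer.
            with open(path, "rb", buffering=0) as f:
                while True:
                    data = f.read(chunk_size)
                    chunks.put(data)  # blocks once `depth` chunks are queued
                    if not data:      # b"" marks end of file
                        return
        except Exception as exc:      # hand errors over to the consumer
            chunks.put(exc)

    threading.Thread(target=producer, daemon=True).start()

    pending = b""
    while True:
        data = chunks.get()
        if isinstance(data, Exception):
            raise data
        if not data:
            break
        pending += data
        # Everything before the last b"\n" is complete lines; the rest
        # is carried over until the next chunk arrives.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline
```

Usage would be along the lines of
`for line in read_lines_with_readahead("input.txt"): ...` (the file name
is just a placeholder). The bounded queue also keeps the reader from
running arbitrarily far ahead of the consumer, so memory use stays
limited to roughly depth chunks.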

Flow control on fd/spawn ports could still be very useful (e.g. when
running Erlang in a Unix pipeline, or having a third-party port program
that just spews data at you as fast as it can) - but it's not relevant
for actual file I/O - at least not file *input*.

--Per Hedeland