[erlang-questions] Not an Erlang fan

Mon Sep 24 16:54:23 CEST 2007

On 9/24/07, Steve Vinoski <vinoski@REDACTED> wrote:
>
> On 9/23/07, Bob Ippolito <bob@REDACTED> wrote:
> >
> > On 9/24/07, Patrick Logan <patrickdlogan@REDACTED> wrote:
> > > > > >  http://www.tbray.org/ongoing/When/200x/2007/09/22/Erlang
> > > > > >
> > > > > > Tim Bray might raise some valid points here, even if he's slightly
> > > > > > biased by his background.
> > >
> > > The good news is speeding up the i/o in erlang should be easier than
> > > introducing better concurrency to another language.
> > >
> >
> > I've never had a problem with Erlang's general I/O performance, it's
> > probably just some implementation detail of direct file I/O that is
> > the loser here. The obvious Erlang fast path to read lines is to spawn
> > cat and let the port machinery do all of the work for you. Here's an
> > example (including a copy of Tim's dataset):
> >
> > http://undefined.org/erlang/o10k.zip
> >
>
> I posted a link in a comment to Tim's blog to an example that uses
> multiple processes to break down the expensive parts of processing Tim's
> dataset in parallel, and was able to achieve a pure Erlang approach that on
> my MacBook Pro equals your "cat" approach, and is much faster than "cat" on
> an 8-core machine. It's shown on my blog:
>
>
> <http://steve.vinoski.net/blog/2007/09/23/tim-bray-and-erlang/>
>
> It definitely speeds up as the number of cores goes up.
>
>
> I don't consider myself an Erlang expert and so welcome any suggestions
> for improving this. I'm guessing someone will see the two instances of "++"
> list handling and jump on that, but I tried it with the typical reverse
> approach and with flattening and neither was faster. However I am quite open
> to being enlightened. :-)
>

Just a follow-up: a couple of people have mentioned that I must be missing the
fact that Tim's sample dataset is 100 times smaller than the real dataset.
No, I'm not missing that, as I explained in a comment on my blog.

I think it's obvious that any solution that counts on reading in the whole
file at once, like mine does, will have trouble with the full dataset. For
that, I think a combination of Bob's "cat port" and my multiprocess line
analysis would yield the best of both worlds.
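
A minimal sketch of that combination might look like the following. It is an
illustration under assumptions, not the code from either Bob's zip or my blog:
the module name, the batch size, and the URL pattern are all hypothetical, and
it uses the re and maps modules from current OTP releases. The idea is that
"cat" streams lines through a port, the reader hands batches of lines to
worker processes, and the per-batch counts are merged at the end.

```erlang
-module(catport_sketch).
-export([count/1]).

-define(BATCH, 1000).  % lines per worker; an arbitrary, untuned guess

%% Returns a map of article key => hit count for File.
count(File) ->
    %% Assumed pattern for article URLs; adjust it to the real log format.
    {ok, MP} = re:compile(
        "GET /ongoing/When/\\d\\d\\dx/(\\d\\d\\d\\d/\\d\\d/\\d\\d/[^ .]+) "),
    Cat = os:find_executable("cat"),
    Port = open_port({spawn_executable, Cat},
                     [{args, [File]}, {line, 4096}, binary, exit_status]),
    read(Port, MP, [], 0, [], 0).

%% Batch is full: hand it to a new worker and start a fresh one.
read(Port, MP, Batch, N, Partial, Workers) when N >= ?BATCH ->
    spawn_worker(MP, Batch),
    read(Port, MP, [], 0, Partial, Workers + 1);
read(Port, MP, Batch, N, Partial, Workers) ->
    receive
        {Port, {data, {noeol, Chunk}}} ->
            %% Line longer than the port's line buffer: keep accumulating.
            read(Port, MP, Batch, N, [Chunk | Partial], Workers);
        {Port, {data, {eol, Chunk}}} ->
            Line = iolist_to_binary(lists:reverse([Chunk | Partial])),
            read(Port, MP, [Line | Batch], N + 1, [], Workers);
        {Port, {exit_status, 0}} ->
            %% Flush the last batch, including a final unterminated line.
            Last = case Partial of
                       [] -> Batch;
                       _  -> [iolist_to_binary(lists:reverse(Partial)) | Batch]
                   end,
            spawn_worker(MP, Last),
            collect(Workers + 1, #{});
        {Port, {exit_status, Status}} ->
            error({cat_failed, Status})
    end.

%% Each worker counts its batch and sends the result back to the reader.
spawn_worker(MP, Lines) ->
    Parent = self(),
    spawn_link(fun() -> Parent ! {counts, count_batch(Lines, MP)} end).

count_batch(Lines, MP) ->
    lists:foldl(
      fun(Line, Acc) ->
              case re:run(Line, MP, [{capture, [1], binary}]) of
                  {match, [Key]} ->
                      maps:update_with(Key, fun(C) -> C + 1 end, 1, Acc);
                  nomatch ->
                      Acc
              end
      end, #{}, Lines).

%% Wait for one result per worker and merge the counts.
collect(0, Acc) -> Acc;
collect(N, Acc) ->
    receive
        {counts, M} -> collect(N - 1, merge(M, Acc))
    end.

merge(A, B) ->
    maps:fold(fun(K, V, Acc) ->
                      maps:update_with(K, fun(C) -> C + V end, V, Acc)
              end, B, A).
```

Calling catport_sketch:count("o10k.ap") (the file name here is a guess) would
return the merged map. Note that this spawns one worker per batch with no
upper bound, so a real run over the full dataset would probably want to cap
the number of workers in flight.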

--steve