[erlang-questions] slow file I/O (was Not an Erlang fan)

Thu Sep 27 16:41:44 CEST 2007

I've tried your code with the 1000k file on my MacBook Pro (2.33 GHz 2 core)
with 2 GB RAM and a Linux box with 2 quad-core 2.33 GHz Intel Xeons with 8
GB RAM. On the laptop, the best time I got was 17 seconds. On the Linux system,
the best time I got was 9.7 seconds. Changing the number of processes doesn't
seem to affect the result that much. Perhaps there's a better way to run it
than just going into the erl shell, compiling it, and then running it?

So even with an elaborate program like this, Ruby still outperforms Erlang
by at least 2x, closer to 3x, and yet Ruby is generally slow compared to
Perl and Python. (Mind you, I use multiple programming languages every day,
so I'm not approaching this from a "my language is better than yours"
perspective. I don't have time for such nonsense.) I'm just really trying to
understand why file I/O has to be so slow compared to these other languages.
Ulf and Klacke have mentioned putting flow control on ports -- is that the
right answer to this issue, and if so, can anyone who works on the code say
whether there's anything in the works for that?

BTW, Tim told me via email that he's working up a new solution, but I
haven't seen it yet.

thanks,
--steve

On 9/27/07, Caoyuan <dcaoyuan@REDACTED> wrote:
>
> I wrote a solution in Erlang that reads the file data in parallel and
> does simple expression matching. For a 1-million-line file (200 MB), it
> took 8.311 seconds with -smp enable and 10.206 seconds with SMP
> disabled. The code is at:
> http://blogtrader.net/page/dcaoyuan?entry=tim_bray_s_erlang_exercise
>
> My computer is a 2.0 GHz 2-core MacBook. I'd like to see a result on a
> machine with more cores :-)
>
> The solution spawns a lot of processes to read the file into binaries
> in parallel, then sends them to a collect_loop. collect_loop calls
> buffered_read on each chunk (once the chunks are in the correct order);
> buffered_read converts the binary to small lists (4096 bytes here),
> then scan_line merges them into lines and calls process_match on each
> line.
>
>
> On 9/27/07, Ulf Wiger (TN/EAB) <ulf.wiger@REDACTED> wrote:
> >
> > The Haskell code uses functions more similar to
> > file:read_file/1 and file:read/2, which are not
> > nearly as slow as io:get_line/2.
> >
> > I agree with Klacke, that if we could just get flow
> > control on ports, then there is a perfectly fine
> > line-oriented mode built into the port. It's so fast,
> > that using it without flow control on a large file
> > will swamp your erlang code (I tried that in the shootout,
> > with disastrous results).
> >
> > http://www.erlang.org/pipermail/erlang-questions/2007-June/027557.html
> >
> > BR,
> > Ulf W
> >
> > Steve Vinoski wrote:
> > > The whole discussion surrounding the Erlang issues that Tim Bray
> > > presented in his blog has got me wondering about file I/O. I'm
> > > definitely no Erlang expert, but my own experiments seem to show that
> > > Erlang file I/O is an insurmountable obstacle for the kind of problem
> > > Tim's trying to solve. Unfortunately, clear details about file I/O
> > > didn't seem to come out in the "Not an Erlang fan" thread.
> > >
> > > So, I'm wondering:
> > >
> > > 1. Is file I/O for large files really as slow as it seems, and if so,
> > > why?
> > > 2. Are there existing alternatives to the regular file module functions
> > > for file I/O that might skirt this problem, and if so, what are they?
> > > 3. Is the whole premise of this problem just not "how it's done" in
> > > Erlang? If so, how would this problem be rearranged to better allow for
> > > an efficient Erlang solution on the large dataset?
> > > 4. Someone posted a link to a Haskell solution in a comment in my blog
> > > that seems too good to be true:
> > > <http://www.serpentine.com/blog/2007/09/25/what-the-heck-is-a-wide-finder-anyway/>.
> > > Assuming it's accurate, why does Haskell beat Erlang so handily in this
> > > situation?
> > > 5. If file I/O speed is really the issue that it seems to be, are there
> > > any plans to officially fix it?
> > >
> > >
> > > thanks,
> > > --steve
> > >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > erlang-questions mailing list
> > > erlang-questions@REDACTED
> > > http://www.erlang.org/mailman/listinfo/erlang-questions
> >
>
>
>
> --
> - Caoyuan
>
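
For reference, the parallel-read idea Caoyuan describes above can be
sketched in a few lines. This is a minimal sketch, not his code: the module
name chunk_scan and all of its functions are hypothetical. It only counts
newline bytes, which lets each chunk be handled independently and sidesteps
the in-order collection and line reassembly that his collect_loop and
scan_line take care of. It also spawns one process per chunk with no upper
bound, so the chunk size should be large for a big file.

```erlang
-module(chunk_scan).
-export([count_lines/2, count_newlines/2]).

%% Count newline bytes in File by reading ChunkSize-byte chunks in
%% parallel, one process per chunk. (Hypothetical helper, for illustration.)
count_lines(File, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Size = filelib:file_size(File),
    Parent = self(),
    Refs = [spawn_reader(Parent, File, Offset, ChunkSize)
            || Offset <- lists:seq(0, Size - 1, ChunkSize)],
    %% Collect one reply per reader; order is irrelevant for a sum.
    lists:sum([receive {Ref, Count} -> Count end || Ref <- Refs]).

%% Start a reader process and return the reference that tags its reply.
spawn_reader(Parent, File, Offset, ChunkSize) ->
    Ref = make_ref(),
    spawn_link(fun() ->
                       Parent ! {Ref, read_chunk(File, Offset, ChunkSize)}
               end),
    Ref.

%% Each reader opens its own raw file descriptor and reads one chunk.
read_chunk(File, Offset, ChunkSize) ->
    {ok, Fd} = file:open(File, [read, raw, binary]),
    Result = file:pread(Fd, Offset, ChunkSize),
    ok = file:close(Fd),
    case Result of
        {ok, Bin} -> count_newlines(Bin, 0);
        eof -> 0
    end.

%% Walk the binary one byte at a time, counting newline characters.
count_newlines(<<$\n, Rest/binary>>, N) -> count_newlines(Rest, N + 1);
count_newlines(<<_, Rest/binary>>, N) -> count_newlines(Rest, N);
count_newlines(<<>>, N) -> N.
```

After compiling the module, it could be called from the erl shell as
chunk_scan:count_lines("some_log_file", 1048576), where the file name is a
placeholder and 1048576 is an arbitrary 1 MB chunk size.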