[erlang-questions] slow file I/O (was Not an Erlang fan)

Thu Sep 27 11:24:42 CEST 2007

I wrote a solution in Erlang that reads the file data in parallel and
does simple expression matching. For a 1-million-line file (200M in
size), it took 8.311 seconds with -smp enable and 10.206 seconds with
SMP disabled. The code is at:
http://blogtrader.net/page/dcaoyuan?entry=tim_bray_s_erlang_exercise

My computer is a 2.0 GHz 2-core MacBook. I'd like to see a result on
a machine with more cores :-)

The solution spawns a lot of processes that read the file into binaries
in parallel and send them to a collect_loop. The collect_loop calls
buffered_read on each chunk once the chunks arrive in the correct
order. buffered_read converts the binary to small lists (4096 bytes
here), then scan_line merges them into lines and calls process_match
on each line.
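
The pipeline described above could be sketched roughly as follows. This
is only a minimal illustration, not the code from the linked post: the
module name chunk_scan, the function count_matches/3 and the message
shape are hypothetical, and it splits lines with binary:split/3 (from
the binary module, which is newer than this thread) instead of
converting chunks to 4096-byte lists. Pattern is assumed to be a
non-empty binary.

```erlang
-module(chunk_scan).
-export([count_matches/3]).

%% Count the lines of File that contain Pattern (a non-empty binary).
%% The file is read in ChunkSize-byte chunks, one reader process per chunk.
count_matches(File, Pattern, ChunkSize) ->
    Size = filelib:file_size(File),
    Offsets = lists:seq(0, Size - 1, ChunkSize),
    Total = length(Offsets),
    Parent = self(),
    Ref = make_ref(),
    lists:foreach(
      fun({Index, Offset}) ->
              spawn_link(
                fun() ->
                        {ok, Fd} = file:open(File, [read, raw, binary]),
                        {ok, Bin} = file:pread(Fd, Offset, ChunkSize),
                        ok = file:close(Fd),
                        Parent ! {chunk, Ref, Index, Bin}
                end)
      end,
      lists:zip(lists:seq(1, Total), Offsets)),
    collect_loop(Ref, 1, Total, <<>>, Pattern, 0).

%% Consume chunks strictly in index order; Rest is the unfinished line
%% carried over from the previous chunk.
collect_loop(_Ref, Next, Total, Rest, Pattern, Count) when Next > Total ->
    %% The last line may have no trailing newline.
    Count + match_count([Rest], Pattern);
collect_loop(Ref, Next, Total, Rest, Pattern, Count) ->
    receive
        {chunk, Ref, Next, Bin} ->
            Lines = binary:split(<<Rest/binary, Bin/binary>>,
                                 <<"\n">>, [global]),
            {Complete, [Tail]} = lists:split(length(Lines) - 1, Lines),
            collect_loop(Ref, Next + 1, Total, Tail, Pattern,
                         Count + match_count(Complete, Pattern))
    end.

%% Number of lines in which Pattern occurs at least once.
match_count(Lines, Pattern) ->
    length([L || L <- Lines, binary:match(L, Pattern) =/= nomatch]).
```

It could be called as, for example,
chunk_scan:count_matches("access.log", <<"GET /ongoing">>, 1048576),
where the file name and pattern are placeholders. The selective receive
on the bound Next index is what keeps the chunks in file order, so a
line that straddles two chunks is still reassembled correctly.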

On 9/27/07, Ulf Wiger (TN/EAB) <ulf.wiger@REDACTED> wrote:
>
> The Haskell code uses functions more similar to
> file:read_file/1 and file:read/2, which are not
> nearly as slow as io:get_line/2.
>
> I agree with Klacke, that if we could just get flow
> control on ports, then there is a perfectly fine
> line-oriented mode built into the port. It's so fast,
> that using it without flow control on a large file
> will swamp your erlang code (I tried that in the shootout,
> with disastrous results).
>
> http://www.erlang.org/pipermail/erlang-questions/2007-June/027557.html
>
> BR,
> Ulf W
>
> Steve Vinoski wrote:
> > The whole discussion surrounding the Erlang issues that Tim Bray
> > presented in his blog has got me wondering about file I/O. I'm
> > definitely no Erlang expert, but my own experiments seem to show that
> > Erlang file I/O is an insurmountable obstacle for the kind of problem
> > Tim's trying to solve. Unfortunately, clear details about file I/O
> > didn't seem to come out in the "Not an Erlang fan" thread.
> >
> > So, I'm wondering:
> >
> > 1. Is file I/O for large files really as slow as it seems, and if so, why?
> > 2. Are there existing alternatives to the regular file module functions
> > for file I/O that might skirt this problem, and if so, what are they?
> > 3. Is the whole premise of this problem just not "how it's done" in
> > Erlang? If so, how would this problem be rearranged to better allow for
> > an efficient Erlang solution on the large dataset?
> > 4. Someone posted a link to a Haskell solution in a comment in my blog
> > that seems too good to be true:
> > <http://www.serpentine.com/blog/2007/09/25/what-the-heck-is-a-wide-finder-anyway/>.
> > Assuming it's accurate, why does Haskell beat Erlang so handily in this
> > situation?
> > 5. If file I/O speed is really the issue that it seems to be, are there
> > any plans to officially fix it?
> >
> >
> > thanks,
> > --steve
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions
>

-- 
- Caoyuan