[erlang-questions] slow file I/O (was Not an Erlang fan)

Per Gustafsson <>
Thu Sep 27 17:53:59 CEST 2007


I've written another solution to this problem, based on a line_server that
serves lists of lines taken from a file. It runs in about 8 seconds on a
dual P4 2.4 GHz machine using R11B-5 and in about 5 seconds using an R12
pre-release, but the program should scale to more processors.

To compile the modules, run:

erl -smp -make

To run the program, type on your command line:

erl -smp -noshell -run pcount run filename n

where n is the number of processes you want to split the work across. Note
that the modules are compiled to native code if possible.
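
The attachments are not reproduced inline, so here is a rough illustration
of the line-server idea. It is a minimal sketch, not the attached
line_server.erl: the module name line_server_sketch, the message shapes,
the 64 KB read size and the batch size of 1000 are all assumptions made for
illustration. It also relies on binary:split/2, which only exists in OTP
releases newer than the ones mentioned above.

```erlang
%% line_server_sketch.erl: hypothetical module, NOT the attached code.
-module(line_server_sketch).
-export([start/2, next_lines/1, count_lines/2]).

-define(CHUNK, 65536).   % bytes per file:read/2 call (arbitrary choice)

%% Start a server that hands out batches of up to BatchSize lines.
%% The raw file is opened inside the server process, which is the
%% only process allowed to read from it.
start(FileName, BatchSize) ->
    spawn_link(fun() ->
        {ok, Fd} = file:open(FileName, [read, raw, binary]),
        serve(Fd, <<>>, BatchSize)
    end).

%% Ask the server for the next batch: {lines, [binary()]} or eof.
next_lines(Server) ->
    Server ! {self(), next},
    receive
        {Server, Reply} -> Reply
    end.

%% Server loop. Rest is the unconsumed tail of the last chunk read.
serve(Fd, Rest, BatchSize) ->
    receive
        {From, next} ->
            case fill(Fd, Rest, BatchSize, []) of
                {[], _} ->
                    From ! {self(), eof},
                    serve(Fd, <<>>, BatchSize);
                {Lines, Rest1} ->
                    From ! {self(), {lines, Lines}},
                    serve(Fd, Rest1, BatchSize)
            end;
        stop ->
            file:close(Fd)
    end.

%% Collect up to N lines, reading more data whenever the buffer holds
%% no complete line. Read errors are not handled in this sketch.
fill(_Fd, Rest, 0, Acc) ->
    {lists:reverse(Acc), Rest};
fill(Fd, Buf, N, Acc) ->
    case binary:split(Buf, <<"\n">>) of
        [Line, Rest] ->
            fill(Fd, Rest, N - 1, [Line | Acc]);
        [_] ->
            case file:read(Fd, ?CHUNK) of
                {ok, Data} ->
                    fill(Fd, <<Buf/binary, Data/binary>>, N, Acc);
                eof when Buf =:= <<>> ->
                    {lists:reverse(Acc), <<>>};
                eof ->
                    %% Last line had no trailing newline.
                    {lists:reverse([Buf | Acc]), <<>>}
            end
    end.

%% Example client: NumWorkers processes pull batches until eof and
%% count the lines they saw; the parent sums the counts.
count_lines(FileName, NumWorkers) ->
    Server = start(FileName, 1000),
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), worker(Server, 0)} end)
            || _ <- lists:seq(1, NumWorkers)],
    Total = lists:sum([receive {Pid, Count} -> Count end || Pid <- Pids]),
    Server ! stop,
    Total.

worker(Server, Count) ->
    case next_lines(Server) of
        eof -> Count;
        {lines, Lines} -> worker(Server, Count + length(Lines))
    end.
```

From the shell, line_server_sketch:count_lines("filename", 4) would count
the lines of a file using four workers. In a real program each worker would
do its matching on the batch instead of just calling length/1. Handing out
batches rather than single lines keeps the message traffic between the
server and the workers low.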

Per

Steve Vinoski wrote:
> I've tried your code with the 1000k file on my MacBook Pro (2.33 GHz 2 core)
> with 2 GB RAM and a Linux box with 2 quad-core 2.33 GHz Intel Xeons with 8
> GB RAM. On the laptop, the best time I got was 17 secs. On the Linux system,
> the best time I got was 9.7 sec. Changing the number of processes doesn't
> seem to affect the result that much. Perhaps there's a better way to run it
> than just going into the erl shell, compiling it, and then running it?
> 
> So even with an elaborate program like this, Ruby still outperforms Erlang
> by at least 2x, closer to 3x, and yet Ruby is generally slow compared to
> Perl and Python. (Mind you, I use multiple programming languages every day,
> so I'm not approaching this from a "my language is better than yours"
> perspective. I don't have time for such nonsense.) I'm just really trying to
> understand why file I/O has to be so slow compared to these other languages.
> Ulf and Klacke have mentioned putting flow control on ports -- is that the
> right answer to this issue, and if so, can anyone who works on the code say
> whether there's anything in the works for that?
> 
> 
> BTW, Tim told me via email that he's working up a new solution, but I
> haven't seen it yet.
> 
> 
> thanks,
> --steve
> 
> 
> On 9/27/07, Caoyuan <> wrote:
> 
>>I wrote a solution in Erlang that reads the file data in parallel and
>>does simple expression matching. For a 1-million-line file (200 MB), it
>>took 8.311 seconds with -smp enabled and 10.206 seconds with SMP
>>disabled. The code is at:
>>http://blogtrader.net/page/dcaoyuan?entry=tim_bray_s_erlang_exercise
>>
>>My computer is a 2.0 GHz 2-core MacBook; I'd like to see a result on
>>a machine with more cores :-)
>>
>>The solution spawns a lot of processes to read the file into binaries
>>in parallel, then sends them to a collect_loop. collect_loop calls
>>buffered_read on each chunk (once the chunks are in the correct order);
>>buffered_read converts the binary to small lists (4096 bytes here), then
>>scan_line merges them into lines and calls process_match on each line.
>>
>>
>>On 9/27/07, Ulf Wiger (TN/EAB) <> wrote:
>>
>>>The Haskell code uses functions more similar to
>>>file:read_file/1 and file:read/2, which are not
>>>nearly as slow as io:get_line/2.
>>>
>>>I agree with Klacke, that if we could just get flow
>>>control on ports, then there is a perfectly fine
>>>line-oriented mode built into the port. It's so fast,
>>>that using it without flow control on a large file
>>>will swamp your erlang code (I tried that in the shootout,
>>>with disastrous results).
>>>
>>>http://www.erlang.org/pipermail/erlang-questions/2007-June/027557.html
>>>
>>>BR,
>>>Ulf W
>>>
>>>Steve Vinoski wrote:
>>>
>>>>The whole discussion surrounding the Erlang issues that Tim Bray
>>>>presented in his blog has got me wondering about file I/O. I'm
>>>>definitely no Erlang expert, but my own experiments seem to show that
>>>>Erlang file I/O is an insurmountable obstacle for the kind of problem
>>>>Tim's trying to solve. Unfortunately, clear details about file I/O
>>>>didn't seem to come out in the "Not an Erlang fan" thread.
>>>>
>>>>So, I'm wondering:
>>>>
>>>>1. Is file I/O for large files really as slow as it seems, and if so,
>>>>why?
>>>>2. Are there existing alternatives to the regular file module functions
>>>>for file I/O that might skirt this problem, and if so, what are they?
>>>>3. Is the whole premise of this problem just not "how it's done" in
>>>>Erlang? If so, how would this problem be rearranged to better allow for
>>>>an efficient Erlang solution on the large dataset?
>>>>4. Someone posted a link to a Haskell solution in a comment in my blog
>>>>that seems too good to be true:
>>>><http://www.serpentine.com/blog/2007/09/25/what-the-heck-is-a-wide-finder-anyway/>.
>>>>Assuming it's accurate, why does Haskell beat Erlang so handily in this
>>>>situation?
>>>>5. If file I/O speed is really the issue that it seems to be, are there
>>>>any plans to officially fix it?
>>>>
>>>>
>>>>thanks,
>>>>--steve
>>>>
>>>>
>>>>------------------------------------------------------------------------
>>>>_______________________________________________
>>>>erlang-questions mailing list
>>>>
>>>>http://www.erlang.org/mailman/listinfo/erlang-questions
>>>
>>
>>
>>
>>--
>>- Caoyuan
>>
> 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pcount.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20070927/a6dee301/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: line_server.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20070927/a6dee301/attachment-0001.ksh>

