[erlang-questions] Slow line oriented IO

Thu Feb 14 16:25:30 CET 2013

Hello!

If you start erlang with an async thread (erl +A 1) you will see radically
different behaviour. For me the time it takes to use read_line drops from
58s to 2s. When doing file i/o you should always have some async threads to
help you do the work, this is why in R16B we changed the default from 0 to
10 async threads.

Lukas

On Thu, Feb 14, 2013 at 3:46 PM, Hynek Vychodil <vychodil.hynek@REDACTED>wrote:

> Hello,
> I know it was been already discussed here in list and it is also recurring
> topic for at least five years. But anyway I have been bitten by it again
> and
> also found pretty pathological case. I have 30MB text file and it has a few
> near to 1MB lines there. (I can provide file with same line lengths if
> somebody interested.) What I have been observed is that reading this file
> using raw file:read_line/1 takes 51s! For comparison I have tried some
> different approaches and what I got (line_read_test:read_std/1 is using
> file:read_line/1):
>
> 1> timer:tc(line_read_test,read_std,["test.txt"]).
> {51028105,2408}
> 2> timer:tc(line_read_test,read,["test.txt"]).
> {226220,2408}
> 3> timer:tc(line_read_test,read_port,["test.txt"]).
> {139388,2408}
>
> $time perl -nE'$i++}{say $i' test.txt
> 2408
>
> real    0m0.053s
> user    0m0.044s
> sys     0m0.008s
>
> $ time wc -l test.txt
> 2408 test.txt
>
> real    0m0.013s
> user    0m0.004s
> sys     0m0.008s
>
> $ time ./a.out test.txt
> 2408
>
> real    0m0.020s
> user    0m0.012s
> sys     0m0.008s
>
> It means erlang should be at least 225 times faster (line_read_test:read/1
> which has flow control). Erlang can be 350 times faster
> (line_read_test:read_port/1 without flow control). Another high level
> language (perl) is almost thousand times faster. Special C program is
> almost
> four thousands times faster and old good glibc is two and half thousands
> times faster. Come on guys it is not even fun when simple (and wrong)
> erlang
> wrapper around standard module is more than two order of magnitude faster.
> From mine experience when there is something two orders of magnitude slower
> it tells me there is something damn wrong. I have looked into efile_drv.c
> and
> ti is unfortunately far beyond mine C skill but if simple buffering and
> binary:match/2 can outperform it 225 times there has to be something rotten
> in this code.
>
> I have also experimented with read_ahead option in file:open and changing
> to
> less value makes thing worse.
>
> Just to make grasp how bad it is, in same time I'm able sort 150 million
> 64bit values (1.2GB of data) three times (one CPU core same HW). It is not
> in
> flow control, mine simple wrapper does flow control too. It can't make
> current code less intrusive, if it consumes 100% CPU for 51s instead of
> 226ms
> then it will definitely affect whole server. It is not in concurrent
> access,
> mine code allows concurrent access too. Admitting there is something broken
> is first step to fixing it. I hope I helped at least in this way.
>
> With best regards
>   Hynek Vychodil
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130214/82f0e3ab/attachment.htm>