<div dir="ltr"><div><div>Hello!<br><br></div>If you start Erlang with an async thread (erl +A 1), you will see radically different behaviour. For me, the time it takes to use read_line drops from 58s to 2s. When doing file I/O you should always have some async threads to help do the work; this is why in R16B we changed the default from 0 to 10 async threads. <br>

</div><div><div><br></div><div>Lukas<br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Feb 14, 2013 at 3:46 PM, Hynek Vychodil <span dir="ltr">&lt;<a href="mailto:vychodil.hynek@gmail.com" target="_blank">vychodil.hynek@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

I know this has already been discussed here on the list and has been a recurring<br>

topic for at least five years. But I have been bitten by it again and<br>

also found a pretty pathological case. I have a 30MB text file that has a few<br>

lines close to 1MB long. (I can provide a file with the same line lengths if<br>

somebody is interested.) What I have observed is that reading this file<br>

using raw file:read_line/1 takes 51s! For comparison I have tried some<br>

different approaches, and this is what I got (line_read_test:read_std/1 uses<br>

file:read_line/1):<br>

<br>

1> timer:tc(line_read_test,read_std,["test.txt"]).<br>

{51028105,2408}<br>

2> timer:tc(line_read_test,read,["test.txt"]).<br>

{226220,2408}<br>

3> timer:tc(line_read_test,read_port,["test.txt"]).<br>

{139388,2408}<br>

<br>

$ time perl -nE'$i++}{say $i' test.txt<br>

2408<br>

<br>

real    0m0.053s<br>

user    0m0.044s<br>

sys     0m0.008s<br>

<br>

$ time wc -l test.txt<br>

2408 test.txt<br>

<br>

real    0m0.013s<br>

user    0m0.004s<br>

sys     0m0.008s<br>

<br>

$ time ./a.out test.txt<br>

2408<br>

<br>

real    0m0.020s<br>

user    0m0.012s<br>

sys     0m0.008s<br>

<br>

It means Erlang should be at least 225 times faster (line_read_test:read/1,<br>

which has flow control). Erlang can be 350 times faster<br>

(line_read_test:read_port/1, without flow control). Another high-level<br>

language (Perl) is almost a thousand times faster. A special C program is almost<br>

four thousand times faster, and good old glibc is two and a half thousand<br>

times faster. Come on guys, it is not even funny when a simple (and wrong) Erlang<br>

wrapper around the standard module is more than two orders of magnitude faster.<br>

In my experience, when something is two orders of magnitude slower,<br>

it tells me there is something damn wrong. I have looked into efile_drv.c, and<br>

it is unfortunately far beyond my C skills, but if simple buffering and<br>

binary:match/2 can outperform it 225 times, there has to be something rotten<br>

in this code.<br>

<br>

I have also experimented with the read_ahead option in file:open, and changing it to<br>

a smaller value makes things worse.<br>

<br>

Just to give a sense of how bad it is: in the same time I am able to sort 150 million<br>

64-bit values (1.2GB of data) three times (one CPU core, same HW). The cause is not<br>

flow control; my simple wrapper does flow control too. Nor does it make the<br>

current code less intrusive: if it consumes 100% CPU for 51s instead of 226ms,<br>

it will definitely affect the whole server. The cause is not concurrent access either;<br>

my code allows concurrent access too. Admitting there is something broken<br>

is the first step to fixing it. I hope I have helped at least in this way.<br>

<br>

With best regards<br>

<span class="HOEnZb"><font color="#888888">  Hynek Vychodil<br>

</font></span><br>_______________________________________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

<br></blockquote></div><br></div>