<div dir="ltr"><div><div>Hello!<br><br></div>If you start Erlang with an async thread (erl +A 1), you will see radically different behaviour. For me, the time it takes using read_line drops from 58s to 2s. When doing file I/O you should always have some async threads to help do the work; this is why we changed the default from 0 to 10 async threads in R16B.<br>
</div><div><div><br></div><div>Lukas<br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Feb 14, 2013 at 3:46 PM, Hynek Vychodil <span dir="ltr">&lt;<a href="mailto:vychodil.hynek@gmail.com" target="_blank">vychodil.hynek@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
I know this has already been discussed on this list, and it has been a recurring<br>
topic for at least five years. But I have been bitten by it again and have<br>
also found a pretty pathological case. I have a 30MB text file that contains a few<br>
lines close to 1MB long. (I can provide a file with the same line lengths if<br>
somebody is interested.) What I have observed is that reading this file<br>
using raw file:read_line/1 takes 51s! For comparison, I have tried some<br>
different approaches, and this is what I got (line_read_test:read_std/1 uses<br>
file:read_line/1):<br>
<br>
1> timer:tc(line_read_test,read_std,["test.txt"]).<br>
{51028105,2408}<br>
2> timer:tc(line_read_test,read,["test.txt"]).<br>
{226220,2408}<br>
3> timer:tc(line_read_test,read_port,["test.txt"]).<br>
{139388,2408}<br>
<br>
$ time perl -nE'$i++}{say $i' test.txt<br>
2408<br>
<br>
real 0m0.053s<br>
user 0m0.044s<br>
sys 0m0.008s<br>
<br>
$ time wc -l test.txt<br>
2408 test.txt<br>
<br>
real 0m0.013s<br>
user 0m0.004s<br>
sys 0m0.008s<br>
<br>
$ time ./a.out test.txt<br>
2408<br>
<br>
real 0m0.020s<br>
user 0m0.012s<br>
sys 0m0.008s<br>
<br>
It means Erlang should be at least 225 times faster (line_read_test:read/1,<br>
which has flow control). Erlang can be 350 times faster<br>
(line_read_test:read_port/1, without flow control). Another high-level<br>
language (Perl) is almost a thousand times faster. A special C program is almost<br>
four thousand times faster and good old glibc is two and a half thousand<br>
times faster. Come on guys, it is not even funny when a simple (and wrong) Erlang<br>
wrapper around the standard module is more than two orders of magnitude faster.<br>
In my experience, when something is two orders of magnitude slower,<br>
it tells me there is something damn wrong. I have looked into efile_drv.c, and<br>
it is unfortunately far beyond my C skills, but if simple buffering and<br>
binary:match/2 can outperform it 225 times, there has to be something rotten<br>
in this code.<br>
<br>
I have also experimented with the read_ahead option in file:open, and changing it to<br>
a smaller value makes things worse.<br>
<br>
Just to give a sense of how bad it is: in the same time I am able to sort 150 million<br>
64-bit values (1.2GB of data) three times (one CPU core, same HW). The problem is not<br>
flow control; my simple wrapper does flow control too. Nor does it make the<br>
current code less intrusive: if it consumes 100% CPU for 51s instead of 226ms,<br>
it will definitely affect the whole server. It is not concurrent access either;<br>
my code allows concurrent access too. Admitting there is something broken<br>
is the first step to fixing it. I hope I have helped at least in this way.<br>
<br>
With best regards<br>
<span class="HOEnZb"><font color="#888888"> Hynek Vychodil<br>
</font></span><br>
<br></blockquote></div><br></div>
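For readers who want to try the "simple buffering plus binary matching" idea mentioned in the quoted message, here is a minimal sketch. It is not the line_read_test module from the thread (that code was not posted); the module name line_count_sketch, the function count_lines/1 and the 64KB chunk size are assumptions made for illustration. It only counts lines, using file:read/2 on a raw binary file and binary:matches/2, and it counts a final line that has no trailing newline.

```erlang
-module(line_count_sketch).
-export([count_lines/1]).

%% Hypothetical helper: count the lines in Path by reading fixed-size
%% chunks and counting newline bytes in each chunk.
count_lines(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    Newline = list_to_binary("\n"),
    try
        loop(Fd, Newline, 0, false)
    after
        file:close(Fd)
    end.

%% Partial is true when the last chunk read did not end with a newline,
%% i.e. there is an unterminated line still pending.
loop(Fd, Newline, Count, Partial) ->
    case file:read(Fd, 65536) of          % 64KB chunk size is an assumption
        {ok, Chunk} ->
            N = length(binary:matches(Chunk, Newline)),
            EndsWithNewline = binary:last(Chunk) =:= $\n,
            loop(Fd, Newline, Count + N, not EndsWithNewline);
        eof when Partial ->
            Count + 1;                    % last line had no trailing newline
        eof ->
            Count
    end.
```

It can be timed the same way as the calls in the quoted message, for example timer:tc(line_count_sketch, count_lines, ["test.txt"]). No timing is claimed here; results will depend on the machine and on the emulator flags (such as +A) discussed above.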