[erlang-questions] Slow line oriented IO
Hynek Vychodil
vychodil.hynek@REDACTED
Fri Feb 15 01:29:35 CET 2013
Using async thread drops time down to 4.6s which is still 20 times
slower than mine simple wrapper.
On Thu, Feb 14, 2013 at 4:25 PM, Lukas Larsson <lukas@REDACTED> wrote:
> Hello!
>
> If you start erlang with an async thread (erl +A 1) you will see radically
> different behaviour. For me the time it takes to use read_line drops from
> 58s to 2s. When doing file i/o you should always have some async threads to
> help you do the work, this is why in R16B we changed the default from 0 to
> 10 async threads.
>
> Lukas
>
>
> On Thu, Feb 14, 2013 at 3:46 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
> wrote:
>>
>> Hello,
>> I know it was been already discussed here in list and it is also recurring
>> topic for at least five years. But anyway I have been bitten by it again
>> and
>> also found pretty pathological case. I have 30MB text file and it has a
>> few
>> near to 1MB lines there. (I can provide file with same line lengths if
>> somebody interested.) What I have been observed is that reading this file
>> using raw file:read_line/1 takes 51s! For comparison I have tried some
>> different approaches and what I got (line_read_test:read_std/1 is using
>> file:read_line/1):
>>
>> 1> timer:tc(line_read_test,read_std,["test.txt"]).
>> {51028105,2408}
>> 2> timer:tc(line_read_test,read,["test.txt"]).
>> {226220,2408}
>> 3> timer:tc(line_read_test,read_port,["test.txt"]).
>> {139388,2408}
>>
>> $time perl -nE'$i++}{say $i' test.txt
>> 2408
>>
>> real 0m0.053s
>> user 0m0.044s
>> sys 0m0.008s
>>
>> $ time wc -l test.txt
>> 2408 test.txt
>>
>> real 0m0.013s
>> user 0m0.004s
>> sys 0m0.008s
>>
>> $ time ./a.out test.txt
>> 2408
>>
>> real 0m0.020s
>> user 0m0.012s
>> sys 0m0.008s
>>
>> It means erlang should be at least 225 times faster (line_read_test:read/1
>> which has flow control). Erlang can be 350 times faster
>> (line_read_test:read_port/1 without flow control). Another high level
>> language (perl) is almost thousand times faster. Special C program is
>> almost
>> four thousands times faster and old good glibc is two and half thousands
>> times faster. Come on guys it is not even fun when simple (and wrong)
>> erlang
>> wrapper around standard module is more than two order of magnitude faster.
>> From mine experience when there is something two orders of magnitude
>> slower
>> it tells me there is something damn wrong. I have looked into efile_drv.c
>> and
>> ti is unfortunately far beyond mine C skill but if simple buffering and
>> binary:match/2 can outperform it 225 times there has to be something
>> rotten
>> in this code.
>>
>> I have also experimented with read_ahead option in file:open and changing
>> to
>> less value makes thing worse.
>>
>> Just to make grasp how bad it is, in same time I'm able sort 150 million
>> 64bit values (1.2GB of data) three times (one CPU core same HW). It is not
>> in
>> flow control, mine simple wrapper does flow control too. It can't make
>> current code less intrusive, if it consumes 100% CPU for 51s instead of
>> 226ms
>> then it will definitely affect whole server. It is not in concurrent
>> access,
>> mine code allows concurrent access too. Admitting there is something
>> broken
>> is first step to fixing it. I hope I helped at least in this way.
>>
>> With best regards
>> Hynek Vychodil
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
More information about the erlang-questions
mailing list