[erlang-questions] Slow line oriented IO

Lukas Larsson lukas@REDACTED
Fri Feb 15 16:42:48 CET 2013


aha, you needed to have very long lines in the file to provoke the
behaviour. The file I was experimenting on had the same line length all the
time.

I think I've found the culprit. Aparently the read_ahead option was ignored
after the first buffer had been consumed. Try the patch below and see if it
works for you:

>From 0884480691e047801cef117b19151a7d34c9e438 Mon Sep 17 00:00:00 2001
From: Lukas Larsson <lukas@REDACTED>
Date: Fri, 15 Feb 2013 16:37:52 +0100
Subject: [PATCH] invoke_read_line now remembers read_ahead settings

---
 erts/emulator/drivers/common/efile_drv.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/erts/emulator/drivers/common/efile_drv.c
b/erts/emulator/drivers/common/efile_drv.c
index 2279fec..d81f116 100644
--- a/erts/emulator/drivers/common/efile_drv.c
+++ b/erts/emulator/drivers/common/efile_drv.c
@@ -1160,6 +1160,8 @@ static void invoke_read_line(void *data)
            /* Need more place */
            ErlDrvSizeT need = (d->c.read_line.read_size >=
DEFAULT_LINEBUF_SIZE) ?
                d->c.read_line.read_size + DEFAULT_LINEBUF_SIZE :
DEFAULT_LINEBUF_SIZE;
+         if (need < (d->c.read_line.binp)->orig_size)
+           need = (d->c.read_line.binp)->orig_size;
            ErlDrvBinary   *newbin = driver_alloc_binary(need);
            if (newbin == NULL) {
                d->result_ok = 0;
-- 
1.7.10.4




On Fri, Feb 15, 2013 at 1:29 AM, Hynek Vychodil <vychodil.hynek@REDACTED>wrote:

> Using async thread drops time down to 4.6s which is still 20 times
> slower than mine simple wrapper.
>
> On Thu, Feb 14, 2013 at 4:25 PM, Lukas Larsson <lukas@REDACTED> wrote:
> > Hello!
> >
> > If you start erlang with an async thread (erl +A 1) you will see
> radically
> > different behaviour. For me the time it takes to use read_line drops from
> > 58s to 2s. When doing file i/o you should always have some async threads
> to
> > help you do the work, this is why in R16B we changed the default from 0
> to
> > 10 async threads.
> >
> > Lukas
> >
> >
> > On Thu, Feb 14, 2013 at 3:46 PM, Hynek Vychodil <
> vychodil.hynek@REDACTED>
> > wrote:
> >>
> >> Hello,
> >> I know it was been already discussed here in list and it is also
> recurring
> >> topic for at least five years. But anyway I have been bitten by it again
> >> and
> >> also found pretty pathological case. I have 30MB text file and it has a
> >> few
> >> near to 1MB lines there. (I can provide file with same line lengths if
> >> somebody interested.) What I have been observed is that reading this
> file
> >> using raw file:read_line/1 takes 51s! For comparison I have tried some
> >> different approaches and what I got (line_read_test:read_std/1 is using
> >> file:read_line/1):
> >>
> >> 1> timer:tc(line_read_test,read_std,["test.txt"]).
> >> {51028105,2408}
> >> 2> timer:tc(line_read_test,read,["test.txt"]).
> >> {226220,2408}
> >> 3> timer:tc(line_read_test,read_port,["test.txt"]).
> >> {139388,2408}
> >>
> >> $time perl -nE'$i++}{say $i' test.txt
> >> 2408
> >>
> >> real    0m0.053s
> >> user    0m0.044s
> >> sys     0m0.008s
> >>
> >> $ time wc -l test.txt
> >> 2408 test.txt
> >>
> >> real    0m0.013s
> >> user    0m0.004s
> >> sys     0m0.008s
> >>
> >> $ time ./a.out test.txt
> >> 2408
> >>
> >> real    0m0.020s
> >> user    0m0.012s
> >> sys     0m0.008s
> >>
> >> It means erlang should be at least 225 times faster
> (line_read_test:read/1
> >> which has flow control). Erlang can be 350 times faster
> >> (line_read_test:read_port/1 without flow control). Another high level
> >> language (perl) is almost thousand times faster. Special C program is
> >> almost
> >> four thousands times faster and old good glibc is two and half thousands
> >> times faster. Come on guys it is not even fun when simple (and wrong)
> >> erlang
> >> wrapper around standard module is more than two order of magnitude
> faster.
> >> From mine experience when there is something two orders of magnitude
> >> slower
> >> it tells me there is something damn wrong. I have looked into
> efile_drv.c
> >> and
> >> ti is unfortunately far beyond mine C skill but if simple buffering and
> >> binary:match/2 can outperform it 225 times there has to be something
> >> rotten
> >> in this code.
> >>
> >> I have also experimented with read_ahead option in file:open and
> changing
> >> to
> >> less value makes thing worse.
> >>
> >> Just to make grasp how bad it is, in same time I'm able sort 150 million
> >> 64bit values (1.2GB of data) three times (one CPU core same HW). It is
> not
> >> in
> >> flow control, mine simple wrapper does flow control too. It can't make
> >> current code less intrusive, if it consumes 100% CPU for 51s instead of
> >> 226ms
> >> then it will definitely affect whole server. It is not in concurrent
> >> access,
> >> mine code allows concurrent access too. Admitting there is something
> >> broken
> >> is first step to fixing it. I hope I helped at least in this way.
> >>
> >> With best regards
> >>   Hynek Vychodil
> >>
> >> _______________________________________________
> >> erlang-questions mailing list
> >> erlang-questions@REDACTED
> >> http://erlang.org/mailman/listinfo/erlang-questions
> >>
> >
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130215/17542ffa/attachment.htm>


More information about the erlang-questions mailing list