[erlang-questions] Not an Erlang fan

Claes Wikstrom klacke@REDACTED
Mon Sep 24 19:55:12 CEST 2007


Bob Ippolito wrote:
> On 9/24/07, Patrick Logan <patrickdlogan@REDACTED> wrote:
>>>>>  http://www.tbray.org/ongoing/When/200x/2007/09/22/Erlang
>>>>>
>>>>> Tim Bray might raise some valid points here, even if he's slightly
>>>>> biased by his background.
>> The good news is speeding up the i/o in erlang should be easier than
>> introducing better concurrency to another language.
>>
> 
> I've never had a problem with Erlang's general I/O performance, it's
> probably just some implementation detail of direct file I/O that is
> the loser here. The obvious Erlang fast path to read lines is to spawn
> cat and let the port machinery do all of the work for you. Here's an
> example (including a copy of Tim's dataset):
> 

Spawning cat is actually a very bad idea. It'll lead to all
kinds of havoc, since ports don't by themselves have any means
to do flow control.

Spawning cat will lead to a situation where a whole lot of messages
will be sent to the owner of the port. In socket I/O terms this is
the equivalent of having {active, true} on a socket: we'll just get
a lot of messages, and if we cannot process them at least as fast as
we receive them, our message queue will fill up and we'll just be
spending time in the garbage collector - bad.
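
To make the pile-up concrete, here is a minimal sketch, not a benchmark.
It assumes a Unix system with cat on the PATH; the module name port_flood
and the function queued_after/2 are made up for illustration. The owner
deliberately receives nothing for a while and then checks how many port
messages have queued up in the meantime.

```erlang
-module(port_flood).
-export([queued_after/2]).

%% Spawn `cat File` as a port, deliberately receive nothing for WaitMs
%% milliseconds, then report how many messages piled up in our queue.
%% Assumes `cat` is on the PATH (os:find_executable/1 returns false otherwise).
queued_after(File, WaitMs) ->
    Cat = os:find_executable("cat"),
    Port = open_port({spawn_executable, Cat},
                     [{args, [File]}, binary, {line, 1024}, exit_status]),
    timer:sleep(WaitMs),
    {message_queue_len, N} = process_info(self(), message_queue_len),
    drain(Port),
    N.

%% Throw away everything the port sent us, up to its exit status.
drain(Port) ->
    receive
        {Port, {exit_status, _}} -> ok;
        {Port, {data, _}}        -> drain(Port)
    end.
```

Nothing in the port options lets the owner say "stop sending until I ask
for more", so the queue length here grows with the size of the file rather
than with how fast the owner can consume it.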

We need the equivalent of {active, once} on ports to do this
efficiently. There exists a very nice {line, L} mode for ports, but
that still doesn't solve the problem of flow control.
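
For comparison, this is roughly what {active, once} looks like on a
socket. It is a self-contained loopback sketch; the module name
active_once_demo and the three test lines are made up for illustration.
The receiver gets at most one message per inet:setopts/2 call, so it
decides when the next one may arrive.

```erlang
-module(active_once_demo).
-export([run/0]).

%% Loopback demo: send three lines to ourselves and read them back with
%% {active, once}, re-arming the socket before each receive.
run() ->
    Opts = [binary, {packet, line}, {active, false}, {ip, {127,0,0,1}}],
    {ok, Listen} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Sock} = gen_tcp:accept(Listen),
    ok = gen_tcp:send(Client, <<"one\ntwo\nthree\n">>),
    ok = gen_tcp:close(Client),
    Lines = loop(Sock, []),
    ok = gen_tcp:close(Listen),
    Lines.

loop(Sock, Acc) ->
    %% Ask for exactly one more message; the socket then goes passive again.
    ok = inet:setopts(Sock, [{active, once}]),
    receive
        {tcp, Sock, Line}  -> loop(Sock, [Line | Acc]);
        {tcp_closed, Sock} -> lists:reverse(Acc)
    end.
```

Ports have no corresponding option, which is the gap described above.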

No, the only fast way today to process a large file line by line is to

1. {ok, Fd} = file:open(Filename, [read, raw, binary])
2. In a loop, {ok, Bin} = file:read(Fd, BufSize)
3. Use a binary regex matcher such as
    http://yaws.hyber.org/download/posregex-1.0.tgz

(I don't know the state of the regex lib in OTP today; last time
  I looked it sucked bigtime, though)
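
The read loop in steps 1 and 2 can be sketched as follows. This is one
possible shape, not the posregex approach: it adds the binary option to
file:open/2 so that file:read/2 returns binaries, and it splits on
newlines with binary:split/2 from current OTP instead of a regex matcher.
The module name raw_lines, the function names, and the 64 KB default
buffer size are all made up for illustration.

```erlang
-module(raw_lines).
-export([count_lines/1, count_lines/2, fold_lines/4]).

count_lines(Filename) ->
    count_lines(Filename, 65536).

count_lines(Filename, BufSize) ->
    fold_lines(Filename, BufSize, fun(_Line, N) -> N + 1 end, 0).

%% Calls Fun(Line, Acc) once per line, with the trailing newline removed.
fold_lines(Filename, BufSize, Fun, Acc0) ->
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    try
        loop(Fd, BufSize, <<>>, Fun, Acc0)
    after
        file:close(Fd)
    end.

%% Rest holds the incomplete line left over from the previous chunk.
loop(Fd, BufSize, Rest, Fun, Acc) ->
    case file:read(Fd, BufSize) of
        {ok, Bin} ->
            {Rest1, Acc1} = lines(<<Rest/binary, Bin/binary>>, Fun, Acc),
            loop(Fd, BufSize, Rest1, Fun, Acc1);
        eof when Rest =:= <<>> ->
            Acc;
        eof ->
            Fun(Rest, Acc)   % last line had no trailing newline
    end.

%% Hand every complete line to Fun; return the unfinished tail.
lines(Bin, Fun, Acc) ->
    case binary:split(Bin, <<"\n">>) of
        [Line, Tail] -> lines(Tail, Fun, Fun(Line, Acc));
        [Partial]    -> {Partial, Acc}
    end.
```

Because the process pulls each chunk itself with file:read/2, it never
holds more than one buffer plus a partial line, which is the flow control
the port approach lacks. A read error other than eof crashes the loop with
a case_clause error, which is acceptable for a sketch.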

/klacke


