[erlang-questions] Regular expression library (was Not anErlang fan)

David Mercer <>
Fri Oct 5 17:33:07 CEST 2007


Very very cool.  Worth the read.  Thanks.

Cheers,

David

-----Original Message-----
From:  [mailto:] On Behalf Of Yariv Sadan
Sent: Thursday, October 04, 2007 13:24
To: 
Cc: 
Subject: Re: [erlang-questions] Regular expression library (was Not anErlang
fan)

You can create its abstract forms and then compile them into a .beam
binary and load it into the VM at runtime. ErlyWeb uses this
extensively. Check out smerl.erl (http://code.google.com/p/erlyweb).

Yariv

On 10/4/07, David Mercer <> wrote:
>
>
>
>
> How do you create a module on the fly?  Wouldn't you have to write it to
> disk and then compile and load it?  Or is there another way?
>
>
>
> Cheers,
>
>
>
> David
>
>
>
>  ________________________________
>
>
> From: 
> [mailto:] On Behalf Of
> Robert Virding
>  Sent: Wednesday, October 03, 2007 16:32
>  To: Yariv Sadan
>  Cc: ; G Bulmer
>  Subject: Re: [erlang-questions] Regular expression library (was Not
> anErlang fan)
>
>
>
>
> That's not too difficult to do at all. The speed you get would depend on
> which regexp family you do it for.
>
>  Robert
>
>
> On 27/09/2007, Yariv Sadan <> wrote:
>
> I was wondering if it would be possible/beneficial to write a regexp
>  compiler that takes a regexp string and compiles it into an Erlang
>  module.
>
>  You would use it as follows:
>
>  regexp2:compile("f*o+o?", myregexp, [native]),
>  true = myregexp:matches("foo").
>
>  This could be a very fast regexp engine.
>
>  What do you think?
>
>  Yariv
>
>  On 9/26/07, Robert Virding <> wrote:
>  > The regular expression module which is included in the distribution is
a
>  > simple rather naïve one written in Erlang which uses a back-tracking
>  > algorithm directly on the AST. It has all the problems Russ Cox
mentioned
> in
>  > his paper. I wrote it long ago when no one was interested in parsing
> regexps
>  > in Erlang. :-)
>  >
>  > I am rewriting it at the moment, still in Erlang but using the same
>  > algorithm, or similar to the one, which Russ describes. It is about the
> same
>  > speed as the existing one for simpler regexps but it can handle all the
>  > pathological cases as it should. It will handle POSIX syntax and
> semantics
>  > (but no back-references*). There will also be a version which will
> implement
>  > a subset of PERL regexps with PERL semantics. Both will work on either
> lists
>  > or binaries.
>  >
>  > I hadn't seen the lambda-the-ultimate discussion you mentioned and will
>  > check it out.
>  >
>  > There are a number of regexp packages for Erlang which use PCRE or
other
>  > libraries which I don't have references to. Look in the "Not an Erlang
> fan"
>  > thread here which gives some references to them.
>  >
>  > An interesting alternative would be to write the regexp parser/compiler
> in
>  > Erlang and have the basic regexp matcher in the emulator. This actually
>  > wouldn't be so big, I think. Get the best of both worlds. An I used to
>  > complain (still do in fact) that the emulator is to big. :-)
>  >
>  > Robert
>  >
>  > * Actually there are two types of POSIX regexps in the standard, one
> which
>  > has back-references and one which doesn't.
>  >
>  >
>  >
>  > On 25/09/2007, G Bulmer < > wrote:
>  > > After reading the original thread, I wondered what implementation of
>  > > regular expression matching is used by Erlang?
>  > >
>  > > I ask because this paper points out that many implementations have
>  > > bad pathological behaviour:
>  > > http://swtch.com/~rsc/regexp/regexp1.html
>  > > An observation they make is:
>  > > "Regular expression matching can be simple and fast, using finite
>  > > automata-based techniques that have been known for decades. In
>  > > contrast, Perl, PCRE, Python, Ruby, Java, and many other languages
>  > > have regular expression implementations based on recursive
>  > > backtracking that are simple but can be excruciatingly slow."
>  > >
>  > > This is followed by a discussion at http://lambda-the-ultimate.org/
>  > > node/2064
>  > > Which seems to come to a working implementation.
>  > > Usefully, they note that Thompson's original patent has expired, so
>  > > there is no obstacle to using the original, and apparently superior,
>  > > algorithm
>  > >
>  > > I looked at 'man regex' on my Mac, and it describes the POSIX
>  > > implementation of regex. It does contain the note "Originally written
>  > > by Henry Spencer", which is suggested as the basis/culprit of the
>  > > pathological regular expression implementations (I should add, as
>  > > Henry Spencer is one of my heroes, that he is also credited with TCL
>  > > 8 regex, which works very well according to the graphs in the paper).
>  > >
>  > > I have tried grep'ing through *.c in otp_src for regex and regcomp,
>  > > and came up with nothing!
>  > >
>  > > Does anyone know what is used in Erlang? It may be a relatively small
>  > > amount of work to incorporate a new regex to provide more stable
>  > > behaviour.
>  > >
>  > > GB
>  > >
>  > > On 25 Sep 2007, at 10:00,
>  >  wrote:
>  > >
>  > > > Date: Mon, 24 Sep 2007 09:20:08 -0700 (PDT)
>  > > > From: Thomas Lindgren <  >
>  > > > Subject: Re: [erlang-questions] Not an Erlang fan
>  > > > To: "Erlang-Questions \(E-mail\)" < >
>  > > > Message-ID:
>  > < >
>  > > > Content-Type: text/plain; charset=iso-8859-1
>  > > >
>  > > >
>  > > > -
>  > > > Note that this does not include the actual regexp
>  > > > search of each line ... The code to do that could be
>  > > > an FSM + dictionary insert instead of the simple
>  > > > check-and-accumulate for newline. Extra cost? Unclear.
>  > > >
>  > > > One could merge that code into the scanning the binary
>  > > > but this should ideally be done by a regexp compiler,
>  > > > shouldn't it? Not done at this time AFAIK.
>  > > >
>  > > > Best,
>  > > > Thomas
>  > >
>  > > > Date: Mon, 24 Sep 2007 19:55:12 +0200
>  > > > From: Claes Wikstrom < >
>  > > > Subject: Re: [erlang-questions] Not an Erlang fan
>  > > > To: Bob Ippolito < >
>  > > > Cc: "Erlang-Questions \(E-mail\)" < >
>  > > > Message-ID: < >
>  > > > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>  > > >
>  > > > Bob Ippolito wrote:
>  > > >
>  > > >> On 9/24/07, Patrick Logan < > wrote:
>  > > >>
>  > > >>>>>>
>  > http://www.tbray.org/ongoing/When/200x/2007/09/22/Erlang
>  > > >>>>>>
>  > > >>>>>> Tim Bray might raise some valid points here, even if he's
>  > > >>>>>> slightly
>  > > >>>>>> biased by his background.
>  > > >>>>>>
>  > > >
>  > > > No, the only fast way today to process a large file line/by/line is
> to
>  > > >
>  > > > 1. file:open(Filename, [read, raw])
>  > > > 2. In a loop {ok, Bin} = file:read(Fd, BufSize),
>  > > > 3. Use a binary regex matcher such as
>  > > >     http://yaws.hyber.org/download/posregex-1.0.tgz
>  > > >
>  > > > (I don't know the state of the regex lib in OTP today, last time
>  > > >   I looked it sucked bigtime though)
>  > > >
>  > > > /klacke
>  > > >
>  > > _______________________________________________
>  > > erlang-questions mailing list
>  > > 
>  > >
> http://www.erlang.org/mailman/listinfo/erlang-questions
>  > >
>  >
>  >
>  > _______________________________________________
>  > erlang-questions mailing list
>  > 
>  > http://www.erlang.org/mailman/listinfo/erlang-questions
>  >
>
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://www.erlang.org/mailman/listinfo/erlang-questions
>




More information about the erlang-questions mailing list