[erlang-questions] Regular expression library (was Not anErlang fan)

David Mercer dmercer@REDACTED
Thu Oct 4 19:43:40 CEST 2007


How do you create a module on the fly?  Wouldn't you have to write it to
disk and then compile and load it?  Or is there another way?

 

Cheers,

 

David

 

  _____  

From: erlang-questions-bounces@REDACTED
[mailto:erlang-questions-bounces@REDACTED] On Behalf Of Robert Virding
Sent: Wednesday, October 03, 2007 16:32
To: Yariv Sadan
Cc: erlang-questions@REDACTED; G Bulmer
Subject: Re: [erlang-questions] Regular expression library (was Not anErlang
fan)

 

That's not too difficult to do at all. The speed you get would depend on
which regexp family you do it for.

Robert

On 27/09/2007, Yariv Sadan <yarivsadan@REDACTED> wrote:

I was wondering if it would be possible/beneficial to write a regexp 
compiler that takes a regexp string and compiles it into an Erlang
module.

You would use it as follows:

regexp2:compile("f*o+o?", myregexp, [native]),
true = myregexp:matches("foo"). 

This could be a very fast regexp engine.

What do you think?

Yariv

On 9/26/07, Robert Virding <rvirding@REDACTED> wrote:
> The regular expression module which is included in the distribution is a 
> simple rather naïve one written in Erlang which uses a back-tracking
> algorithm directly on the AST. It has all the problems Russ Cox mentioned
in
> his paper. I wrote it long ago when no one was interested in parsing
regexps 
> in Erlang. :-)
>
> I am rewriting it at the moment, still in Erlang but using the same
> algorithm, or similar to the one, which Russ describes. It is about the
same
> speed as the existing one for simpler regexps but it can handle all the 
> pathological cases as it should. It will handle POSIX syntax and semantics
> (but no back-references*). There will also be a version which will
implement
> a subset of PERL regexps with PERL semantics. Both will work on either
lists 
> or binaries.
>
> I hadn't seen the lambda-the-ultimate discussion you mentioned and will
> check it out.
>
> There are a number of regexp packages for Erlang which use PCRE or other 
> libraries which I don't have references to. Look in the "Not an Erlang
fan"
> thread here which gives some references to them.
>
> An interesting alternative would be to write the regexp parser/compiler in

> Erlang and have the basic regexp matcher in the emulator. This actually
> wouldn't be so big, I think. Get the best of both worlds. An I used to
> complain (still do in fact) that the emulator is to big. :-) 
>
> Robert
>
> * Actually there are two types of POSIX regexps in the standard, one which
> has back-references and one which doesn't.
>
>
>
> On 25/09/2007, G Bulmer < gbulmer@REDACTED> wrote:
> > After reading the original thread, I wondered what implementation of
> > regular expression matching is used by Erlang?
> > 
> > I ask because this paper points out that many implementations have
> > bad pathological behaviour:
> > http://swtch.com/~rsc/regexp/regexp1.html 
> > An observation they make is:
> > "Regular expression matching can be simple and fast, using finite
> > automata-based techniques that have been known for decades. In
> > contrast, Perl, PCRE, Python, Ruby, Java, and many other languages 
> > have regular expression implementations based on recursive
> > backtracking that are simple but can be excruciatingly slow."
> >
> > This is followed by a discussion at http://lambda-the-ultimate.org/
> > node/2064
> > Which seems to come to a working implementation.
> > Usefully, they note that Thompson's original patent has expired, so
> > there is no obstacle to using the original, and apparently superior, 
> > algorithm
> >
> > I looked at 'man regex' on my Mac, and it describes the POSIX
> > implementation of regex. It does contain the note "Originally written
> > by Henry Spencer", which is suggested as the basis/culprit of the 
> > pathological regular expression implementations (I should add, as
> > Henry Spencer is one of my heroes, that he is also credited with TCL
> > 8 regex, which works very well according to the graphs in the paper). 
> >
> > I have tried grep'ing through *.c in otp_src for regex and regcomp,
> > and came up with nothing!
> >
> > Does anyone know what is used in Erlang? It may be a relatively small 
> > amount of work to incorporate a new regex to provide more stable
> > behaviour.
> >
> > GB
> >
> > On 25 Sep 2007, at 10:00,
> erlang-questions-request@REDACTED wrote:
> >
> > > Date: Mon, 24 Sep 2007 09:20:08 -0700 (PDT)
> > > From: Thomas Lindgren < thomasl_erlang@REDACTED
<mailto:thomasl_erlang@REDACTED> >
> > > Subject: Re: [erlang-questions] Not an Erlang fan
> > > To: "Erlang-Questions \(E-mail\)" < erlang-questions@REDACTED> 
> > > Message-ID:
> <761781.89097.qm@REDACTED >
> > > Content-Type: text/plain; charset=iso-8859-1
> > >
> > >
> > > -
> > > Note that this does not include the actual regexp
> > > search of each line ... The code to do that could be
> > > an FSM + dictionary insert instead of the simple 
> > > check-and-accumulate for newline. Extra cost? Unclear.
> > >
> > > One could merge that code into the scanning the binary
> > > but this should ideally be done by a regexp compiler, 
> > > shouldn't it? Not done at this time AFAIK.
> > >
> > > Best,
> > > Thomas
> >
> > > Date: Mon, 24 Sep 2007 19:55:12 +0200
> > > From: Claes Wikstrom < klacke@REDACTED>
> > > Subject: Re: [erlang-questions] Not an Erlang fan
> > > To: Bob Ippolito < bob@REDACTED> 
> > > Cc: "Erlang-Questions \(E-mail\)" < erlang-questions@REDACTED>
> > > Message-ID: < 46F7FA00.8070507@REDACTED>
> > > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >
> > > Bob Ippolito wrote:
> > >
> > >> On 9/24/07, Patrick Logan < patrickdlogan@REDACTED> wrote:
> > >>
> > >>>>>>
> http://www.tbray.org/ongoing/When/200x/2007/09/22/Erlang 
> > >>>>>>
> > >>>>>> Tim Bray might raise some valid points here, even if he's
> > >>>>>> slightly
> > >>>>>> biased by his background. 
> > >>>>>>
> > >
> > > No, the only fast way today to process a large file line/by/line is to
> > >
> > > 1. file:open(Filename, [read, raw])
> > > 2. In a loop {ok, Bin} = file:read(Fd, BufSize),
> > > 3. Use a binary regex matcher such as
> > >     http://yaws.hyber.org/download/posregex-1.0.tgz 
> > >
> > > (I don't know the state of the regex lib in OTP today, last time
> > >   I looked it sucked bigtime though)
> > >
> > > /klacke
> > > 
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions
> >
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20071004/4aa2f40c/attachment.htm>


More information about the erlang-questions mailing list