[erlang-questions] Fast regular expression implementation

Thu Dec 21 11:47:37 CET 2006

Robert Virding wrote:
 > A quick comment to implementation speeds of various regexp packages.
 >
 > I would say that the main reason a Perl based regexp package *SHOULD* be
 > faster than the existing regexp, which is AWK and POSIX based, is the
 > difference in semantics. POSIX guarantees to find the first longest
 > match while Perl just guarantees to find the first match, longest or
 > otherwise.

I specially do not implemented regexp:match function, because it 
guarantees to return longest match. And my library return eactly first 
match - so first_match is imlemented ;).

 > This means that with Perl it is very critical HOW you write
 > your regexp as it affects which match you will find, while this is not
 > significant for POSIX based regexps.
 >
 > So for example with a Perl regexp changing the order of the alternatives
 > in '|' will affect what is matched, while this will have no effect with
 > a POSIX based regexp.

I still thinking of effective implementation of '|' pattern evaluation :)

 > This is one reason why in "Mastering Regular
 > Expressions" Friedl calls POSIX based (DFA based) regexps for
 > "uninteresting" as you can't fiddle with them to tune them. :-)
 > The benefit is of course that you know exactly what you will get. It
 > very much depends what you are after.
 >
 > I had planned to do a Perl based package as well after I have fixed
 > the compiler in regexp. (Almost done)

Well, I've stuck in writing RE to erlang code converter -- to allow in 
place erlang code generation and compile. Mainly because some recursion 
is necessary and it's a little bit difficult with funs.

In other I think hand RE interpreter will benefit from rewrite if all 
functions will be made tail recursive.

 >
 > I would love to see your test cases.
 > Robert
 >

-- 
Gaspar Chilingarov

System Administrator,
Network security consulting

t +37493 419763 (mob)
i 63174784
e nm@REDACTED