[erlang-questions] Fast regular expression implementation

Robert Virding robert.virding@REDACTED
Thu Dec 21 10:28:45 CET 2006


A quick comment on the implementation speeds of various regexp packages.

I would say that the main reason a Perl-based regexp package *SHOULD* be 
faster than the existing regexp module, which is AWK and POSIX based, is the 
difference in semantics. POSIX guarantees to find the first longest 
match (the longest of the matches that start earliest), while Perl only 
guarantees to find the first match, longest or otherwise. A Perl-style 
matcher can therefore stop as soon as it finds a match, whereas a POSIX 
matcher must keep going to make sure no longer match exists. This also 
means that with Perl it is critical HOW you write your regexp, as it 
affects which match you will find; with POSIX-based regexps it does not 
matter.
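To make the difference concrete, here is a small Python sketch (an illustration only, not Erlang and not how either regexp package works internally). Python's `re` module is a backtracking, Perl-style engine, so `re.search` returns the first match it finds. The hypothetical helper `leftmost_longest` emulates POSIX overall-match semantics by brute force: take the earliest start position, then the longest end position at which the pattern matches in full. It assumes patterns without anchors or lookarounds and is quadratic in the text length, so it shows the semantics, not a fast implementation.

```python
import re

def leftmost_longest(pattern, text):
    """Hypothetical helper: emulate POSIX leftmost-longest overall matching.

    Brute force on top of Python's backtracking `re` engine; assumes the
    pattern uses no anchors or lookarounds.
    """
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate first, so the first hit is the longest.
        for end in range(len(text), start - 1, -1):
            if rx.fullmatch(text, start, end):
                return text[start:end]
    return None

# Perl-style: greedy a* grabs "aa", (ab)? then matches empty, and the
# engine stops at the first successful match.
print(re.search(r"a*(ab)?", "aab").group())   # aa
# POSIX-style: backing a* off to "a" lets (ab) match, giving a longer match.
print(leftmost_longest(r"a*(ab)?", "aab"))    # aab
```

Both engines agree on where the match starts; only the choice of end position differs.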

So, for example, with a Perl regexp changing the order of the alternatives 
in '|' will affect what is matched, while it has no effect with a 
POSIX-based regexp. This is one reason why, in "Mastering Regular 
Expressions", Friedl calls POSIX-based (DFA-based) regexps 
"uninteresting": you can't fiddle with them to tune them. :-)
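The effect of alternative order can also be sketched in Python (again an illustration, not either Erlang package). `perl_alternation` joins the alternatives with `|` and lets Python's backtracking `re` module take the first alternative that succeeds. `posix_alternation` is a hypothetical helper that handles only literal alternatives and returns the longest one matching at the leftmost position where any of them match.

```python
import re

def perl_alternation(alternatives, text):
    """Leftmost-first: the first listed alternative that matches wins."""
    pattern = "|".join(re.escape(alt) for alt in alternatives)
    m = re.search(pattern, text)
    return m.group() if m else None

def posix_alternation(alternatives, text):
    """Hypothetical helper: leftmost-longest match of literal alternatives.

    Handles plain literal strings only, no regexp metacharacters.
    """
    for i in range(len(text) + 1):
        hits = [alt for alt in alternatives if text.startswith(alt, i)]
        if hits:
            # Among matches at the leftmost position, the longest wins,
            # regardless of the order the alternatives were listed in.
            return max(hits, key=len)
    return None

for alts in (["sam", "samwise"], ["samwise", "sam"]):
    print(perl_alternation(alts, "samwise"), posix_alternation(alts, "samwise"))
# sam samwise
# samwise samwise
```

Swapping the alternatives changes the Perl-style result but not the POSIX-style one.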

The benefit, of course, is that with POSIX you know exactly what you will 
get. Which to prefer very much depends on what you are after.

I had planned to do a Perl-based package as well, after I have fixed the 
compiler in regexp (almost done).

I would love to see your test cases.

Robert
