[erlang-questions] Fast regular expression implementation
Robert Virding
robert.virding@REDACTED
Thu Dec 21 10:28:45 CET 2006
A quick comment on the implementation speeds of various regexp packages.
I would say that the main reason a Perl based regexp package *SHOULD* be
faster than the existing regexp, which is AWK and POSIX based, is the
difference in semantics. POSIX guarantees to find the first (leftmost)
longest match, while Perl only guarantees to find the first match,
longest or otherwise, so a Perl-style matcher can stop as soon as it
finds a match. This means that with Perl it matters a great deal HOW you
write your regexp, as it affects which match you will find, while this
is not significant for POSIX based regexps.
So for example with a Perl regexp changing the order of the alternatives
in '|' will affect what is matched, while this will have no effect with
a POSIX based regexp. This is one reason why, in "Mastering Regular
Expressions", Friedl calls POSIX based (DFA based) regexps
"uninteresting", as you can't fiddle with them to tune them. :-)
The benefit of POSIX is of course that you know exactly what you will
get. It very much depends on what you are after.
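The effect of alternative ordering can be sketched in Python, whose re
module uses Perl-style first-match semantics. The helper names
perl_style and leftmost_longest are hypothetical and not part of any
regexp package. The second one only handles literal alternatives and
merely emulates the POSIX rule by picking the earliest, then longest,
candidate; it is not how a DFA based matcher is implemented.

```python
import re

def perl_style(alternatives, text):
    """First-match semantics: alternatives are tried in the order written."""
    pattern = "|".join(re.escape(a) for a in alternatives)
    m = re.search(pattern, text)
    return m.group() if m else None

def leftmost_longest(alternatives, text):
    """Emulate POSIX leftmost-longest for literal alternatives only
    (an assumption made to keep the sketch short)."""
    candidates = []
    for alt in alternatives:
        pos = text.find(alt)
        if pos != -1:
            # Sort key: earliest start first, then longest match.
            candidates.append((pos, -len(alt), alt))
    return min(candidates)[2] if candidates else None

text = "foobar"
print(perl_style(["foo", "foobar"], text))        # foo
print(perl_style(["foobar", "foo"], text))        # foobar
print(leftmost_longest(["foo", "foobar"], text))  # foobar
print(leftmost_longest(["foobar", "foo"], text))  # foobar
```

With first-match semantics, swapping the two alternatives changes the
result; with the leftmost-longest rule the order makes no difference.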
I have planned to do a Perl based package as well, once I have fixed the
compiler in regexp (almost done).
I would love to see your test cases.
Robert