[erlang-questions] Fast regular expression implementation
Gaspar Chilingarov
nm@REDACTED
Thu Dec 21 11:47:37 CET 2006
Robert Virding wrote:
> A quick comment to implementation speeds of various regexp packages.
>
> I would say that the main reason a Perl based regexp package *SHOULD* be
> faster than the existing regexp, which is AWK and POSIX based, is the
> difference in semantics. POSIX guarantees to find the first longest
> match while Perl just guarantees to find the first match, longest or
> otherwise.
I specially do not implemented regexp:match function, because it
guarantees to return longest match. And my library return eactly first
match - so first_match is imlemented ;).
> This means that with Perl it is very critical HOW you write
> your regexp as it affects which match you will find, while this is not
> significant for POSIX based regexps.
>
> So for example with a Perl regexp changing the order of the alternatives
> in '|' will affect what is matched, while this will have no effect with
> a POSIX based regexp.
I still thinking of effective implementation of '|' pattern evaluation :)
> This is one reason why in "Mastering Regular
> Expressions" Friedl calls POSIX based (DFA based) regexps for
> "uninteresting" as you can't fiddle with them to tune them. :-)
> The benefit is of course that you know exactly what you will get. It
> very much depends what you are after.
>
> I had planned to do a Perl based package as well after I have fixed
> the compiler in regexp. (Almost done)
Well, I've stuck in writing RE to erlang code converter -- to allow in
place erlang code generation and compile. Mainly because some recursion
is necessary and it's a little bit difficult with funs.
In other I think hand RE interpreter will benefit from rewrite if all
functions will be made tail recursive.
>
> I would love to see your test cases.
> Robert
>
--
Gaspar Chilingarov
System Administrator,
Network security consulting
t +37493 419763 (mob)
i 63174784
e nm@REDACTED
More information about the erlang-questions
mailing list