pcre, bifs, drivers and ports

Willem de Jong <>
Sun Aug 6 21:55:14 CEST 2006


Hi,

I think regexp is too slow on exactly the kind of pattern that was
used in the 'benchmark': patterns that contain a long(ish) literal
string, especially if  they start with it.

Regexp:parse() translates the pattern to a compiled RE that looks like
this (for ABC): {concat, concat{A, B}, C}. As long as no match is
found, the algorithm tries to match every character against the
compiled RE. For every character, it has to traverse the tree down to
the A. If the pattern is longer, the tree grows 'deeper' (one
additional level with each additional character), and this process
takes more time.

The solution (I think) is to change the result of regexp:parse to
something that looks like this: {concat, A, concat{B, C}}. Then it
will find the A much quicker.

I wrote a 'post-processing' module that modifies the compiled RE as
described. I also changed a few things to the main function that does
the matching (re_apply), but I am not sure whether that was really
necessary. (If you want I can send you the code).

On my PC it is not so easy to measure response-time, but my impression
is that this improved the result for the benchmark by a factor 3.

Regards,
Willem


On 8/4/06, Robert Virding <> wrote:
> Ah, but that's a bug! That doesn't count. :-)
>
> Looking at the regexp I think I can guess what is going wrong. Anchoring
> patterns at the beginning of a line is not as easy as one would think.
> Leex doesn't handle it.
>
> Seriously though, I would like some REAL problems to experiment on.
>
> Robert
>
> Fredrik Thulin wrote:
> > On Thursday 03 August 2006 13:11, Robert Virding wrote:
> > ...
> >
> >>Could someone send a real problem to experiment on? Especially one
> >>which the current regexp module is definitely too slow on.
> >
> >
> > Here is a regexp that the regexp module is definately too slow on ;) :
> >
> >       regexp:match("+461234", "^+46(.*)$").
> >
> > Previously reported in
> >
> >   http://www.erlang.org/ml-archive/erlang-questions/200404/msg00105.html
> >
> > /Fredrik
> >
>
>



More information about the erlang-questions mailing list