pcre, bifs, drivers and ports
Robert Virding
robert.virding@REDACTED
Tue Aug 8 23:17:26 CEST 2006
Willem de Jong skrev:
> I think regexp is too slow on exactly the kind of pattern that was
> used in the 'benchmark': patterns that contain a long(ish) literal
> string, especially if they start with it.
>
> Regexp:parse() translates the pattern to a compiled RE that looks like
> this (for ABC): {concat, concat{A, B}, C}. As long as no match is
> found, the algorithm tries to match every character against the
> compiled RE. For every character, it has to traverse the tree down to
> the A. If the pattern is longer, the tree grows 'deeper' (one
> additional level with each additional character), and this process
> takes more time.
This way of parsing the regexp follows the definition: concatenation is
left associative.
> The solution (I think) is to change the result of regexp:parse to
> something that looks like this: {concat, A, concat{B, C}}. Then it
> will find the A much quicker.
I don't think that transforming it in this way changes the semantics but
I have not checked it. Does anyone know? If it doesn't then it should
probably be done directly in the parser.
> I wrote a 'post-processing' module that modifies the compiled RE as
> described. I also changed a few things to the main function that does
> the matching (re_apply), but I am not sure whether that was really
> necessary. (If you want I can send you the code).
>
> On my PC it is not so easy to measure response-time, but my impression
> is that this improved the result for the benchmark by a factor 3.
Please send me the core and I will check it out. I was surprised by the
speed-up you mention. If it is of this order then it would be worth the
effort. Given it doesn't change the semantics.
Robert
More information about the erlang-questions
mailing list