pcre, bifs, drivers and ports

Mats Cronqvist mats.cronqvist@REDACTED
Wed Aug 2 11:53:49 CEST 2006

```Bjorn Gustavsson wrote:
> Mats Cronqvist <mats.cronqvist@REDACTED> writes:
>
>>    for our immediate problem, it is ok to limit the size of the regexp.
>>
> If you by size means the length of the regexp string, this is a very
> poor measure of how fast or slow the regexp matching will be.

this is true.
but as far as i understand (from googling) the time is bounded by k*M*N,
where M is the length of the string and N is the length of the regexp. so by
limiting M and N you can guarantee that the regexp computation will never take
more than so and so many cycles. i'm guessing that even very conservative values
of M and N will be large enough to be useful.

note that i'm perfectly willing to accept that a driver is a better solution
than a bif. i'm just not (yet) convinced by the arguments against the bif solution.

mats

p.s.

why do i even care in the first place? i invented the totally arbitrary
benchmark of finding the first word that starts with "con" in Virdings mail of 2
Aug 2006 01:53:06 (first 1,000 bytes). re is the bif, gregexp is erlang code
from from jungerl, regexp is from OTP.
summary; the bif is ~100 times faster that regexp, and ~20 times faster than
gregexp.

> [REre] = re:compile(["(con[a-z]+)"])
> timer:tc(re,grep,[Str,[REre]]).
{28,[{253,264,["consumption"]}]}

> {ok,REgr} = gregexp:parse(".*\\(con[a-z]+\\)").
> timer:tc(gregexp,groups,[Str,REgr]).
{768,{match,["consumption"]}}

> {ok,REr} = regexp:parse("con[a-z]+").
> timer:tc(regexp,match,[Str,REr]).
{3255,{match,254,11}}
> timer:tc(string,substr,[Str,254,11]).
{13,"consumption"}

```