pcre, bifs, drivers and ports

Mats Cronqvist <>
Wed Aug 2 11:53:49 CEST 2006


Bjorn Gustavsson wrote:
> Mats Cronqvist <> writes:
> 
>>    for our immediate problem, it is ok to limit the size of the regexp.
>> 
> If you by size means the length of the regexp string, this is a very
> poor measure of how fast or slow the regexp matching will be.

   this is true.
   but as far as i understand (from googling) the time is bounded by k*M*N, 
where M is the length of the string and N is the length of the regexp. so by 
limiting M and N you can guarantee that the regexp computation will never take 
more than so and so many cycles. i'm guessing that even very conservative values 
of M and N will be large enough to be useful.

   note that i'm perfectly willing to accept that a driver is a better solution 
than a bif. i'm just not (yet) convinced by the arguments against the bif solution.

   mats

p.s.

   why do i even care in the first place? i invented the totally arbitrary 
benchmark of finding the first word that starts with "con" in Virdings mail of 2 
Aug 2006 01:53:06 (first 1,000 bytes). re is the bif, gregexp is erlang code 
from from jungerl, regexp is from OTP.
   summary; the bif is ~100 times faster that regexp, and ~20 times faster than 
gregexp.

 > [REre] = re:compile(["(con[a-z]+)"])
 > timer:tc(re,grep,[Str,[REre]]).
{28,[{253,264,["consumption"]}]}

 > {ok,REgr} = gregexp:parse(".*\\(con[a-z]+\\)").
 > timer:tc(gregexp,groups,[Str,REgr]).
{768,{match,["consumption"]}}

 > {ok,REr} = regexp:parse("con[a-z]+").
 > timer:tc(regexp,match,[Str,REr]).
{3255,{match,254,11}}
 > timer:tc(string,substr,[Str,254,11]).
{13,"consumption"}



More information about the erlang-questions mailing list