[erlang-questions] Fast regular expression implementation - benchmarks

Gaspar Chilingarov <>
Thu Dec 21 10:01:46 CET 2006


Yariv Sadan wrote:
> Hi Gaspar,
> 
> Have you run any benchmarks comparing your implementation to the OTP
> regexp and/or the revised one on trapexit? Also, can you please give us
> a hint as to what makes your implementation faster?
> 
> Thanks,
> Yariv
> 

I've tried benchmarking on the same 18 KB HTML file:
pattern: class=g.*<a\s+class

regexp from trapexit / original (OTP) regexp -- about 90-100 ms
gregexp from jungerl -- 80-87 ms
mine -- 29-34 ms

On the same file joined together 10 times (180 KB):
regexp from trapexit -- 480-490 ms
OTP regexp -- 480-550 ms
gregexp -- 478-490 ms
mine -- 310-327 ms

On the same file joined 100 times (1800 KB):
mine -- 3.2-3.7 seconds
regexp from trapexit / OTP regexp / gregexp -- 4.6-5.1 seconds
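
For readers who want to run a comparable measurement, here is a minimal sketch of the procedure above: the same page joined 1, 10 and 100 times, with one search timed on each. It is written in Python rather than Erlang and uses Python's own re module, so it says nothing about the libraries compared here. The helper time_search, the sample page and the choice to let "." match newlines (re.DOTALL) are assumptions for illustration; the message does not say how the timings were taken.

```python
import re
import time


def time_search(pattern, text, repeats=3):
    """Return the best wall-clock time in ms over `repeats` runs of one search."""
    # Assumption: "." should also match newlines, hence re.DOTALL.
    compiled = re.compile(pattern, re.DOTALL)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compiled.search(text)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Hypothetical stand-in for the HTML file (roughly 18 KB of repeated markup).
page = '<p class=g>result</p>\n<a  class=l href="x">link</a>\n' * 350

# Time the first pattern on the page joined 1, 10 and 100 times.
timings = {n: time_search(r'class=g.*<a\s+class', page * n) for n in (1, 10, 100)}
for n, ms in timings.items():
    print(f"{n:>3}x: {ms:.2f} ms")
```

Taking the best of several runs is only one way to reduce noise; a range, as reported above, is equally reasonable.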


Now let's increase the complexity and try the regexp
class=g.*<a\s+class=l\s+href=\".*\"

On the 18 KB file:
mine -- 60-75 ms
the other regexps -- 270-280 ms

If we use the pattern class=g.*?<a\s+?class=l\s+?href=\".*?\", which is
really what I meant to extract from the file and what is really optimized in
my regexp interpreter, mine takes about 10-12 ms. The other libraries
do not have non-greedy evaluation. I've modified gregexp to support
such operations, and in this case it returns matches in 240-300 ms.
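
To make the greedy versus non-greedy difference concrete, here is a small sketch in Python's re module (not any of the Erlang libraries discussed above). The sample HTML is a hypothetical stand-in for the real file, and the \" escapes from the patterns above are written as plain quotes because the Python patterns are raw strings.

```python
import re

# Hypothetical sample standing in for the real HTML file: two result entries.
html = (
    '<div class=g><a class=l href="http://a.example/">A</a></div>'
    '<div class=g><a class=l href="http://b.example/">B</a></div>'
)

# Greedy: each ".*" grabs as much as possible, so one match spans both entries.
greedy = re.compile(r'class=g.*<a\s+class=l\s+href=".*"')

# Non-greedy: each ".*?" stops at the first point where the rest can match,
# so every entry is returned as its own match.
lazy = re.compile(r'class=g.*?<a\s+?class=l\s+?href=".*?"')

greedy_matches = greedy.findall(html)
lazy_matches = lazy.findall(html)

print(len(greedy_matches), "greedy match(es)")
print(len(lazy_matches), "non-greedy match(es)")
for m in lazy_matches:
    print(m)
```

On this sample the greedy pattern yields a single match running from the first class=g to the last closing quote, while the non-greedy pattern yields one match per entry, which is the extraction the pattern is meant to perform.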

I think there is more room for improvement and speedup :)

/Gaspar

-- 
Gaspar Chilingarov

System Administrator,
Network security consulting

t +37493 419763 (mob)
i 63174784


