[erlang-questions] Fast regular expression implementation - benchmarks
Gaspar Chilingarov
nm@REDACTED
Thu Dec 21 10:01:46 CET 2006
Yariv Sadan wrote:
> Hi Gaspar,
>
> Have you run any benchmarks comparing your implementation to the OTP
> regexp and/or the revised one on trapexit? Also, can you please give us
> a hint as to what makes your implementation faster?
>
> Thanks,
> Yariv
>
I've benchmarked all of them on the same 18 KB HTML file.

Pattern: class=g.*<a\s+class

18 KB file:
  regexp from trapexit / original regexp -- about 90-100 ms
  gregexp from jungerl -- 80-87 ms
  mine -- 29-34 ms

Same file joined together 10 times (180 KB):
  regexp from trapexit -- 480-490 ms
  OTP regexp -- 480-550 ms
  gregexp -- 478-490 ms
  mine -- 310-327 ms

Same file joined 100 times (1800 KB):
  regexp from trapexit / regexp from OTP / gregexp -- 4.6-5.1 seconds
  mine -- 3.2-3.7 seconds
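
For anyone who wants to repeat this kind of scaling measurement, here is a minimal sketch of a timing harness. It is not the code used for the numbers above: it uses Python's standard re module instead of any of the Erlang libraries, the SAMPLE text is a made-up stand-in for the 18 KB HTML file, and the helper names time_search and scaling_table are hypothetical. re.DOTALL is an assumption, so that "." can cross line breaks.

```python
import re
import time

# Pattern from the post. DOTALL is an assumption (lets "." match newlines).
PATTERN = re.compile(r'class=g.*<a\s+class', re.DOTALL)

# Synthetic stand-in for the real HTML file (hypothetical content).
SAMPLE = '<div class=g><a  class=l href="http://example.com/">x</a></div>\n' * 50


def time_search(pattern, text, repeats=5):
    """Return the best wall-clock time in milliseconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pattern.search(text)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def scaling_table(pattern, base_text, factors=(1, 10, 100)):
    """Time the search on base_text joined to itself k times, for each k."""
    rows = []
    for k in factors:
        text = base_text * k
        rows.append((k, len(text), time_search(pattern, text)))
    return rows


if __name__ == "__main__":
    for k, size, ms in scaling_table(PATTERN, SAMPLE):
        print("x%d (%d bytes): %.3f ms" % (k, size, ms))
```

Taking the best of several runs is one common way to reduce noise from the scheduler and garbage collector; reporting a range, as above, is equally reasonable.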
Now increase the complexity and try the regexp
class=g.*<a\s+class=l\s+href=\".*\"
on the 18 KB file:
  mine -- 60-75 ms
  the other regexp libraries -- 270-280 ms
If we use the pattern class=g.*?<a\s+?class=l\s+?href=\".*?\", which is
what I really meant to extract from the file and what my regexp
interpreter is really optimized for, mine takes about 10-12 ms. The other
libraries do not have non-greedy evaluation. I've modified gregexp to
support such operations, and it returns matches in this case in 240-300 ms.
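
To make the greedy versus non-greedy difference concrete, here is a small sketch. It uses Python's re module as a stand-in, not my interpreter or gregexp, and the two-line html string is invented for illustration. re.DOTALL is again an assumption so that "." can cross line breaks.

```python
import re

# Invented two-entry sample; the real input was an 18 KB HTML file.
html = (
    '<p class=g><a class=l href="http://a.example/">A</a></p>\n'
    '<p class=g><a class=l href="http://b.example/">B</a></p>\n'
)

# Greedy form: ".*" grabs as much as possible, then backtracks.
greedy = re.compile(r'class=g.*<a\s+class=l\s+href=\".*\"', re.DOTALL)

# Non-greedy form: ".*?" and "\s+?" stop at the first point that works.
lazy = re.compile(r'class=g.*?<a\s+?class=l\s+?href=\".*?\"', re.DOTALL)

greedy_matches = greedy.findall(html)
lazy_matches = lazy.findall(html)

# The greedy pattern swallows both entries in one match; the non-greedy
# pattern yields one match per entry.
print(len(greedy_matches), len(lazy_matches))
for m in lazy_matches:
    print(m)
```

The greedy pattern has to run to the end of the input and backtrack, which is part of why the non-greedy form is both the intended extraction and the cheaper one to evaluate.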
I think there is still room for improvement and speedup :)
/Gaspar
--
Gaspar Chilingarov
System Administrator,
Network security consulting
t +37493 419763 (mob)
i 63174784
e nm@REDACTED