[erlang-questions] Fast regular expression implementation

Yariv Sadan yarivvv@REDACTED
Thu Dec 21 01:36:31 CET 2006


Hi Gaspar,

Have you run any benchmarks comparing your implementation to the OTP
regexp and/or the revised one on trapexit? Also, can you please give us
a hint as to what makes your implementation faster?

Thanks,
Yariv

On 12/18/06, Gaspar Chilingarov <nm@REDACTED> wrote:
> Hi all!
>
> I wish to announce an implementation of regular expressions in Erlang,
> which works fast enough to be useful for text processing and extraction.
>
> Please follow the link for download: http://zanazan.am/erlang/re.html
>
> Some things are not implemented yet (e.g. the or
> operator "|" between regexp branches).
>
> Subpatterns are extracted using (); grouping without extraction is done
> as in Perl, with (?:pattern). Multiple nested subpatterns are allowed.
>
> I've tried to keep the behavior as close as possible to Perl patterns.
>
> All substitute functions are missing at the moment -- I will be glad to
> get suggestions on what should be implemented besides the standard sub/gsub.
>
> The library is quite fast: an 18kb text is matched against the pattern
> "class=g.*?<a\s+class=l\s+href=\"(.*?)\">(.*?)</a>" to extract
> all matches in 10-12ms (if you ask only for positions). If you ask
> for subpattern matches (i.e. re:mgg) it takes only 18ms.
>
> The same text concatenated 100 times (1.8Mb) is matched in the first
> case in 1.2sec, and with subpattern text extraction in about 2.5sec, so
> matching time grows linearly. With the gregexp implementation, time grows
> exponentially.
>
>
> I would like to hear any feedback, especially bug reports.
>
> /Gaspar
>
> --
> Gaspar Chilingarov
>
> System Administrator,
> Network security consulting
>
> t +37493 419763 (mob)
> i 63174784
> e nm@REDACTED
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
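
For readers unfamiliar with the Perl-style features mentioned in the quoted announcement (capturing with (), non-capturing (?:pattern), and the lazy .*? quantifier), here is a minimal sketch of how the quoted pattern behaves. It uses Python's standard re module purely for illustration, because the API of the announced library is not shown in this message. The sample text and the helper names match_positions and match_groups are hypothetical and are not part of Gaspar's library.

```python
import re

# The pattern quoted in the announcement, written as a Python raw string.
# It has two capturing groups: the href value and the link text.
PATTERN = re.compile(r'class=g.*?<a\s+class=l\s+href="(.*?)">(.*?)</a>')

# Hypothetical sample input (not taken from the original message).
SAMPLE = (
    '<div class=g><a class=l href="http://example.org/a">First</a></div>'
    '<div class=g><a  class=l  href="http://example.org/b">Second</a></div>'
)


def match_positions(text):
    """Return the (start, end) offsets of every whole match."""
    return [m.span() for m in PATTERN.finditer(text)]


def match_groups(text):
    """Return the captured subpatterns of every match, one tuple per match."""
    return PATTERN.findall(text)


# (?:...) groups its contents for the + quantifier without capturing them,
# so only the final (c) group is reported.
NON_CAPTURING = re.compile(r'(?:ab)+(c)')

if __name__ == "__main__":
    print(match_positions(SAMPLE))
    print(match_groups(SAMPLE))
    print(NON_CAPTURING.match("ababc").groups())
```

The two helpers mirror the distinction drawn in the announcement between asking only for match positions and asking for the subpattern text. How the announced library exposes these two modes is not specified here beyond the name re:mgg.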


