[erlang-questions] Fast regular expression implementation
Gaspar Chilingarov
nm@REDACTED
Mon Dec 18 15:59:01 CET 2006
Hi all!
I would like to announce an implementation of regular expressions in
Erlang that is fast enough to be useful for text processing and extraction.
Please follow the link for download: http://zanazan.am/erlang/re.html
Some things are not implemented yet (e.g. the alternation operator "|"
between regexp branches).
Subpatterns are extracted using (); grouping without extraction is done
as in Perl, with (?:pattern). Multiple nested subpatterns are allowed.
I've tried to keep the behavior as close as possible to Perl patterns.
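To illustrate the difference between extracting and non-extracting
groups, here is a minimal sketch. It assumes the re module that ships
with current Erlang/OTP (Perl-compatible syntax), not the library
announced here, whose function names may differ. The module name
group_demo is made up for the example.

```erlang
-module(group_demo).
-export([capturing/0, non_capturing/0, nested/0]).

%% NOTE: this uses the re module bundled with Erlang/OTP as a stand-in.
%% The library announced in this message may expose different functions.

%% Both groups extract; a repeated group reports its last repetition.
capturing() ->
    re:run("ababab123", "(ab)+(\\d+)", [{capture, all_but_first, list}]).

%% (?:ab)+ only groups, so the digits are the sole extracted subpattern.
non_capturing() ->
    re:run("ababab123", "(?:ab)+(\\d+)", [{capture, all_but_first, list}]).

%% Nested subpatterns are numbered by the position of their opening
%% parenthesis: the outer group first, then the inner ones left to right.
nested() ->
    re:run("2006-12", "((\\d+)-(\\d+))", [{capture, all_but_first, list}]).
```

With OTP's re, capturing() should return {match,["ab","123"]},
non_capturing() should return {match,["123"]}, and nested() should
return {match,["2006-12","2006","12"]}.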
All substitution functions are missing at the moment. I would be glad to
get suggestions on what should be implemented besides the standard sub/gsub.
The library is quite fast: extracting all matches of the pattern
"class=g.*?<a\s+class=l\s+href=\"(.*?)\">(.*?)</a>" from an 18 KB text
takes 10-12 ms if you ask only for positions. If you ask only for
subpattern matches (i.e. re:mgg), it takes only 18 ms.
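The two modes described above (positions only versus subpattern text)
can be sketched as follows. This is an illustration using the re module
bundled with current Erlang/OTP, not the API of the library announced
here; link_demo, link_texts/1 and link_positions/1 are hypothetical
names chosen for the example.

```erlang
-module(link_demo).
-export([link_texts/1, link_positions/1]).

%% NOTE: OTP's re module is used as a stand-in; the announced library's
%% own functions (such as re:mgg) are not shown here.

%% The pattern from the message, written as an Erlang string literal
%% (backslashes and double quotes have to be escaped).
-define(PATTERN,
        "class=g.*?<a\\s+class=l\\s+href=\"(.*?)\">(.*?)</a>").

%% Text of both subpatterns for every match in Html.
link_texts(Html) ->
    run(Html, list).

%% {Offset, Length} pairs (zero-based offsets) instead of copied text.
link_positions(Html) ->
    run(Html, index).

%% Shared helper: find all matches, returning [] when there are none.
run(Html, Type) ->
    case re:run(Html, ?PATTERN, [global, {capture, all_but_first, Type}]) of
        {match, Matches} -> Matches;
        nomatch -> []
    end.
```

For example, link_texts applied to the string
<div class=g><a class=l href="http://example.com/">Example</a></div>
should return [["http://example.com/","Example"]]. Asking for positions
avoids copying the matched text, which is one plausible reason for the
difference between the two timings quoted above.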
The same text concatenated 100 times (1.8 MB) is matched in 1.2 s in the
first case and in about 2.5 s with subpattern text extraction, so
matching time grows linearly. With the gregexp implementation, the time
grows exponentially.
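A scaling measurement of this kind can be reproduced with a small
harness such as the one below. It is only a sketch: it times the re
module from current Erlang/OTP rather than the library announced here,
and scale_demo and time_matches/2 are hypothetical names.

```erlang
-module(scale_demo).
-export([time_matches/2]).

%% NOTE: measures OTP's re module as a stand-in for the announced library.

%% Concatenates Copies copies of Text, finds all matches (positions only)
%% and returns {Microseconds, NumberOfMatches}.
time_matches(Text, Copies) ->
    Subject = lists:append(lists:duplicate(Copies, Text)),
    Pattern = "class=g.*?<a\\s+class=l\\s+href=\"(.*?)\">(.*?)</a>",
    {Micros, Result} =
        timer:tc(fun() ->
                     re:run(Subject, Pattern,
                            [global, {capture, all_but_first, index}])
                 end),
    Count = case Result of
                {match, Matches} -> length(Matches);
                nomatch -> 0
            end,
    {Micros, Count}.
```

Calling time_matches(Text, 1) and time_matches(Text, 100) on the same
input and comparing the two times gives a rough check: with linear
growth the ratio should stay near 100.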
I would like to hear any feedback, especially bug reports.
/Gaspar
--
Gaspar Chilingarov
System Administrator,
Network security consulting
t +37493 419763 (mob)
i 63174784
e nm@REDACTED