[erlang-questions] Fast regular expression implementation

Gaspar Chilingarov nm@REDACTED
Mon Dec 18 15:59:01 CET 2006


Hi all!

I would like to announce an implementation of regular expressions in
Erlang that is fast enough to be useful for text processing and
extraction.

Please follow the link for download: http://zanazan.am/erlang/re.html

Some things are not implemented yet (e.g. the alternation operator "|"
between regexp branches).

Subpatterns are extracted using (); grouping without extraction is done
as in Perl, with (?:pattern). Multiple nested subpatterns are allowed.
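
As an illustration of the Perl-style grouping semantics described above,
here is a minimal sketch. It uses Python's standard re module, not this
library's API, and the sample string and pattern are made up for the
example.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "key=alpha;key=beta"

# (?:key=) groups without extracting. ((\w)\w*) is a nested pair of
# extracting groups: the outer one captures the whole word, the inner
# one captures its first character.
pattern = re.compile(r"(?:key=)((\w)\w*)")

# Groups of the first match: only the two extracting groups appear.
first_groups = pattern.search(text).groups()

# Groups of every match in the text, in order.
all_groups = pattern.findall(text)

print(first_groups)
print(all_groups)
```

Groups are numbered by the position of their opening parenthesis, so the
outer subpattern comes before the nested one, and the (?:...) group does
not take a number at all.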

I've tried to keep the behavior as close as possible to Perl patterns.

All substitution functions are missing at the moment -- I will be glad
to get suggestions on what should be implemented besides the standard
sub/gsub.

The library is quite fast: an 18kb text is matched against the pattern
"class=g.*?<a\s+class=l\s+href=\"(.*?)\">(.*?)</a>" to extract
all matches in 10-12ms (if you ask only for positions). If you ask only
for subpattern matches (i.e. re:mgg), it takes only 18ms.
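
To make the difference between asking for positions and asking for
subpattern text concrete, here is a rough sketch of the same pattern
applied to a tiny input. It is written with Python's re module as a
stand-in; the signature of re:mgg is not shown here, so it is not
reproduced, and the sample HTML is invented for the example.

```python
import re

# The pattern from the text above, written as a Python raw string
# (the \" escapes are only needed inside an Erlang string literal).
pattern = re.compile(r'class=g.*?<a\s+class=l\s+href="(.*?)">(.*?)</a>')

# Hypothetical input: two result entries on a single line.
html = (
    '<p class=g><a class=l href="http://example.org/a">First</a></p>'
    '<p class=g><a class=l href="http://example.org/b">Second</a></p>'
)

# Positions only: (start, end) offsets of each whole match.
positions = [m.span() for m in pattern.finditer(html)]

# Subpattern text: the two extracted groups of each match.
subpatterns = [m.groups() for m in pattern.finditer(html)]

for (start, end), (href, label) in zip(positions, subpatterns):
    print(start, end, href, label)
```

Returning only offsets avoids copying the matched text, which is one
plausible reason the positions-only case is the cheaper of the two.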

The same text concatenated 100 times (1.8Mb) is matched in 1.2sec in
the first case and in about 2.5sec with subpattern text extraction, so
matching time grows linearly. With the gregexp implementation, time
grows exponentially.
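
A scaling check of this kind can be reproduced with a small harness
along the following lines. This is only a sketch in Python with its own
re module and an invented one-line sample, so the timings it prints say
nothing about this library or about gregexp; the point is the method of
repeating the input and comparing the times.

```python
import re
import time

# Same pattern as above, as a Python raw string.
PATTERN = re.compile(r'class=g.*?<a\s+class=l\s+href="(.*?)">(.*?)</a>')

# Hypothetical sample containing exactly one match per copy.
SAMPLE = '<p class=g><a class=l href="http://example.org/">Example</a></p>\n'


def time_matches(copies):
    """Match PATTERN against SAMPLE repeated `copies` times.

    Returns (number_of_matches, elapsed_seconds).
    """
    text = SAMPLE * copies
    start = time.perf_counter()
    count = sum(1 for _ in PATTERN.finditer(text))
    elapsed = time.perf_counter() - start
    return count, elapsed


results = {n: time_matches(n) for n in (1, 10, 100)}
for n, (count, elapsed) in results.items():
    print(f"{n:4d} copies: {count} matches in {elapsed * 1000:.2f} ms")
```

If the time per copy stays roughly constant as the number of copies
grows, the matcher is scaling linearly on this input.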


I would like to hear any feedback, especially bug reports.

/Gaspar

-- 
Gaspar Chilingarov

System Administrator,
Network security consulting

t +37493 419763 (mob)
i 63174784
e nm@REDACTED


