[erlang-questions] regexp sux! (but perhaps less now)
Robert Virding
robert.virding@REDACTED
Mon Jun 4 00:07:04 CEST 2007
I have uploaded a new regular expression module, re.erl, to trapexit.org:
http://forum.trapexit.org/viewtopic.php?t=8675
This is a new implementation of regular expressions which is roughly
compatible with regexp.erl, with two major improvements:
1. It now works directly on binaries: all the functions accept binaries as
input, except for the regexp argument itself.
2. There are two new functions which extract and return sub-expressions,
smatch/2 and first_smatch/2. These are similar to match/2 and
first_match/2, but they also return the sub-expressions. For example:
2> re:smatch("-axxxb--", "a((x+)|(y+))b").
{match,2,5,"axxxb",{{3,3,"xxx"},{3,3,"xxx"},undefined}}
A sub-expression is 'undefined' if it does not take part in the match.
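To make the shape of the smatch/2 result concrete, here is a small sketch that works purely on the tuple printed above. The module smatch_demo and its functions are hypothetical, not part of re.erl. It assumes, from that one example, that the result is {match, Start, Length, Matched, SubTuple}, that positions are 1-based, and that each sub-expression is either {Start, Length, String} or 'undefined'.

```erlang
-module(smatch_demo).
-export([subexprs/1, consistent/2]).

%% Hypothetical helpers, not part of re.erl. They ASSUME the result
%% shape shown in the example: {match, Start, Length, Matched, SubTuple},
%% with 1-based positions, where each SubTuple element is either
%% {Start, Length, Substring} or the atom 'undefined'.

%% Return the captured substrings in order, keeping 'undefined'
%% for sub-expressions that did not take part in the match.
subexprs({match, _Start, _Len, _Matched, Subs}) ->
    [sub_string(S) || S <- tuple_to_list(Subs)].

sub_string({_Start, _Len, Str}) -> Str;
sub_string(undefined) -> undefined.

%% Check the 1-based {Start, Length} reading: each reported string
%% should equal the slice of Subject starting at Start of that length.
%% 'undefined' sub-expressions are skipped by the generator pattern.
consistent(Subject, {match, Start, Len, Matched, Subs}) ->
    Parts = [{Start, Len, Matched}
             | [P || P = {_, _, _} <- tuple_to_list(Subs)]],
    lists:all(fun({S, L, Str}) -> lists:sublist(Subject, S, L) =:= Str end,
              Parts).
```

With the result from the example, subexprs/1 returns ["xxx","xxx",undefined], and consistent("-axxxb--", Result) returns true, which supports reading the first two elements as a 1-based start position and a length.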
Like the old module, it supports POSIX regexps, but it now also has POSIX
character classes, though only for Latin-1. So we can write "[[:digit:]]" or
"[[:alnum:]]". The existing functions are the same as before.
Unlike many regexp engines, this one should never explode, whatever the
regexp. It is about as fast as the old one, though this depends on the regexp.
I would like some feedback on the speed and the interface.
N.B. It is not really possible to have both POSIX and Perl regexps in
the same module because, apart from the difference in features, they have
different semantics. If all goes well, a Perl module might follow.
Robert