[erlang-questions] regexp sux! (but perhaps less now)

Mon Jun 4 00:07:04 CEST 2007

I have uploaded a new regular expression module, re.erl, to trapexit.org:

http://forum.trapexit.org/viewtopic.php?t=8675

This is a new implementation of regular expressions which is sort of 
compatible with regexp.erl with two major improvements:

1. It now works directly on binaries: all the functions accept a binary 
as the input to be searched, though the regexp itself cannot be a binary.

2. There are two new functions which extract and return sub-expressions, 
smatch/2 and first_smatch/2. These are similar to match/2 and 
first_match/2 but they also return the sub-expressions. For example:

2> re:smatch("-axxxb--", "a((x+)|(y+))b").
{match,2,5,"axxxb",{{3,3,"xxx"},{3,3,"xxx"},undefined}}

A sub-expr is 'undefined' if there is no match.
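
As a rough sketch of how a caller might take such a result apart, the 
module below works purely on the tuple shape shown in the example above 
and does not call re.erl itself. The module name smatch_demo, the helper 
names subs/1 and sub_text/1, and the assumption that a failed match 
returns 'nomatch' (as regexp.erl does) are illustrative only and are not 
part of the announced module.

```erlang
-module(smatch_demo).
-export([subs/1, demo/0]).

%% Assumed result shape, copied from the example in the post:
%%   {match, Start, Length, Matched, SubExprs}
%% where SubExprs is a tuple whose elements are either
%% {Start, Length, String} or the atom 'undefined'.
subs({match, _Start, _Len, _Matched, Subs}) when is_tuple(Subs) ->
    [sub_text(S) || S <- tuple_to_list(Subs)];
subs(nomatch) ->
    %% Assumption: no match is reported as 'nomatch', as in regexp.erl.
    [].

%% Return the matched text of one sub-expression, or 'undefined'
%% if that sub-expression took no part in the match.
sub_text({_Start, _Len, Text}) -> Text;
sub_text(undefined) -> undefined.

%% Feed in the literal result from the post rather than calling re.erl.
demo() ->
    Result = {match,2,5,"axxxb",{{3,3,"xxx"},{3,3,"xxx"},undefined}},
    subs(Result).
```

Here smatch_demo:demo() returns ["xxx","xxx",undefined]: the outer group 
and the x+ branch both matched "xxx", while the y+ branch did not take 
part in the match.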

It supports POSIX regexps as the old one did, and it now also has POSIX 
character classes, though only for Latin-1. So we can write "[[:digit:]]" 
or "[[:alnum:]]". The functions are the same as before.

The regexp engine should never explode, irrespective of the regexp, 
which many engines do. It is about as fast as the old one, though that 
depends on the regexp.

I would like some feedback on the speed and the interface.

N.B. It is not really possible to have both POSIX and PERL regexps in 
the same module because, apart from the difference in features, they 
have different semantics. If all goes well a PERL module might follow.
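
One well-known semantic difference, possibly among those meant here, is 
how alternation is resolved: POSIX asks for the longest of the leftmost 
matches, while PERL takes the first alternative that succeeds. The toy 
model below is only a sketch of that rule on literal alternatives. It is 
not a regexp engine, and the module and function names (alt_semantics, 
first_alt/2, longest_alt/2) are made up for the illustration.

```erlang
-module(alt_semantics).
-export([first_alt/2, longest_alt/2, demo/0]).

%% PERL-style rule: try the alternatives in order and return the
%% first one that is a prefix of the subject.
first_alt(Subject, Alts) ->
    case [A || A <- Alts, lists:prefix(A, Subject)] of
        [First | _] -> {match, First};
        [] -> nomatch
    end.

%% POSIX-style rule: of all alternatives that are a prefix of the
%% subject, return the longest one.
longest_alt(Subject, Alts) ->
    case [A || A <- Alts, lists:prefix(A, Subject)] of
        [] -> nomatch;
        Matches -> {match, lists:foldl(fun longer/2, "", Matches)}
    end.

%% Keep the longer of two strings; on a tie the earlier one wins.
longer(A, Best) when length(A) > length(Best) -> A;
longer(_, Best) -> Best.

demo() ->
    Alts = ["a", "ab"],   % stands in for the regexp "a|ab"
    {first_alt("ab", Alts), longest_alt("ab", Alts)}.
```

alt_semantics:demo() returns {{match,"a"},{match,"ab"}}: the same 
pattern and input give different matches under the two rules, which is 
why one set of functions cannot easily serve both.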

Robert