regexp module with submatches available

Robert Virding rv@REDACTED
Mon Mar 26 15:36:59 CEST 2001


Pascal Brisset <pascal.brisset@REDACTED> writes:
>We have extended the regexp module in OTP R7B-1 with support for
>submatches (the '\(...\)' syntax in SED regular expressions).
>This makes it possible to retrieve several components of a match with
>a single evaluation of a regexp. For example:
>
>1> RE_URL="\\(.+\\)://\\(.+\\)\\(/.+\\)(\\?\\(.*\\)(&\\(.*\\))*)?",
>1> gregexp:groups("http://localhost:81/script?arg&arg2&arg3", RE_URL).
>{match,["http","localhost:81","/script","arg","arg2","arg3"]}

Something like this is already planned for the next version.  It
follows the AWK style, so it only exists in the substitution
functions: you refer to sub-matches with a \1 - \9 syntax in the
replacement string.  The main question left is whether to change the
old sub/gsub functions or to offer it only in a new gensub function,
which allows more control.  AWK only has the \1 - \9 syntax in gensub.
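
For illustration only, the \1 - \9 replacement syntax can be sketched
with the `re` module that later OTP releases ship.  This is an
assumption made for the example: `re` is not the regexp module
discussed here, it writes groups as ( ) rather than SED-style \( \),
and the names backref_sketch and reorder_date/1 are hypothetical.

```erlang
-module(backref_sketch).
-export([reorder_date/1]).

%% Rewrite "YYYY-MM-DD" as "DD/MM/YYYY" using backreferences in the
%% replacement string.  Uses re:replace/4 from the later `re` module
%% (an assumption), not the regexp module discussed in this thread.
reorder_date(Date) ->
    re:replace(Date,
               "(\\d+)-(\\d+)-(\\d+)",   % three numbered groups
               "\\3/\\2/\\1",            % \3, \2, \1 refer back to them
               [{return, list}]).        % return a string, not iodata
```

With this sketch, backref_sketch:reorder_date("2001-03-26") should
return "26/03/2001".  Only the first match is rewritten unless the
`global` option is added, which roughly mirrors the sub/gsub split.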

Having a call to just match and extract the groups would probably be
useful.  The question is whether to return the actual substrings or
return a list of start/length pairs like match does today.
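
To make the two return styles concrete, here is a minimal sketch, again
assuming the later `re` module rather than the regexp module under
discussion.  The module and function names (submatch_sketch,
substrings/2, positions/2) are hypothetical, and the URL pattern is a
simplified stand-in for the one quoted above.

```erlang
-module(submatch_sketch).
-export([substrings/2, positions/2]).

%% Style 1: return the captured groups as actual substrings.
%% `all_but_first` drops the whole-match entry and keeps the groups.
substrings(Subject, Pattern) ->
    re:run(Subject, Pattern, [{capture, all_but_first, list}]).

%% Style 2: return each group as a {Start, Length} pair.
%% In `re` the Start offsets are zero-based.
positions(Subject, Pattern) ->
    re:run(Subject, Pattern, [{capture, all_but_first, index}]).
```

For "http://localhost:81/script" and the pattern "(.+)://([^/]+)(/.+)",
substrings/2 should give {match,["http","localhost:81","/script"]} and
positions/2 should give {match,[{0,4},{7,12},{19,7}]}.  Both return
nomatch when the pattern does not match.  The pairs avoid copying and
let the caller slice the subject itself; the substrings are more
convenient when the pieces are used directly.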

Comments?

	Robert


