regexp module with submatches available
Robert Virding
rv@REDACTED
Mon Mar 26 15:36:59 CEST 2001
Pascal Brisset <pascal.brisset@REDACTED> writes:
>We have extended the regexp module in OTP R7B-1 with support for
>submatches (the '\(...\)' syntax in SED regular expressions).
>This makes it possible to retrieve several components of a match with
>a single evaluation of a regexp. For example:
>
>1> RE_URL="\\(.+\\)://\\(.+\\)\\(/.+\\)(\\?\\(.*\\)(&\\(.*\\))*)?",
>1> gregexp:groups("http://localhost:81/script?arg&arg2&arg3", RE_URL).
>{match,["http","localhost:81","/script","arg","arg2","arg3"]}
Something like this is already planned for the next version. It
follows the AWK style, so it is only available in the substitution
functions: you refer to sub-matches with a \1 - \9 syntax in the
replacement string. The main question left is whether to change the
old sub/gsub functions or to only have it in a new gensub function.
Gensub is a new function which allows more control over the
substitution. AWK only supports sub-match references in gensub.
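
To make the \1 - \9 idea concrete, here is a minimal sketch of how a
replacement string could be expanded once the sub-matches are known.
The module name backref_sketch and the function expand/2 are
hypothetical and only illustrate the syntax; they are not part of the
regexp module, and treating a missing group as the empty string is an
assumption.

```erlang
-module(backref_sketch).
-export([expand/2]).

%% expand(Replacement, Groups) -> string()
%% Replace each \N (N = 1..9) in Replacement with the Nth string in
%% Groups. Hypothetical helper, for illustration only.
expand([$\\, D | Rest], Groups) when D >= $1, D =< $9 ->
    N = D - $0,
    Sub = case N =< length(Groups) of
              true  -> lists:nth(N, Groups);
              false -> ""   % assumption: a missing group expands to ""
          end,
    Sub ++ expand(Rest, Groups);
expand([C | Rest], Groups) ->
    %% Any other character is copied through unchanged.
    [C | expand(Rest, Groups)];
expand([], _Groups) ->
    [].
```

With the groups ["2001","03","26"], the call
backref_sketch:expand("\\3/\\2/\\1", ["2001","03","26"]) gives
"26/03/2001".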
A call that just matches and extracts the groups would probably be
useful too. The question is whether it should return the actual
substrings or a list of start/length pairs, as match does today.
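
The two return styles are easy to convert between on the caller's
side. As a rough sketch, assuming the pairs are {Start, Length} tuples
with a 1-based Start, the substrings can be recovered like this (the
module name groups_sketch and the function substrings/2 are
hypothetical):

```erlang
-module(groups_sketch).
-export([substrings/2]).

%% substrings(String, Pairs) -> [string()]
%% Turn a list of {Start, Length} pairs (Start is 1-based) into the
%% substrings of String that they denote.
substrings(String, Pairs) ->
    [lists:sublist(String, Start, Length) || {Start, Length} <- Pairs].
```

For example, groups_sketch:substrings("http://localhost:81/script",
[{1,4},{8,12},{20,7}]) gives ["http","localhost:81","/script"].
Returning pairs keeps the positions, which the substrings alone lose,
at the cost of this extra step.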
Comments?
Robert