[erlang-questions] regexp sux! (but perhaps less now)
Robert Virding
robert.virding@REDACTED
Fri Jun 15 00:32:25 CEST 2007
Bengt Kleberg wrote:
> On 2007-06-04 22:34, Robert Virding wrote:
>> tobbe wrote:
>>> 1> re:match("now/plus42hours/","^now/(plus|minus)(\d{1,2})hours/$").
>>> nomatch
>>> 2> re:smatch("now/plus42hours/","^now/(plus|minus)([[:alnum:]])hours/$").
>>> nomatch
>>> 3> re:smatch("now/plus42hours/","now/(plus|minus)([[:alnum:]])hours/").
>>> nomatch
>> OK:
>> 1) \d is a PERLism and as I wrote I only support POSIX style regexps. As
>> the regexp is a string it would have to be "\\d" as the '\' needs to be
>
> if somebody is interested in something else than ''normal regular
> expressions'' (where normal is awk, sed, posix, perl, etc) i can recommend
> http://www.scsh.net/docu/html/man-Z-H-7.html#node_idx_1178
>
> it is regexp for the scheme shell. it has s-expressions instead of
> strings. i find it easier to use when the regular expression goes beyond
> that which is possible to do with strstr and friends.
Sorry for taking so long to answer this.
This is definitely interesting. What it describes is along the same lines
as what Richard O'Keefe was suggesting, defining the regular expression
with a structure instead of with a string. They wrap the s-expr form
with a read macro which parses the s-expr and builds an internal
representation. One interesting point is that when matching it does not
return an explicit structure with the results of the match, but instead
an ADT with a set of access functions.
One benefit of doing this is that, as the internal structure of the ADT
is undefined and the data is only accessible through the access functions,
you are free to change the internals. The downside is not being able to
pattern match on the result. What do people feel is the best way to go?
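To make the trade-off concrete, here is a minimal sketch of what such an ADT could look like in Erlang. Everything in it is an assumption for illustration: the module name match_adt, the functions new/2, whole/1 and group/2, and the internal {Start, Length} representation are hypothetical and not part of any existing regexp module.

```erlang
-module(match_adt).
-export([new/2, whole/1, group/2]).
-export_type([match/0]).

%% Hypothetical opaque match result: callers only see match() and
%% must go through the access functions below.
-opaque match() :: {match_adt, string(), [{pos_integer(), non_neg_integer()}]}.

%% new(Subject, Spans): Spans is a list of 1-based {Start, Length}
%% pairs, the first for the whole match, the rest for sub-expressions.
-spec new(string(), [{pos_integer(), non_neg_integer()}]) -> match().
new(Subject, Spans) ->
    {match_adt, Subject, Spans}.

%% The text of the whole match (group 0).
-spec whole(match()) -> string().
whole(M) ->
    group(0, M).

%% The text of the N-th sub-expression.
-spec group(non_neg_integer(), match()) -> string().
group(N, {match_adt, Subject, Spans}) ->
    {Start, Len} = lists:nth(N + 1, Spans),
    lists:sublist(Subject, Start, Len).
```

With M = match_adt:new("now/plus42hours/", [{1,16},{5,4},{9,2}]), a caller would write match_adt:group(1, M) to get "plus" instead of matching on a result tuple. The internal tuple could later be swapped for something else without touching callers, which is exactly the benefit, and the cost, described above.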
I rather like having both the string form for a regular expression and a
structural representation. It is easier to make it look beautiful in Lisp, I
think. For Erlang we could either use terms directly or have a more
functional way as Richard described. So instead of "[a-c]*|z+" you could
have:
{alt,{'*',{cc,"a-c"}},{'+',{c,$z}}}
or
alt('*'(cc("a-c")),'+'($z))
Can't think of better names for the closures right now, using kclosure
and pclosure seems so long.
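As a rough sketch of how the two forms could coexist, the functional form can simply be a set of constructor functions that build the term form, plus a function that renders a term back to the string form. The module name re_term and the function to_string/1 are hypothetical, and only the five tags used in the example above are handled. Note that this sketch wraps a single character with c/1, so the example is written '+'(c($z)).

```erlang
-module(re_term).
-export([alt/2, '*'/1, '+'/1, cc/1, c/1, to_string/1]).

%% Constructor functions: each one just builds the tagged tuple.
alt(A, B) -> {alt, A, B}.
'*'(R)    -> {'*', R}.
'+'(R)    -> {'+', R}.
cc(Chars) -> {cc, Chars}.
c(Char)   -> {c, Char}.

%% Render a term back to the string form.
%% Assumption: special characters inside c/1 and cc/1 are not escaped.
to_string({alt, A, B}) -> to_string(A) ++ "|" ++ to_string(B);
to_string({'*', R})    -> group(R) ++ "*";
to_string({'+', R})    -> group(R) ++ "+";
to_string({cc, Chars}) -> "[" ++ Chars ++ "]";
to_string({c, Char})   -> [Char].

%% Parenthesise an alternation so that a following closure applies
%% to all of it rather than to its last branch only.
group({alt, _, _} = R) -> "(" ++ to_string(R) ++ ")";
group(R)               -> to_string(R).
```

Inside this module, to_string(alt('*'(cc("a-c")), '+'(c($z)))) gives back "[a-c]*|z+", and the constructors return plain tuples, so the result can still be pattern matched like the term form.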
Robert