[erlang-questions] regexp sux! (but perhaps less now)

Robert Virding robert.virding@REDACTED
Mon Jun 4 22:34:34 CEST 2007


tobbe wrote:
> 1> re:match("now/plus42hours/","^now/(plus|minus)(\d{1,2})hours/$").
> nomatch
> 2> re:smatch("now/plus42hours/","^now/(plus|minus)([[:alnum:]])hours/$").
> nomatch
> 3> re:smatch("now/plus42hours/","now/(plus|minus)([[:alnum:]])hours/").
> nomatch

OK:
1) \d is a PERLism and, as I wrote, I only support POSIX style regexps. 
Also, as the regexp is an Erlang string it would have to be written "\\d" 
so that the '\' is actually seen by the regexp module. If there is 
interest I will do a PERL compatible version.

2) [[:alnum:]] matches ONE alpha-numeric character, almost equivalent to 
"[a-zA-Z0-9]" but for all of Latin-1.
3) Same comment here.
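
To illustrate point 1: in an Erlang string literal "\d" is itself an escape sequence (the DEL character, 127), so the backslash never reaches any regexp code unless it is doubled. A minimal sketch, where the module name escape_demo and the function check/0 are made up for illustration:

```erlang
-module(escape_demo).
-export([check/0]).

%% Hypothetical helper: each line is a pattern match that fails with
%% badmatch if the claim in its comment is wrong.
check() ->
    %% "\d" is a one-character string holding DEL (127).
    [127] = "\d",
    %% "\\d" is two characters: a backslash followed by the letter d.
    [$\\, $d] = "\\d",
    2 = length("\\d"),
    ok.
```

Calling escape_demo:check() returns ok if both claims hold.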

So:
2>re:match("now/plus42hours/","^now/(plus|minus)([[:alnum:]]+)hours/$").
{match,1,16}

3>re:smatch("now/plus42hours/","^now/(plus|minus)([[:alnum:]]+)hours/$").
{match,1,16,"now/plus42hours/",{{5,4,"plus"},{9,2,"42"}}}

4>re:smatch(<<"now/plus42hours/">>,"^now/(plus|minus)([[:alnum:]]+)hours/$").
{match,1,16,"now/plus42hours/",{{5,4,"plus"},{9,2,"42"}}}

5>re:smatch(<<"now/plus42hours/">>,"^now/(plus|minus)([[:alnum:]]{1,2})hours/$").
{match,1,16,"now/plus42hours/",{{5,4,"plus"},{9,2,"42"}}}

6>re:smatch("now/plus42hours/","^now/(plus|minus)([[:alnum:]]{1,2})hours/$").
{match,1,16,"now/plus42hours/",{{5,4,"plus"},{9,2,"42"}}}

I hope that's legible.
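
The one-character versus repeated-class behaviour can also be checked outside this module. The sketch below assumes the PCRE-based re module that ships with current Erlang/OTP, which happens to share the name re but is NOT the module discussed in this mail and has no smatch/2; re:run/3 and the capture option belong to that OTP module. The module name posix_class_demo and the function check/0 are made up for illustration:

```erlang
-module(posix_class_demo).
-export([check/0]).

%% Assumption: re:run/3 here is the Erlang/OTP re module, not the
%% re module from this mail.
check() ->
    Subject = "now/plus42hours/",
    %% Return only the sub-expressions, as strings.
    Opts = [{capture, all_but_first, list}],
    %% A bare [[:alnum:]] matches exactly one character, so "42" does not fit.
    nomatch = re:run(Subject, "^now/(plus|minus)([[:alnum:]])hours/$", Opts),
    %% Adding + lets the class repeat.
    {match, ["plus", "42"]} =
        re:run(Subject, "^now/(plus|minus)([[:alnum:]]+)hours/$", Opts),
    %% A POSIX-style replacement for the PERL \d{1,2}.
    {match, ["plus", "42"]} =
        re:run(Subject, "^now/(plus|minus)([0-9]{1,2})hours/$", Opts),
    ok.
```

Calling posix_class_demo:check() returns ok when all three matches behave as described.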

> Also, it would be really nice with some docs with lots of examples.
> Or, why not provide an Eunit test file? That would help you to do
> regression testing and give good examples in one place.
> 
> Cheers, Tobbe

As I said, it is compatible with the old regexp except for the smatch/2 
and first_smatch/2 functions.

It hasn't got to the stage of eunit regression testing yet; I would like 
to get the interface nailed down first. What information needs to be 
returned? At the moment smatch returns everything. By the way, an unused 
sub-expr returns 'undefined' to differentiate it from the empty string.

The reason I don't use the module name 'regexp' is to give me more 
freedom in determining the interface. I will feed improvements back in 
to regexp of course.

Then I can split it up along the lines of rok's suggestions and add a 
re-entrant interface as someone else was requesting. And leex. Could 
even make it work directly on io-lists, but I don't really see the need.

In testing tobbe's examples I found a few bugs and have added a new 
version to trapexit:

http://forum.trapexit.org/viewtopic.php?t=8676

Robert

P.S. Using the code tags in trapexit really screws it up for me in 
Thunderbird, I get the raw HTML.

P.P.S. I found I couldn't directly use the update file option in trapexit 
as it changes the name of the file; my update became re_478.erl.


