[erlang-bugs] Bug with named subpatterns in re module

Patrik Nyblom pan@REDACTED
Tue Apr 2 18:49:05 CEST 2013


Hi!
On 03/28/2013 05:52 PM, Sergei Golovan wrote:
> Hi!
>
> On Thu, Mar 28, 2013 at 8:13 PM, Patrik Nyblom <pan@REDACTED> wrote:
>> Well, removing dupnames might be the easiest, but as there are perl
>> semantics we can imitate, I think we should give it a try!
> I should say that PCRE manual describes named subpatterns using the
> following regexp:
>
> (?<DN>Mon|Fri|Sun)(?:day)?|
> (?<DN>Tue)(?:sday)?|
> (?<DN>Wed)(?:nesday)?|
> (?<DN>Thu)(?:rsday)?|
> (?<DN>Sat)(?:urday)?
>
> (search 'NAMED SUBPATTERNS' in http://www.pcre.org/pcre.txt). And currently
>
> 1> re:run("Monday",
> "(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?",
> [dupnames, {capture, ['DN'], list}]).
> {match,[[]]}
>
> doesn't work. If I leave only one branch it works fine:
> 2> re:run("Monday", "(?<DN>Mon|Fri|Sun)(?:day)?", [dupnames, {capture,
> ['DN'], list}]).
> {match,["Mon"]}
Yes, it's not really PCRE's fault; it's up to the user of the library
(i.e. re) not to use the one-to-one mapping when using dupnames. I
shouldn't have allowed dupnames if I wasn't going to handle them as I
described in my last post, i.e. by digging out the full one-to-many
mapping between names and subpattern indexes. The only thing I'm still
wondering about is good semantics for capturing 'all'. Maybe we
shouldn't touch that and should concentrate on the capturing of specific
names, but it feels like we should have an 'all_names' option...
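
Until re handles the one-to-many mapping itself, something like the
following might serve as a stopgap. It is only a sketch: the module and
function names (dupnames_sketch, first_set/1, day_name/1) are made up for
illustration and are not part of re. It sidesteps the name lookup by
capturing every subpattern by index and taking the first one that is set,
and it assumes that an empty capture means "this branch did not
participate", which would be wrong for a pattern whose group can
legitimately match the empty string.

```erlang
-module(dupnames_sketch).
-export([first_set/1, day_name/1]).

%% Return the first non-empty capture from a list of captured strings,
%% or nomatch if every capture is empty.
%% Assumption: an empty string means the subpattern was unset.
first_set([]) -> nomatch;
first_set([[] | Rest]) -> first_set(Rest);
first_set([Value | _]) -> {ok, Value}.

%% Hypothetical wrapper around the PCRE manual's day-name example.
%% Every branch has exactly one capturing group, so at most one of the
%% numbered captures is set for any given match.
day_name(Subject) ->
    Pattern = "(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|"
              "(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|"
              "(?<DN>Sat)(?:urday)?",
    %% Capture by index (all_but_first) instead of by name, so the
    %% one-to-one name lookup is never consulted.
    case re:run(Subject, Pattern,
                [dupnames, {capture, all_but_first, list}]) of
        {match, Groups} -> first_set(Groups);
        nomatch -> nomatch
    end.
```

With this, first_set(["", "Tue"]) gives {ok,"Tue"}, which is the shape of
result one would want from capturing 'DN' on "Tuesday". A proper fix in re
would do the equivalent per name rather than over all groups.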

Also, I think I should bump the PCRE version while at it; there are some
issues with Unicode that were discussed earlier on some of the lists...
>
> Cheers!
Cheers,
/Patrik