[erlang-bugs] Bug with named subpatterns in re module

Patrik Nyblom pan@REDACTED
Thu Mar 28 12:35:54 CET 2013


Hi!

I'm unsure of the nature of this bug. What do you actually expect to be
returned when you combine duplicate names with named capture: both
instances of the name, "the right instance" of the name, or a badarg?

I.e., would you like

re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",[dupnames, {capture, [a, b], list}]).

to give the same result as:

re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",[dupnames, {capture, [a, b, c], list}]).

? Or should it return the second instance if that one matches, but the
first instance if the first one matches? Or should we simply not allow
it? The thing is that even with dupnames, the number of subexpressions
returned varies. Capturing 'all' (or rather 'all_but_first') will show
you that this call returns three distinct subexpressions, two of which
happen to share a name; they are numbered by position, regardless of
their names. If the part before | matches, the result contains only two
subexpressions, as only the first two subexpressions match. No duplicate
naming will change this. Giving two subexpressions the same name
provides no real "select the one that matches" functionality.

PCRE just picks one of the occurrences of a name when you ask for it. In
your last example it picked the occurrence you were not expecting, but
that choice is more or less arbitrary; the first example would give
unexpected results if the first part matched. PCRE has no functionality
to pick all occurrences of a name, but that could of course be changed
if there were some understandable semantics to implement. I think a
badarg exception is the way to go, though...

Cheers,
/Patrik

On 03/24/2013 07:58 AM, Sergei Golovan wrote:
> Hi!
>
> Chris King recently discovered a bug in the re module. It appears that
> matched named subpatterns are not always returned.
>
> The following command works correctly:
> 1> re:run("bar", "^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,["bar",[]]}
>
> But semantically the same one doesn't (note the swapped <a> and <b>):
> 1> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,[[],[]]}
>
> In both cases the second branch matches, but only the first command
> returns the required subpattern.
>
> The bug is reproducible in R16B.
>
> Cheers!
