[erlang-bugs] Bug with named subpatterns in re module
Patrik Nyblom
pan@REDACTED
Thu Mar 28 12:35:54 CET 2013
Hi!
I'm unsure of the nature of this bug. What are you actually expecting as
a return when you use duplicate names and named capture? Both instances
of the name, "the right instance" of the name or a badarg?
I.e., would you like
re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",[dupnames, {capture, [a, b], list}]).
to give the same result as:
re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",[dupnames, {capture, [a, b, c], list}]).
? Or return the second instance if that matches, but the first instance
if that one matches? Or should we simply not allow it? The thing is that
even with dupnames, you have a varying number of subexpressions.
Capturing 'all' (or rather 'all_but_first') will show you that this call
returns three distinct subexpressions, of which two happen to have the
same name (regardless of the names). If the part before | matches, the
result is only two subexpressions, as the first two subexpressions
match. No duplicate naming will change this. There is no real "select
the one that matches" functionality in giving two subexpressions the
same name.
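A minimal sketch of that check in the Erlang shell, capturing by
position rather than by name. The pattern is the one from the report;
the subject "foobla" is added here only to exercise the first branch.
No output is shown; the point is to compare the lengths of the two
returned lists.

```erlang
%% Pattern from the report: <b> is used in both alternatives.
Pattern = "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$".

%% Second branch matches: subexpression 3 is the one that is set.
re:run("bar", Pattern, [dupnames, {capture, all_but_first, list}]).

%% First branch matches: only subexpressions 1 and 2 are set.
re:run("foobla", Pattern, [dupnames, {capture, all_but_first, list}]).
```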
PCRE just picks one of the occurrences of a name when you ask for it - in
your last example the occurrence you were not expecting, but that's more
or less random, the first example would give unexpected results if the
first part matched. PCRE has no functionality to pick all occurrences of
a name, but that could of course be changed if there was some
understandable semantics that should be implemented. I think a badarg
exception is the way to go though...
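For comparison, a sketch of the unique-name variant from above, where
the caller decides which subexpression to use. The module name and the
functions b_or_c/1 and pick/2 are hypothetical helpers for
illustration, not part of the re module. It relies on re:run returning
an empty list for a named subexpression that did not take part in the
match.

```erlang
-module(dupnames_sketch).
-export([b_or_c/1]).

%% Hypothetical helper: the second alternative is named <c> instead of
%% reusing <b>, so every name is unique and dupnames is not needed.
b_or_c(Subject) ->
    Pattern = "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",
    case re:run(Subject, Pattern, [{capture, [b, c], list}]) of
        {match, [B, C]} -> {ok, pick(B, C)};
        nomatch -> nomatch
    end.

%% An unset subexpression comes back as [] with the list capture type,
%% so take <b> if it is set and fall back to <c> otherwise.
pick([], C) -> C;
pick(B, _C) -> B.
```

Here b_or_c("bar") takes the value from <c> and b_or_c("foobla") takes
it from <b>, which is the "select the one that matches" behaviour done
explicitly by the caller.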
Cheers,
/Patrik
On 03/24/2013 07:58 AM, Sergei Golovan wrote:
> Hi!
>
> Chris King recently discovered a bug in re module. Appears that the
> matched named subpatterns are not always returned.
>
> The following command works correctly:
> 1> re:run("bar", "^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,["bar",[]]}
>
> But semantically the same one doesn't (note the swapped <a> and <b>):
> 1> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,[[],[]]}
>
> In both cases the second branch matches, but only the first command
> returns the required subpattern.
>
> The bug is reproducible in R16B.
>
> Cheers!