[erlang-bugs] Two xmerl_xpath predicate handling bugs
Matthew Dempsky
matthew@REDACTED
Sun Aug 17 10:26:20 CEST 2008
Summary:
1. "//c[1]" against "<a><b><c/></b><b><c/></b></a>" should return
   both 'c' elements; xmerl_xpath returns only the first one.
2. "/a/b[@e='f'][position()=1]" against
   "<a><b c='d'/><b e='f'/></a>" should return the second 'b' element;
   xmerl_xpath returns an empty set.
For the first bug, the XPath spec states:
    NOTE: The location path //para[1] does *not* mean the same as the
    location path /descendant::para[1]. The latter selects the first
    descendant para element; the former selects all descendant para
    elements that are the first para children of their parents.
Accordingly, "//c[1]" against "<a><b><c/></b><b><c/></b></a>" should
match both 'c' elements, but xmerl_xpath only returns the first.
This is because when xmerl_xpath:path_expr applies the child::c
axis/node-test selection to its context nodeset of all nodes,
xmerl_xpath:axis merges the result nodesets before path_expr calls
pred_expr, so the [1] predicate is applied to the merged nodeset
rather than to each individual nodeset.
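The difference between the two evaluation orders can be sketched outside
of xmerl. The following is a small Python model, not xmerl code: the
helper names (child_c, first) are made up for illustration, and
ElementTree is used only to parse the document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c/></b><b><c/></b></a>")

# Context node-set for the child::c step: every element in the document.
# (The document root node is left out; it has no 'c' children here.)
contexts = list(doc.iter())

def child_c(node):
    """Hypothetical helper: child::c for a single context node."""
    return [child for child in node if child.tag == "c"]

def first(nodes):
    """Hypothetical helper: the [1] predicate, keeping position 1 only."""
    return nodes[:1]

# Per the spec: apply [1] to each context node's result, then take the union.
per_context = [n for ctx in contexts for n in first(child_c(ctx))]

# Order described above: merge the per-context results, then apply [1] once.
merged = first([n for ctx in contexts for n in child_c(ctx)])

print(len(per_context), len(merged))  # 2 1
```

Applying the predicate per context node keeps both 'c' elements; applying
it after the merge keeps only the first.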
For the second bug, the spec states:
    child::para[attribute::type='warning'][position()=5] selects the
    fifth para child of the context node that has a type attribute
    with value warning
Accordingly, "/a/b[@e='f'][position()=1]" against "<a><b c='d'/><b
e='f'/></a>" should return the second 'b' element, because it is the
first b child of the context node (the root 'a' element) that has its
'e' attribute with value 'f'; but xmerl_xpath returns an empty set.
This is because xmerl_xpath numbers the node positions only once, after
applying the axis/node tests, and does not renumber them after each
predicate, so for the second 'b' element "position()" still evaluates
to 2 in xmerl_xpath_pred. (Note that "/a/b[@e='f'][1]" still works
correctly because xmerl_xpath includes a short-circuit to not depend on
#xmlNode.pos.)
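A small Python model of the two numbering strategies may make this
clearer. It is only a sketch: the function names are invented for
illustration and nothing here is taken from the xmerl sources.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b c='d'/><b e='f'/></a>")
b_children = doc.findall("b")  # child::b with 'a' as the context node

def has_e_f(node, pos):
    """Models the [@e='f'] predicate; position is ignored."""
    return node.get("e") == "f"

def position_is_1(node, pos):
    """Models the [position()=1] predicate."""
    return pos == 1

def filter_renumbering(nodes, predicates):
    """Spec behaviour: positions are recomputed before each predicate."""
    for pred in predicates:
        nodes = [n for pos, n in enumerate(nodes, start=1) if pred(n, pos)]
    return nodes

def filter_numbered_once(nodes, predicates):
    """Behaviour described above: positions are fixed after the axis step."""
    numbered = list(enumerate(nodes, start=1))
    for pred in predicates:
        numbered = [(pos, n) for pos, n in numbered if pred(n, pos)]
    return [n for _, n in numbered]

preds = [has_e_f, position_is_1]
print(len(filter_renumbering(b_children, preds)))    # 1
print(len(filter_numbered_once(b_children, preds)))  # 0
```

With renumbering, the second 'b' element is at position 1 by the time
the second predicate runs; with one-time numbering it keeps position 2
and is filtered out.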
As a third bug, you might also argue that "//b/following::b" against
"<a><b/><b/><b/></a>" should return the final 'b' element only once,
because a union of sets contains no duplicates and node-sets as
defined in expression contexts contain no duplicates either. However,
as long as the caller runs lists:usort on the result, the only other
negative consequence is worse performance in contrived test cases.
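To show where the duplicate comes from, here is a Python model of the
step. It is a sketch only: following_b is a made-up helper that relies
on this document being flat, and the order-preserving de-duplication
stands in for what a node-set union would do (lists:usort would also
sort the result).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b/><b/><b/></a>")
bs = doc.findall("b")

def following_b(index):
    """Hypothetical helper: following::b for the 'b' at `index`.

    In this flat document that is just the later 'b' siblings.
    """
    return bs[index + 1:]

# Concatenating the per-context results keeps duplicates.
concatenated = [n for i in range(len(bs)) for n in following_b(i)]

# A set-like union keeps each node once (document order preserved here).
seen = set()
unique = []
for node in concatenated:
    if id(node) not in seen:
        seen.add(id(node))
        unique.append(node)

print(len(concatenated), len(unique))  # 3 2
```

The final 'b' element follows both the first and the second 'b', so it
appears twice in the concatenated result.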
Also, be aware that I'm unlikely to submit a patch for the above issues
any time soon. (At this point, I'm more tempted to write my own XPath
implementation that uses lazy axis walking, but I've already spent
far, far more time on XPath than I really should have...)