xmerl and $'

Mark Fine mark.fine@REDACTED
Sun May 23 21:51:51 CEST 2010


Ditto for $&, which I can't work around:

%% Needs -include_lib("xmerl/include/xmerl.hrl"). for the #xmlText{} record.
problem2() ->
    Xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          "<Key>There &amp; problem</Key>",
    {Doc, _} = xmerl_scan:string(Xml),
    [Value || #xmlText{value = Value} <-
                  xmerl_xpath:string("//Key/text()", Doc)].

Which returns ["There ","& problem"] instead of ["There & problem"].
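
One possible workaround that needs no pre-processing, offered as a sketch rather than the intended xmerl usage: since the scanner seems to emit a separate #xmlText{} node at each entity reference, the fragments can be joined after the XPath query. The module name xmerl_text_join and the function key_text/1 are hypothetical names made up for illustration, and the sketch assumes the document holds a single Key element (text from several Key elements would be run together).

```erlang
-module(xmerl_text_join).
-export([key_text/1]).

%% Record definitions such as #xmlText{} come from xmerl's header file.
-include_lib("xmerl/include/xmerl.hrl").

%% key_text/1 is a hypothetical helper: it scans Xml, collects every text
%% node under //Key, and joins the values into one flat string.
key_text(Xml) ->
    {Doc, _Rest} = xmerl_scan:string(Xml),
    Parts = [Value || #xmlText{value = Value} <-
                          xmerl_xpath:string("//Key/text()", Doc)],
    %% The text may arrive split at entity references, so flatten the
    %% list of fragments back into a single string.
    lists:flatten(Parts).
```

Calling xmerl_text_join:key_text("<Key>There &amp; problem</Key>") should then give "There & problem" whether or not the scanner splits the text.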

I see in xmerl_scan:scan_entity_ref there's some handling of these and
some comments that seem relevant:

%% Chapter 4.4.2: ... the replacement text of entities used to escape
%% markup delimiters (the entities amp, lt, gt, apos, quot) is always treated
%% as data. (The string "AT&amp;T;" expands to "AT&T;" and the remaining
%% ampersand is not recognized as an entity-reference delimiter.)"

Please let me know how I should pre-process the XML so that xmerl_scan
handles these entities appropriately. Thanks!

Mark

On Sun, May 23, 2010 at 11:58 AM, Mark Fine <mark.fine@REDACTED> wrote:
> The character $' is being returned in an XML document, and when I use
> xmerl to process the XML document, I get multiple values instead of a
> single value:
>
> problem() ->
>    Xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
>          "<Key>There&apos;s a problem</Key>",
>    {Doc, _} = xmerl_scan:string(Xml),
>    [Value || #xmlText{value = Value} <-
>                  xmerl_xpath:string("//Key/text()", Doc)].
>
> Which returns ["There","'s a problem"] instead of ["There's a problem"].
>
> I note that if I pre-process the XML with s/&apos;/'/, I get the expected
> ["There's a problem"] -- is that the right workaround? Thanks!
>
> Mark
>

