[erlang-bugs] Bug in xmerl
Lars Thorsen
lars@REDACTED
Wed Jul 2 09:38:58 CEST 2008
Hi,
this was a bug in xmerl. The closing parenthesis in the call to
string_to_char_set/2 (line 2449 in xmerl_scan) was misplaced.
This will be fixed in R12B-4, but I include the patch lines below.
------------------------- Patch start ----------------------------------
--- xmerl_scan.erl@@/main/xmerl/108 2008-04-25 09:20:41.000000000 +0200
+++ xmerl_scan.erl 2008-07-01 17:11:18.000000000 +0200
@@ -2446,7 +2446,7 @@
case markup_delimeter(ExpRef) of
true ->
scan_content(ExpRef++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,ExpRef);
_ ->
- scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef++T1),S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
+ scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef)++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
end;
scan_content("<!--" ++ T, S, Pos, Name, Attrs, Space, Lang, Parents,
NS, Acc,[]) ->
{_, T1, S1} = scan_comment(T, S, Pos, Parents, Lang),
------------------------- Patch end ----------------------------------
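
A minimal sketch of the symptom, assuming string_to_char_set/2 turns a list
of code points into UTF-8 bytes when the encoding is 'utf-8' (inferred from
the output quoted below, not confirmed here). Before the patch the already
encoded remainder T1 was passed through the conversion together with ExpRef,
so its bytes were encoded a second time. The module and function names are
hypothetical, and the unicode module used here is from later OTP releases,
not R12B.

```erlang
-module(double_encode_demo).
-export([encode_once/0, encode_twice/0]).

%% U+00E9 ("é") encoded as UTF-8 bytes, i.e. "\303\251".
encode_once() ->
    binary_to_list(unicode:characters_to_binary([16#E9], unicode, utf8)).

%% The same bytes misread as Latin-1 code points and encoded again.
%% This reproduces the garbled tail of the second text node.
encode_twice() ->
    Bytes = encode_once(),
    binary_to_list(unicode:characters_to_binary(Bytes, latin1, utf8)).
```

encode_once() returns [195,169] and encode_twice() returns
[195,131,194,169], the same bytes that follow the newline (10) in the
second xmlText node of the report.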
Regards Lars
Mikkel Jensen wrote:
> Is it possible for someone from the OTP team to confirm if this is a bug
> or not?
>
> If it is I could really use a patch :-)
>
> - Mikkel
>
> On Fri, Jun 27, 2008 at 2:57 PM, Mikkel Jensen <mj@REDACTED> wrote:
>
> It seems there is a bug in xmerl when loading elements that contain
> numeric character references followed by UTF-8 characters.
>
> Example: é newline é
>
> 1> element(1, xmerl_scan:string("<a>\303\251&#10;\303\251</a>",
> [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
> {xmlNamespace,[],[]},
> [],1,[],
> [{xmlText,[{a,1}],1,[],"\303\251",text},
> {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
> [],"/",undeclared}
>
> Xmerl splits the parsed value around the newline character (strange
> but ok). However, the first part is encoded correctly while the
> second part is garbled!
>
> It's worth noticing that attribute values are encoded correctly:
>
> 2> element(1, xmerl_scan:string("<a b=\"\303\251&#10;\303\251\"/>",
> [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
> {xmlNamespace,[],[]},
> [],1,
> [{xmlAttribute,b,[],[],[],[],1,[],"\303\251
> \303\251",false}],
> [],[],"/",undeclared}
>
> - Mikkel