[erlang-bugs] Bug in xmerl

Wed Jul 2 09:38:58 CEST 2008

Hi,
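this was a bug in xmerl. The closing parenthesis in the call to
string_to_char_set/2 (line 2449 in xmerl_scan) was misplaced, so the
rest of the input (T1) was converted together with the expanded
reference (ExpRef) instead of being appended after the conversion.
This will be fixed in R12B-4, but I include the patch lines below.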

------------------------- Patch start ----------------------------------

--- xmerl_scan.erl@@/main/xmerl/108     2008-04-25 09:20:41.000000000 +0200
+++ xmerl_scan.erl      2008-07-01 17:11:18.000000000 +0200
@@ -2446,7 +2446,7 @@
      case markup_delimeter(ExpRef) of
         true -> scan_content(ExpRef++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,ExpRef);
         _ ->
-            scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef++T1),S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
+            scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef)++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
      end;
  scan_content("<!--" ++ T, S, Pos, Name, Attrs, Space, Lang, Parents, NS, Acc,[]) ->
      {_, T1, S1} = scan_comment(T, S, Pos, Parents, Lang),
------------------------- Patch end ----------------------------------
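
For anyone wondering why the misplaced parenthesis garbles the text, here
is a minimal sketch of the effect. It is not the xmerl code: the module
name and the helpers to_utf8/1, before_fix/2 and after_fix/2 are made up
for illustration, and it assumes that string_to_char_set/2 encodes every
code point of its argument as UTF-8 when the encoding is utf-8, which is
consistent with the output in the report below. T1 already holds raw
UTF-8 bytes, so running it through the conversion encodes it twice.

```erlang
-module(double_encode_demo).
-export([before_fix/2, after_fix/2]).

%% Hypothetical stand-in for string_to_char_set/2 with a utf-8 encoding:
%% treats every integer in the list as a code point and encodes it as UTF-8.
to_utf8(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints, unicode, utf8)).

%% Before the patch: the expanded reference and the remaining input are
%% converted together, so bytes that are already UTF-8 get encoded again.
before_fix(ExpRef, T1) ->
    to_utf8(ExpRef ++ T1).

%% After the patch: only the expanded reference is converted; the
%% remaining input is appended untouched.
after_fix(ExpRef, T1) ->
    to_utf8(ExpRef) ++ T1.
```

With ExpRef = [10] and T1 = [195,169] (the UTF-8 bytes of é),
before_fix/2 returns [10,195,131,194,169], the garbled list from the
report, while after_fix/2 returns [10,195,169].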

Regards Lars




Mikkel Jensen wrote:
> Is it possible for someone from the OTP team to confirm if this is a bug 
> or not?
> 
> If it is I could really use a patch :-)
> 
> - Mikkel
> 
> On Fri, Jun 27, 2008 at 2:57 PM, Mikkel Jensen <mj@REDACTED> wrote:
> 
>     It seems there is a bug in xmerl when loading elements that contain
>     numeric character references followed by UTF-8 characters.
> 
>     Example: é newline é
> 
>     1> element(1, xmerl_scan:string("<a>\303\251&#xD;\303\251</a>",
>     [{encoding, 'utf-8'}])).
>     {xmlElement,a,a,[],
>                 {xmlNamespace,[],[]},
>                 [],1,[],
>                 [{xmlText,[{a,1}],1,[],"\303\251",text},
>                  {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
>                 [],"/",undeclared}
> 
>     Xmerl splits the parsed value around the newline character (strange
>     but ok). However, the first part is encoded correctly while the
>     second part is garbled!
> 
>     It's worth noticing that attribute values are encoded correctly:
> 
>     2> element(1, xmerl_scan:string("<a b=\"\303\251&#xD;\303\251\"/>",
>     [{encoding, 'utf-8'}])).
>     {xmlElement,a,a,[],
>                 {xmlNamespace,[],[]},
>                 [],1,
>                 [{xmlAttribute,b,[],[],[],[],1,[],"\303\251
>     \303\251",false}],
>                 [],[],"/",undeclared}
> 
>     - Mikkel