[erlang-questions] [erlang-bugs] xmerl and unicode data

Anthony Ramine
Fri Oct 19 17:01:00 CEST 2012


On 19 Oct 2012, at 15:58, Ali Sabil wrote:

> Hi all,
> 
> I was wondering if anyone came across the following behaviour?
> 
> 
> Erlang R15B02 (erts-5.9.2) [source] [64-bit] [smp:4:4]
> [async-threads:0] [hipe] [kernel-poll:false] [dtrace]
> 
> Eshell V5.9.2  (abort with ^G)
> 1> xmerl_scan:string("<?xml version=\"1.0\"
> encoding=\"utf-8\"?><test>你好 Björk</test>").
> 3414- fatal: {error,{wfc_Legal_Character,{error,{bad_character,20320}}}}
> ** exception exit:
> {fatal,{{error,{wfc_Legal_Character,{error,{bad_character,20320}}}},
>                           {file,file_name_unknown},
>                           {line,1},
>                           {col,47}}}
>     in function  xmerl_scan:fatal/2 (xmerl_scan.erl, line 4102)
>     in call from xmerl_scan:scan_char_data/5 (xmerl_scan.erl, line 2703)
>     in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2615)
>     in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2128)
>     in call from xmerl_scan:scan_document/2 (xmerl_scan.erl, line 570)
>     in call from xmerl_scan:string/2 (xmerl_scan.erl, line 286)
> 2>
> 2> xmerl_scan:string("<?xml version=\"1.0\"
> encoding=\"utf-8\"?><test>你好 Björk</test>", [{encoding, latin1}]).
> {{xmlElement,test,test,[],
>             {xmlNamespace,[],[]},
>             [],1,[],
>             [{xmlText,[{test,1}],
>                       1,[],
>                       [20320,22909,32,66,106,246,114,107],
>                       text}],
>             [],"/Users/asabil/test",
>             undeclared},
> []}
> 3>
> 3> io:getopts().
> [{expand_fun,#Fun<group.0.129081181>},
> {echo,true},
> {binary,false},
> {encoding,unicode}]
> 
> 
> Thanks,
> Ali

Hi,

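From my vague recollection of xmerl's innards, I'm pretty sure this happens
because xmerl_scan:string expects a list of bytes and does not check whether
each element really is a byte (an integer in the range 0..255). Since your
shell's encoding is unicode, the string literal arrives as a list of code
points such as 20320, and that is most likely what the scanner rejects when
it tries to decode the input as UTF-8.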
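
If that diagnosis is right, one possible workaround is to encode the code
points as UTF-8 bytes yourself before handing them to xmerl. The following
is a minimal sketch under that assumption, not a tested fix for R15B02. The
module name xmerl_utf8_sketch and both function names are hypothetical;
unicode:characters_to_binary/1 and xmerl_scan:string/1 are the standard OTP
calls.

```erlang
%% Hypothetical helper module (the name is made up for illustration).
-module(xmerl_utf8_sketch).
-export([to_utf8_bytes/1, scan_unicode/1]).

%% Convert a list of Unicode code points into a list of UTF-8 bytes.
%% unicode:characters_to_binary/1 returns a binary on success and an
%% {error, ...} or {incomplete, ...} tuple otherwise.
to_utf8_bytes(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) -> binary_to_list(Bin);
        Other -> erlang:error({bad_unicode, Other})
    end.

%% Scan an XML document given as code points, assuming its prolog
%% declares encoding="utf-8" so that xmerl decodes the bytes as UTF-8.
scan_unicode(Chars) ->
    xmerl_scan:string(to_utf8_bytes(Chars)).
```

With this, the first call from your session would become
xmerl_utf8_sketch:scan_unicode(Xml), where Xml is the same string literal
you typed in the shell.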

Regards,

-- 
Anthony Ramine



