[erlang-questions] [erlang-bugs] xmerl and unicode data
Anthony Ramine
n.oxyde@REDACTED
Fri Oct 19 17:01:00 CEST 2012
On 19 Oct 2012, at 15:58, Ali Sabil wrote:
> Hi all,
>
> I was wondering if anyone came across the following behaviour?
>
>
> Erlang R15B02 (erts-5.9.2) [source] [64-bit] [smp:4:4]
> [async-threads:0] [hipe] [kernel-poll:false] [dtrace]
>
> Eshell V5.9.2 (abort with ^G)
> 1> xmerl_scan:string("<?xml version=\"1.0\"
> encoding=\"utf-8\"?><test>你好 Björk</test>").
> 3414- fatal: {error,{wfc_Legal_Character,{error,{bad_character,20320}}}}
> ** exception exit:
> {fatal,{{error,{wfc_Legal_Character,{error,{bad_character,20320}}}},
> {file,file_name_unknown},
> {line,1},
> {col,47}}}
> in function xmerl_scan:fatal/2 (xmerl_scan.erl, line 4102)
> in call from xmerl_scan:scan_char_data/5 (xmerl_scan.erl, line 2703)
> in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2615)
> in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2128)
> in call from xmerl_scan:scan_document/2 (xmerl_scan.erl, line 570)
> in call from xmerl_scan:string/2 (xmerl_scan.erl, line 286)
> 2>
> 2> xmerl_scan:string("<?xml version=\"1.0\"
> encoding=\"utf-8\"?><test>你好 Björk</test>", [{encoding, latin1}]).
> {{xmlElement,test,test,[],
> {xmlNamespace,[],[]},
> [],1,[],
> [{xmlText,[{test,1}],
> 1,[],
> [20320,22909,32,66,106,246,114,107],
> text}],
> [],"/Users/asabil/test",
> undeclared},
> []}
> 3>
> 3> io:getopts().
> [{expand_fun,#Fun<group.0.129081181>},
> {echo,true},
> {binary,false},
> {encoding,unicode}]
>
>
> Thanks,
> Ali
Hi,
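From my vague recollection of xmerl's innards, I'm pretty sure this happens
because xmerl_scan:string expects a list of bytes, whereas your shell (with
{encoding,unicode}) hands it a list of Unicode code points: 20320 is 你, which
is not a byte. With [{encoding, latin1}] it does not check whether each element
is actually valid latin1, so the second call goes through untouched.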
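
If the diagnosis is right, encoding the string to UTF-8 bytes before handing
it to xmerl should avoid the error. A minimal sketch, untested against
R15B02; the module and function names are made up for illustration, and only
unicode:characters_to_binary/1, binary_to_list/1 and xmerl_scan:string/1 are
real library calls:

```erlang
%% Hypothetical helper module, not part of xmerl.
-module(xmerl_utf8_sketch).
-export([to_utf8_bytes/1, parse/1]).

%% Encode a list of Unicode code points as a list of UTF-8 bytes.
%% Crashes with badarg if the input cannot be encoded, because
%% unicode:characters_to_binary/1 then returns an error tuple.
to_utf8_bytes(Chars) ->
    binary_to_list(unicode:characters_to_binary(Chars)).

%% Give xmerl the UTF-8 bytes that the XML declaration promises.
parse(Chars) ->
    xmerl_scan:string(to_utf8_bytes(Chars)).
```

For example, with the text from your session written out as code points:

```erlang
Text = [20320,22909,32,66,106,246,114,107],
Xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><test>" ++ Text ++ "</test>",
{_Element, _Rest} = xmerl_utf8_sketch:parse(Xml).
```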
Regards,
--
Anthony Ramine