[erlang-bugs] xmerl and unicode data

Ali Sabil
Fri Oct 19 15:58:06 CEST 2012


Hi all,

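Has anyone else come across the following behaviour? xmerl_scan:string/1
fails with a bad_character error on a string containing non-ASCII
characters, even though the document declares encoding="utf-8". The same
input parses when I pass [{encoding, latin1}]. The shell encoding is
unicode, as the io:getopts() output at the end shows.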


Erlang R15B02 (erts-5.9.2) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false] [dtrace]

Eshell V5.9.2  (abort with ^G)
1> xmerl_scan:string("<?xml version=\"1.0\" encoding=\"utf-8\"?><test>你好 Björk</test>").
3414- fatal: {error,{wfc_Legal_Character,{error,{bad_character,20320}}}}
** exception exit:
{fatal,{{error,{wfc_Legal_Character,{error,{bad_character,20320}}}},
                           {file,file_name_unknown},
                           {line,1},
                           {col,47}}}
     in function  xmerl_scan:fatal/2 (xmerl_scan.erl, line 4102)
     in call from xmerl_scan:scan_char_data/5 (xmerl_scan.erl, line 2703)
     in call from xmerl_scan:scan_content/11 (xmerl_scan.erl, line 2615)
     in call from xmerl_scan:scan_element/12 (xmerl_scan.erl, line 2128)
     in call from xmerl_scan:scan_document/2 (xmerl_scan.erl, line 570)
     in call from xmerl_scan:string/2 (xmerl_scan.erl, line 286)
2>
2> xmerl_scan:string("<?xml version=\"1.0\" encoding=\"utf-8\"?><test>你好 Björk</test>", [{encoding, latin1}]).
{{xmlElement,test,test,[],
             {xmlNamespace,[],[]},
             [],1,[],
             [{xmlText,[{test,1}],
                       1,[],
                       [20320,22909,32,66,106,246,114,107],
                       text}],
             [],"/Users/asabil/test",
             undeclared},
 []}
3>
3> io:getopts().
[{expand_fun,#Fun<group.0.129081181>},
 {echo,true},
 {binary,false},
 {encoding,unicode}]


Thanks,
Ali

