[erlang-bugs] xmerl default encoding

Paul Mineiro
Tue Jan 22 06:59:12 CET 2008


when i run the attached document (a simple xml document that lacks an
encoding declaration) through xmerl_scan:file/1, the result contains the
iso-8859-1 encoding of tilde n (\361).  however, the original contains the
utf-8 encoding of tilde n (\303\261), and the character set change
surprised me.  adding a { encoding, "utf-8" } option to xmerl_scan:file/2
fixed things, but the reference manual (and the xml spec) say that utf-8 is
the default.

thanks,

-- p

Eshell V5.5.5  (abort with ^G)
1> xmerl_scan:file ("noencodingdecl.xml").
{{xmlElement,'Actor',
             'Actor',
             [],
             {xmlNamespace,[],[]},
             [],
             1,
             [],
             [{xmlText,[{'Actor',1}],1,[],"Elizabeth Pe\361a",text}],
             [],
             ".",
             undeclared},
 []}
2> xmerl_scan:file ("noencodingdecl.xml", [ { encoding, "utf-8" } ]).
{{xmlElement,'Actor',
             'Actor',
             [],
             {xmlNamespace,[],[]},
             [],
             1,
             [],
             [{xmlText,[{'Actor',1}],1,[],"Elizabeth Pe\303\261a",text}],
             [],
             ".",
             undeclared},
 []}
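
For reference, the relationship between the two byte sequences in the
results above can be checked directly.  The following is a minimal sketch,
not part of the original report: the module name enc_demo is hypothetical,
and it assumes an OTP release that ships the unicode module, which is newer
than the Eshell V5.5.5 shown above.  It decodes the utf-8 bytes \303\261 and
checks that they yield the single value \361 (decimal 241), which is both
the Unicode code point of tilde n and its iso-8859-1 byte.

```erlang
-module(enc_demo).
-export([run/0]).

%% Hypothetical helper module for illustration; it does not call xmerl.
run() ->
    %% The text as stored in the file: tilde n is the two utf-8 bytes
    %% 8#303 8#261 (decimal 195 177).
    Utf8Bytes = <<"Elizabeth Pe", 8#303, 8#261, "a">>,

    %% Decode the utf-8 bytes into a list of Unicode code points.
    CodePoints = unicode:characters_to_list(Utf8Bytes, utf8),

    %% The two bytes collapse into one element, 8#361 (decimal 241).
    "Elizabeth Pe\361a" = CodePoints,

    %% Reading the list as iso-8859-1 characters and encoding it as utf-8
    %% gives back the original bytes.
    Utf8Bytes = unicode:characters_to_binary(CodePoints, latin1, utf8),
    ok.
```

Calling enc_demo:run() returns ok when both matches succeed; a badmatch
error would mean the sequences do not relate as described.  The sketch only
shows how the two representations map onto each other and makes no claim
about what xmerl does internally.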
-------------- next part --------------
A non-text attachment was scrubbed...
Name: noencodingdecl.xml
Type: application/xml
Size: 52 bytes
Desc: 
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20080121/c06a332d/attachment.wsdl>

