[erlang-questions] character encoding and xmerl

Martin Dimitrov mrtndimitrov@REDACTED
Thu Jan 5 10:25:06 CET 2012


Hello,

In our app we upload an XML file through a simple form. Both the page
and the file are encoded in UTF-8.

YAWS gathers the parts of the file, flattens them and sends them to
xmerl. The XML is scanned through xmerl_scan:string with {encoding,
"utf-8"}. When I dump the string the Cyrillic word продукт is printed as
208,191,209,128,208,190,208,180,209,131,208,186,209,130.

After the scan, the Cyrillic word is printed as
1087,1088,1086,1076,1091,1082,1090 which, as far as I understand, is
the correct sequence of Unicode code points.
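
The two dumps appear to be the same word in two forms: UTF-8 bytes before the scan and Unicode code points after it. A minimal sketch for the Erlang shell, using only the standard unicode module; the variable names are illustrative and not taken from our app:

```erlang
%% The word as dumped before the scan: one list element per UTF-8 byte.
Bytes = [208,191,209,128,208,190,208,180,209,131,208,186,209,130],
%% Decode the UTF-8 bytes into a list of Unicode code points.
CodePoints = unicode:characters_to_list(list_to_binary(Bytes), utf8),
%% This matches the list printed after the scan.
[1087,1088,1086,1076,1091,1082,1090] = CodePoints,
%% Encoding the code points again gives back the original bytes.
Bytes = binary_to_list(unicode:characters_to_binary(CodePoints)).
```

Each Cyrillic letter takes two bytes in UTF-8, which is why the 14 bytes collapse into 7 code points.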

The problem appears when our internal structures are exported to XML.
When we then try to scan that XML again, xmerl reports:

{fatal,{{unexpected_char,{error,{bad_character,1087}}}
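
The value 1087 in the error is the code point of the first Cyrillic letter, so one possible reading (an assumption, not something confirmed here) is that the exported XML is a list of code points while the scanner is told to expect UTF-8 bytes. A hedged sketch of re-encoding such a list before scanning; the document in `Exported` is a made-up stand-in for our real export:

```erlang
%% Hypothetical exported document, held as a list of Unicode code points.
Exported = "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>"
           ++ [1087,1088,1086,1076,1091,1082,1090] ++ "</name>",
%% Encode the code points as UTF-8, then convert back to a byte list.
Utf8Bytes = binary_to_list(unicode:characters_to_binary(Exported)),
%% Every element now fits in one byte, as in the uploaded file.
true = lists:all(fun(B) -> B =< 255 end, Utf8Bytes),
%% Scan the byte list; xmerl_scan:string/1 returns {Element, Rest}.
{_Element, _Rest} = xmerl_scan:string(Utf8Bytes).
```

Whether this matches what our export code actually produces would still need to be checked against the real data.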


Thanks in advance,

Martin


