[erlang-questions] Exception in xmerl, when parsing XML with non UTF8 character set

Zvi exta7@REDACTED
Mon Jan 7 17:22:25 CET 2008


Bertil,

Thanks for the reply.
Actually the character set used is always Latin-1, but for some reason the
3rd-party software labels it windows-1252. Could you tell me what I should
change in xmerl so that it treats windows-1252 as Latin-1?
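
One way to get this effect without patching xmerl is to rewrite the encoding
label before the document reaches the scanner. What follows is only a sketch
under stated assumptions: the module cp1252_relabel and its functions are
made-up names (not part of xmerl), it relies on the re module (so it needs an
OTP release that ships re), and it assumes the declaration uses double quotes
as in the document shown in this thread. Bytes in the 0x80-0x9F range, if any
occur, would then be read as Latin-1 control codes rather than as
Windows-1252 characters.

```erlang
-module(cp1252_relabel).
-export([scan_file/1, relabel/1]).

%% Hypothetical helper, not part of xmerl: read the file, relabel the
%% declared encoding, then hand the bytes to xmerl_scan:string/1.
scan_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    xmerl_scan:string(binary_to_list(relabel(Bin))).

%% Replace only the first occurrence of the windows-1252 label
%% (re:replace/4 without the `global' option stops after one match).
%% Input without such a label is returned unchanged.
relabel(Bin) when is_binary(Bin) ->
    re:replace(Bin,
               <<"encoding=\"windows-1252\"">>,
               <<"encoding=\"iso-8859-1\"">>,
               [caseless, {return, binary}]).
```

Calling cp1252_relabel:scan_file(ResultIdx) would then take the place of
xmerl_scan:file(ResultIdx) and return the same kind of {Xml, Rest} tuple.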

From Wikipedia:
"The ISO-8859-1/Windows-1252 mixup
It is very common to mislabel text data with the charset label ISO-8859-1,
even though the data is really Windows-1252 encoded. In Windows-1252, codes
between 0x80 and 0x9F are used for letters and punctuation, whereas they are
control codes in ISO-8859-1. Many web browsers and e-mail clients will
interpret ISO-8859-1 control codes as Windows-1252 characters in order to
accommodate such mislabeling but it is not a standard behaviour and care
should be taken to avoid generating these characters in ISO-8859-1 labeled
content."
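
If the data ever does contain bytes in the 0x80-0x9F range, the difference
described above can be handled before xmerl sees the document. The sketch
below converts Windows-1252 bytes to a UTF-8 binary. The module name cp1252
and the function to_utf8/1 are hypothetical (not part of xmerl), the code
assumes a release with binary comprehensions and the unicode module, and it
makes one arbitrary choice: the five byte values that Windows-1252 leaves
unassigned are passed through unchanged.

```erlang
-module(cp1252).
-export([to_utf8/1]).

%% Hypothetical helper, not part of xmerl: map every Windows-1252 byte
%% to its Unicode code point and encode the result as UTF-8.
to_utf8(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary([cp(B) || <<B>> <= Bin]).

%% The 0x80..0x9F block is where Windows-1252 differs from ISO-8859-1.
cp(16#80) -> 16#20AC;  % euro sign
cp(16#82) -> 16#201A;  cp(16#83) -> 16#0192;
cp(16#84) -> 16#201E;  cp(16#85) -> 16#2026;
cp(16#86) -> 16#2020;  cp(16#87) -> 16#2021;
cp(16#88) -> 16#02C6;  cp(16#89) -> 16#2030;
cp(16#8A) -> 16#0160;  cp(16#8B) -> 16#2039;
cp(16#8C) -> 16#0152;  cp(16#8E) -> 16#017D;
cp(16#91) -> 16#2018;  cp(16#92) -> 16#2019;
cp(16#93) -> 16#201C;  cp(16#94) -> 16#201D;
cp(16#95) -> 16#2022;  cp(16#96) -> 16#2013;
cp(16#97) -> 16#2014;  cp(16#98) -> 16#02DC;
cp(16#99) -> 16#2122;  cp(16#9A) -> 16#0161;
cp(16#9B) -> 16#203A;  cp(16#9C) -> 16#0153;
cp(16#9E) -> 16#017E;  cp(16#9F) -> 16#0178;
%% Everything else matches ISO-8859-1; the unassigned bytes
%% 0x81, 0x8D, 0x8F, 0x90 and 0x9D also fall through here unchanged.
cp(B) -> B.
```

The converted document is UTF-8, so its encoding declaration would also have
to say utf-8 before the result is passed to xmerl_scan:string/1.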

Thanks in advance.
Zvi


Bertil Karlsson wrote:
> 
> If I'm right, windows-1252 uses its own conversion table, which doesn't
> exist in xmerl today. Just changing the encoding to something that seems
> to work may cause trouble when it comes to the characters that differ.
> It is not difficult to add the needed changes to xmerl, but I cannot
> promise they will be in the next release.
> 
> /Bertil
> 
> Zvi wrote:
>> 3> { Xml, _Rest } = xmerl_scan:file(ResultIdx).
>> ** exception exit: {bad_character_code,
>>                        "<!DOCTYPE BODY SYSTEM \"http://www.xxx.com/yyy.dtd\">\n<BODY>\n<RENDERING>\naaa</RENDERING>\n</BODY>\n",
>>                        'windows-1252'}
>>      in function  xmerl_ucs:to_unicode/2
>>      in call from xmerl_scan:scan_document/2
>>      in call from xmerl_scan:file/2
>>
>> The XML document starts with the PI:
>>    <?xml version="1.0" encoding="windows-1252"?>
>> It works after changing it to
>>    <?xml version="1.0" encoding="utf-8"?>
>>
>> The problem is that this XML document is generated by 3rd-party software,
>> so I would like to fix the xmerl code or use some xmerl option.
>>
>> I am using R12B on Windows.
>>
>> TIA
>>
>> Zvi
>>
>>   
