[erlang-questions] Exception in xmerl when parsing XML with a non-UTF-8 character set

Bertil Karlsson <>
Tue Jan 8 09:00:22 CET 2008


You can get what you want by introducing a bug: add 'windows-1252' to the
function guard of to_unicode/2 in xmerl_ucs.erl.

/Bertil
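The following is a minimal, self-contained Erlang sketch of the kind of patch suggested above. It is not the actual xmerl_ucs source. The module name ucs_sketch, the short list of Latin-1 aliases and the {error, ...} fallback clause are assumptions made for illustration. The real guard in xmerl_ucs.erl lists more aliases and may differ between OTP releases. The sketch only shows the shape of the change, which is adding 'windows-1252' as one more alternative in the guard of the clause that handles Latin-1.

```erlang
%% Hypothetical, self-contained sketch; NOT the real xmerl_ucs source.
%% It mimics a charset-alias guard like the one in xmerl_ucs:to_unicode/2,
%% with 'windows-1252' added to the Latin-1 clause.
-module(ucs_sketch).
-export([to_unicode/2]).

%% Latin-1 (ISO-8859-1) bytes 0..255 map one-to-one onto Unicode code
%% points 0..255, so a list of Latin-1 bytes is already a list of code
%% points and can be returned unchanged.
to_unicode(Input, Cs) when Cs == 'iso-8859-1'; Cs == 'iso_8859-1';
                           Cs == 'latin1'; Cs == 'l1';
                           Cs == 'windows-1252' ->  % the added alias
    %% Caveat (why this is "a bug"): windows-1252 assigns printable
    %% characters to bytes 16#80..16#9F (e.g. 16#80 is the euro sign,
    %% U+20AC), while Latin-1 has C1 control codes there. This clause
    %% passes those bytes through as U+0080..U+009F instead.
    Input;
%% Assumed fallback for illustration only; the real module behaves
%% differently for charsets it does not know.
to_unicode(_Input, Cs) ->
    {error, {unsupported_charset, Cs}}.
```

For example, ucs_sketch:to_unicode([16#80], 'windows-1252') returns [128] (U+0080, a control code), whereas a real windows-1252 decoder would produce [8364] (U+20AC, the euro sign). The patch is therefore only harmless as long as the documents never contain bytes 16#80..16#9F, which presumably holds if the data really is Latin-1.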

Zvi wrote:
> The XML is generated by a closed-source third-party Windows server (if I
> generated it myself, it would be encoded in UTF-8).
> I'm asking about Erlang-specific solutions here, not the obvious and ugly
> common-sense ones, like reading the entire file into memory, changing the
> encoding string and only then feeding it into xmerl (the only problem is that
> this XML can be quite big, 0.5 MB and more).
> Maybe xmerl has some option for forcing an encoding other than the one
> specified in the <?xml?> PI?
> Or maybe there is another XML parser, like erlsom or an expat driver, that
> supports the windows-1252 encoding?
> Anyway, I'm using xmerl just for prototyping. The long-term solution will be
> to write a C++ port that does all the XML processing and returns Erlang
> terms in either text or binary form, to be read on the Erlang side with
> file:consult or binary_to_term, respectively.
>
> Zvi
>
>
> Christian S wrote:
>   
>> Why not ask yourself how to change your XML so that it says iso-8859-1, as
>> you say it should?
>>
>> http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out
>>
>> On Jan 7, 2008 5:22 PM, Zvi <> wrote:
>>     
>>> Bertil,
>>>
>>> Thanks for the reply.
>>> Actually, the character set used is always Latin-1, but for some reason the
>>> third-party software calls it windows-1252. So could you tell me what I
>>> should change in xmerl so that it will treat windows-1252 as Latin-1?
>>>       
>
>   