[erlang-questions] Exception in xmerl when parsing XML with a non-UTF-8 character set

Zvi <>
Mon Jan 7 20:52:00 CET 2008


The XML is generated by a closed-source third-party Windows server (if I
generated it myself, it would be encoded in UTF-8).
I am asking here for answers from the Erlang side, not the obvious and ugly
common-sense workarounds such as reading the entire file into memory,
changing the encoding declaration, and only then feeding it to xmerl (the
only problem is that this XML can be quite big, 0.5 MB or more).
Does xmerl perhaps have an option to force an encoding other than the one
specified in the <?xml?> processing instruction?
Or is there another XML parser, such as erlsom or an expat driver, that
supports the windows-1252 encoding?
In any case, I am using xmerl only for prototyping. The long-term solution
will be a C++ port program that does all the XML processing and returns
Erlang terms in either text or binary form, to be read on the Erlang side
with file:consult or binary_to_term, respectively.
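For reference, the "change the encoding declaration" workaround does not have to load the whole file into memory. Below is a minimal sketch, assuming the file really is Latin-1 and that its XML declaration sits at the very start of the file. It is written in Python purely for illustration; the helper name relabel_encoding and the 64 KB chunk size are hypothetical choices and are not part of xmerl or any library. Only the first chunk is rewritten, and the rest of the document is streamed through unchanged.

```python
import re
import shutil

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the input when it names windows-1252 (single or double quotes).
_WIN1252_DECL = re.compile(
    rb"""\A(<\?xml[^>]*?\bencoding\s*=\s*)(["'])windows-1252\2""",
    re.IGNORECASE,
)


def relabel_encoding(src, dst, chunk_size=64 * 1024):
    """Copy binary stream src to dst, relabeling a windows-1252 XML
    declaration as iso-8859-1 (hypothetical helper, for illustration).

    Only the first chunk is inspected and rewritten; the remainder is
    streamed unchanged, so memory use stays bounded by chunk_size.
    """
    head = src.read(chunk_size)
    # Replace at most one occurrence, keeping the original quote style.
    head = _WIN1252_DECL.sub(rb"\1\2iso-8859-1\2", head, count=1)
    dst.write(head)
    # Stream the rest of the document without loading it all at once.
    shutil.copyfileobj(src, dst, chunk_size)
```

Relabeling is only safe if the data really is Latin-1. windows-1252 assigns printable characters to bytes 0x80 through 0x9F (for example, the euro sign at 0x80), whereas ISO-8859-1 maps those bytes to C1 control codes. Any such bytes would be misread after relabeling.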

ZVi


Christian S wrote:
> 
> Why not ask yourself how to change your XML so it says iso-8859-1, as you
> say it should be doing?
> 
> http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out
> 
> On Jan 7, 2008 5:22 PM, Zvi <> wrote:
>>
>> Bertil,
>>
>> Thanks for the reply.
>> Actually, the character set used is always Latin-1, but for some reason
>> the third-party software calls it windows-1252. So could you tell me what
>> I should change in xmerl so that it treats windows-1252 as Latin-1?

-- 
View this message in context: http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p14674437.html
Sent from the Erlang Questions mailing list archive at Nabble.com.



