[erlang-questions] Exception in xmerl, when parsing XML with non UTF8 character set
Tue Jan 8 09:00:22 CET 2008
One hack that accomplishes what you want (at the cost of introducing a bug)
is to add 'windows-1252' to the function guard of to_unicode/2 in xmerl_ucs.erl.
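
To illustrate the idea, here is a minimal sketch of what such a guard could
look like. It is a standalone, hypothetical module (charset_sketch), not the
actual source of xmerl_ucs.erl; the clause layout and the assumption that the
charset name arrives as a lowercase atom are mine. It counts as a bug because
windows-1252 and Latin-1 differ in the byte range 0x80-0x9F (for example, byte
0x80 is the euro sign in windows-1252), so such bytes would pass through with
the wrong meaning.

```erlang
-module(charset_sketch).
-export([to_unicode/2]).

%% Hypothetical stand-in for the clause discussed above; NOT xmerl's code.
%% Latin-1 code points coincide with the first 256 Unicode code points,
%% so the "conversion" of a list of Latin-1 bytes is the identity.
to_unicode(Input, Cs) when Cs == 'iso-8859-1';
                           Cs == 'latin1';
                           Cs == 'windows-1252' ->  % the added alias (the hack)
    Input;
%% Anything else is rejected in this sketch.
to_unicode(_Input, Cs) ->
    erlang:error({unsupported_charset, Cs}).
```

With this in place, charset_sketch:to_unicode("caf\351", 'windows-1252')
returns the input list unchanged, which is only correct as long as the data
really is Latin-1.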
> The XML is generated by a closed-source 3rd party Windows server (if it were
> generated by me, it would be encoded in UTF-8).
> I am asking questions from the Erlang domain here, not looking for the
> obvious and ugly common-sense solutions, like reading the entire file into
> memory, changing the encoding string and only then feeding it into xmerl.
> (The only problem is that this XML can be quite big, 0.5 MB or more.)
> Maybe xmerl has some option for forcing an encoding other than the one
> specified in the <?xml?> PI?
> Maybe there is some other XML parser, like erlsom or an expat driver, which
> supports the windows-1252 encoding?
> Anyway, I am using xmerl just for prototyping; the long-term solution will
> be to write a C++ port, which will do all the XML processing and return
> Erlang terms in either text or binary form, which can be read by either
> file:consult or binary_to_term on the Erlang side.
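
For completeness, the in-memory workaround mentioned above (read the file,
change the encoding string, then feed it to xmerl) could be sketched roughly
as follows. The module name encoding_fix and the file name in the usage line
are hypothetical, and the sketch assumes the label appears in exactly that
lowercase spelling and first occurs in the XML declaration. It relabels the
data rather than converting it, so it is only safe if the content really is
Latin-1, as stated in the thread.

```erlang
-module(encoding_fix).
-export([relabel/1, parse_file/1]).

%% Replace the first occurrence of the charset label with iso-8859-1.
%% binary:replace/3 without the 'global' option touches only the first match.
relabel(Bin) when is_binary(Bin) ->
    binary:replace(Bin, <<"windows-1252">>, <<"iso-8859-1">>).

%% Read the whole file into memory, relabel it, and hand it to xmerl.
%% xmerl_scan:string/1 returns {Document, Rest}.
parse_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    xmerl_scan:string(binary_to_list(relabel(Bin))).
```

Usage would be along the lines of
{Doc, _Rest} = encoding_fix:parse_file("input.xml").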
> Christian S wrote:
>> Why not ask yourself how to change your xml so it says iso-8859-1, as
>> it should be doing?
>> On Jan 7, 2008 5:22 PM, Zvi <> wrote:
>>> Thanks for the reply.
>>> Actually, the character set used is always Latin-1, but for some reason
>>> the 3rd party software calls it windows-1252. So could you tell me what I
>>> should change in xmerl so that it will treat windows-1252 as Latin-1?