[erlang-questions] Exception in xmerl, when parsing XML with non UTF8 character set
Willem de Jong
Wed Jan 9 17:00:38 CET 2008
Similar to what Bertil suggested for Xmerl, you can achieve this in Erlsom
by adding a clause

    "windows-1252" -> 'iso-8859-1'; %% note: this is actually introducing a bug
                                    %% in order to work around a problem!

to the case statement in encoding_type() in erlsom_lib.erl.
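
A minimal sketch of how such a case statement might look with the extra clause in place. The module name, the lower-casing of the name, and the other clauses are assumptions made for illustration; the real encoding_type() in erlsom_lib.erl may well be structured differently. The clause is a bug in the strict sense because windows-1252 assigns printable characters to the bytes 16#80..16#9F (16#80 is the euro sign, for example), whereas ISO-8859-1 treats them as control codes, so the mapping is only safe when the data really is Latin-1.

```erlang
-module(encoding_sketch).
-export([encoding_type/1]).

%% Hypothetical stand-in for encoding_type() in erlsom_lib.erl:
%% maps the encoding name from the XML declaration to an atom.
encoding_type(Name) when is_list(Name) ->
    case string:to_lower(Name) of
        "utf-8"        -> 'utf-8';
        "iso-8859-1"   -> 'iso-8859-1';
        %% Workaround clause: pretend windows-1252 is Latin-1.
        %% Bytes 16#80..16#9F will be decoded incorrectly.
        "windows-1252" -> 'iso-8859-1';
        Other          -> {error, {unsupported_encoding, Other}}
    end.
```

With this sketch, encoding_sketch:encoding_type("Windows-1252") returns 'iso-8859-1'.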
I would be interested to know why you think it will be necessary to replace
it with a C++ port. It seems to me that this would complicate things
considerably. What are the requirements that make this necessary? What
properties should an Erlang XML parser have?
On 1/7/08, Zvi <exta7@REDACTED> wrote:
> The XML is generated by a closed-source 3rd party Windows server (if it were
> generated by me, it would be encoded in utf-8).
> I am asking questions here from the Erlang domain, not about the obvious & ugly
> common sense solutions, like reading the entire file into memory, changing the
> encoding string and only then feeding it into xmerl (the only problem is that
> this XML can be quite big, like 0.5 MB and more).
> Maybe xmerl has some option for forcing an encoding other than the one
> specified in the <?xml?> PI?
> Maybe there is some other XML parser, like erlsom or an expat driver, which
> supports the windows-1252 encoding?
> Anyway, I am using xmerl just for prototyping; the long-term solution will be
> to write a C++ port, which will do all the XML processing and return
> terms in either text or binary form, which can be read either by
> file:consult or binary_to_term on the Erlang side.
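
For reference, the in-memory approach mentioned above (changing the encoding string before parsing) could be sketched roughly as follows. The module name, the function name and the 200-byte limit are assumptions made for illustration, and the match is case-sensitive. The same caveat applies as for any windows-1252 to Latin-1 relabelling: it is only correct if the data contains no bytes in the range 16#80..16#9F.

```erlang
-module(decl_patch).
-export([patch_encoding/1]).

%% Rewrite the encoding name in the XML declaration.
%% Only the first 200 bytes (an assumed upper bound for the
%% declaration) are searched, so the document body is left untouched.
patch_encoding(Bin) when is_binary(Bin) ->
    HeadLen = min(byte_size(Bin), 200),
    <<Head:HeadLen/binary, Rest/binary>> = Bin,
    %% binary:replace/3 replaces the first occurrence only.
    Patched = binary:replace(Head, <<"windows-1252">>, <<"iso-8859-1">>),
    <<Patched/binary, Rest/binary>>.
```

The patched binary could then be handed to xmerl, for example with xmerl_scan:string(binary_to_list(decl_patch:patch_encoding(Bin))), where Bin comes from file:read_file/1.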
> Christian S wrote:
> > Why not ask yourself how to change your xml so it says iso-8859-1 as you
> > say it should be doing?
> > http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out
> > On Jan 7, 2008 5:22 PM, Zvi <exta7@REDACTED> wrote:
> >> Bertil,
> >> thanks for the reply.
> >> Actually the character set used is always latin-1, but for some reason the
> >> 3rd party software calls it windows-1252. So can you tell me what I
> >> should change in xmerl, so that it will treat windows-1252 as Latin-1?
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions