[erlang-questions] Exception in xmerl, when pasing XML with non UTF8 character set

Willem de Jong w.a.de.jong@REDACTED
Wed Jan 9 17:00:38 CET 2008


Hi,

Similar to what Bertil suggested for Xmerl, you can achieve this in Erlsom
by adding a clause

"windows-1252" -> 'iso-8859-1';  %% note: this is actually introducing a bug
                                 %% in order to work around a problem!

to the case statement in encoding_type() in erlsom_lib.erl.
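
As a rough illustration of the idea (not Erlsom's actual source), the sketch below shows a case statement that maps a declared encoding name to an internal atom, with the extra clause added. The module name enc_sketch, the other clauses, the error tuple and the helper differs_from_latin1/1 are all assumptions made for this example. The helper shows why the clause is strictly a bug: windows-1252 and ISO-8859-1 disagree on bytes 16#80..16#9F (for example, 16#80 is the euro sign in windows-1252 but a control character in ISO-8859-1), so the shortcut is only safe when the input contains no such bytes.

```erlang
-module(enc_sketch).
-export([encoding_type/1, differs_from_latin1/1]).

%% Hypothetical stand-in for the case statement in
%% erlsom_lib:encoding_type(); the real function's clauses and
%% return values are not reproduced here.
encoding_type(Name) ->
    %% string:lowercase/1 needs OTP 20 or later.
    case string:lowercase(Name) of
        "utf-8"        -> 'utf-8';
        "iso-8859-1"   -> 'iso-8859-1';
        "windows-1252" -> 'iso-8859-1';  %% the workaround clause
        Other          -> {error, {unsupported_encoding, Other}}
    end.

%% Returns true if the binary contains a byte in 16#80..16#9F,
%% the range where windows-1252 and ISO-8859-1 differ.
differs_from_latin1(Bin) when is_binary(Bin) ->
    lists:any(fun(B) -> B >= 16#80 andalso B =< 16#9F end,
              binary_to_list(Bin)).
```

If differs_from_latin1/1 returns false for a given document, treating it as ISO-8859-1 gives the same characters as decoding it as windows-1252.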

I would be interested to know why you think it will be necessary to replace
it with a C++ port. It seems to me that this would complicate things
considerably. What are the requirements that make this necessary? What
properties should an Erlang XML parser have?

Regards,
Willem


On 1/7/08, Zvi <exta7@REDACTED> wrote:
>
>
> The XML is generated by a closed-source 3rd-party Windows server (if it
> were generated by me, it would be encoded in UTF-8).
> I am asking here for answers from the Erlang domain, not the obvious and
> ugly common-sense solutions, like reading the entire file into memory,
> changing the encoding string and only then feeding it into xmerl (the only
> problem is that this XML can be quite big, 0.5 MB and more).
> Maybe xmerl has some option for forcing an encoding other than the one
> specified in the <?xml?> PI?
> Maybe there is some other XML parser, like erlsom or an expat driver, which
> supports the windows-1252 encoding?
> Anyway, I am using xmerl just for prototyping; the long-term solution will
> be to write a C++ port, which will do all the XML processing and return
> Erlang terms in either text or binary form, which can be read by either
> file:consult or binary_to_term on the Erlang side.
>
> ZVi
>
>
> Christian S wrote:
> >
> > Why not ask yourself how to change your XML so it says iso-8859-1, as
> > you say it should be doing?
> >
> > http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out
> >
> > On Jan 7, 2008 5:22 PM, Zvi <exta7@REDACTED> wrote:
> >>
> >> Bertil,
> >>
> >> thanks for the reply.
> >> Actually, the character set used is always Latin-1, but for some reason
> >> the 3rd-party software calls it windows-1252. So can you tell me what I
> >> should change in xmerl so that it will treat windows-1252 as Latin-1?
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions
> >
> >
>
>