[erlang-questions] erlsom and binary parsing output

Zvi <>
Thu Feb 7 02:21:32 CET 2008


Hi Willem,

I am using erlsom, and it is much easier to use than xmerl.
An added benefit is that I do not need to patch my Erlang installation with
'windows-1252' encoding support.
My problem with erlsom is that parsed strings are lists. Is there any
option in erlsom so that xs:string is mapped to an Erlang binary?
Also, in the XML Data Binding product I was using with C++, you can specify
mappings from XSD datatypes to C++ datatypes and even create mappings
for custom datatypes.
In my schema I have integers and floats besides strings, but they are all
mapped to strings (more exactly, to Erlang lists of ASCII codes :).
Some C++ XML Data Binding products even handle enums (which are a much bigger
problem in C++ than in Erlang, since C++ has no atoms). For example, I can map
the XSD datatype xs:date to my custom class CDate and just provide two methods:
fromString and toString.
If erlsom mapped at least the standard XSD datatypes to standard Erlang
datatypes, that would also be useful.
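
Until such a mapping exists, the conversion can be done by hand after parsing.
What follows is only a minimal sketch: the module name xsd_convert and the
function convert/2 are hypothetical and not part of erlsom, and the caller is
assumed to know the XSD type of each value. It uses only the standard BIFs
list_to_binary/1, list_to_integer/1 and list_to_float/1.

```erlang
-module(xsd_convert).
-export([convert/2]).

%% Hypothetical helper, NOT part of erlsom: turn the string (list of
%% character codes) returned by the parser into a native Erlang term.
%% The first argument names the XSD type the caller expects.

%% xs:string -> binary. list_to_binary/1 only accepts codes 0..255,
%% so this assumes Latin-1 (or plain ASCII) content.
convert(string, S) when is_list(S) ->
    list_to_binary(S);

%% xs:integer -> integer.
convert(integer, S) when is_list(S) ->
    list_to_integer(S);

%% xs:float -> float. list_to_float/1 rejects "3", so fall back to
%% parsing an integer. Forms such as "1e5", "INF" and "NaN" are not
%% handled by this sketch.
convert(float, S) when is_list(S) ->
    try
        list_to_float(S)
    catch
        error:badarg -> float(list_to_integer(S))
    end.
```

For example, xsd_convert:convert(string, "abc") gives a binary, and
xsd_convert:convert(integer, "42") gives an integer. A mapping built into the
parser would still be preferable, since it would avoid building the
intermediate lists in the first place.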

Thanks for the useful tool.
Zvi



Willem de Jong-2 wrote:
> 
> Hi,
> 
> Similar to what Bertil suggested for Xmerl, you can achieve this in Erlsom
> by adding a clause
> 
> "windows-1252" -> 'iso-8859-1';  %% note: this is actually introducing a bug
>                                  %% in order to work around a problem!
> 
> to the case statement in encoding_type() in erlsom_lib.erl.
> 
> I would be interested to know why you think it will be necessary to
> replace it by a C++ port. It seems to me that it will be complicating
> things considerably. What are the requirements that make this necessary?
> What properties should an Erlang XML parser have?
> 
> Regards,
> Willem
> 
> 
> On 1/7/08, Zvi <> wrote:
>>
>>
>> The XML is generated by a closed-source 3rd-party Windows server (if it
>> were generated by me, it would be encoded in UTF-8).
>> I am asking questions from the Erlang domain here, not looking for the
>> obvious and ugly common-sense solutions, like reading the entire file into
>> memory, changing the encoding string and only then feeding it into xmerl
>> (the only problem being that this XML can be quite big, 0.5 MB and more).
>> Maybe xmerl has some option for forcing an encoding other than the one
>> specified in the <?xml?> PI?
>> Maybe there is some other XML parser, like erlsom or an expat driver,
>> which supports the windows-1252 encoding?
>> Anyway, I am using xmerl just for prototyping; the long-term solution will
>> be to write a C++ port, which will do all the XML processing and return
>> Erlang terms in either text or binary form, which can be read by either
>> file:consult or binary_to_term on the Erlang side.
>>
>> ZVi
>>
>>
>> Christian S wrote:
>> >
>> > Why not ask yourself how to change your xml so it says iso-8859-1 as
>> > you say it should be doing?
>> >
>> > http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out
>> >
>> > On Jan 7, 2008 5:22 PM, Zvi <> wrote:
>> >>
>> >> Bertil,
>> >>
>> >> thanks for the reply.
>> >> Actually the character set used is always latin-1, but for some reason
>> >> the 3rd party software calls it windows-1252. So please tell me what I
>> >> should change in xmerl so that it will treat windows-1252 as Latin-1.
>> > _______________________________________________
>> > erlang-questions mailing list
>> > 
>> > http://www.erlang.org/mailman/listinfo/erlang-questions
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p14674437.html
>> Sent from the Erlang Questions mailing list archive at Nabble.com.

-- 
View this message in context: http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p15325643.html
Sent from the Erlang Questions mailing list archive at Nabble.com.



