<div>Hi Zvi,</div>
<div> </div>
<div>I am pleased to hear that you like the tool :)</div>
<div> </div>
<div>At the moment there is no way to tell erlsom that it should map strings to binaries instead of lists. Right now erlsom only maps integers, booleans and qnames (and only under certain conditions: it doesn't try to determine the underlying type of an extended or restricted type).</div>
<div> </div>
<div>The SAX parser that is part of erlsom already decodes the binary input to a list of integers (Unicode code points). This means that the layer that deals with all the schema handling would have to re-encode it, which wouldn't be very efficient. However, I am currently finalising a new version of the SAX parser. This version will take advantage of the improved handling of binaries in R12B; it will work directly on binaries. In theory it could also return binaries, but I am not sure this would be a good idea. It depends on what you want to do with the output, I guess.</div>
<div> </div>
<div>Your request raises a couple of questions:</div>
<div>- if erlsom were to map strings to binaries, then these binaries would have to be encoded in some way (unless you want to accept only ASCII). What encoding should it use? My feeling is that UTF-8 would be the best choice.</div>
<div> </div>
<div>- if you want to have this kind of mapping, then you need a way to tell erlsom what to map, and how. How should this be done? Would you prefer to add some special information to the XSD, or would you like to do it in another way?</div>
<div><br>Anyway, currently I do not have any plans to do a lot of work on this, but I must say that I have also been thinking about the possibility of returning binaries instead of strings. One possibility would be to introduce an option to have binaries (UTF-8 encoded) in all the places where strings are currently used.</div>
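<div> </div>
<div>As a rough illustration of the UTF-8 option mentioned above (a sketch only, not something erlsom provides): a string returned as a list of Unicode code points can be re-encoded into a UTF-8 binary after parsing. The module name <code>utf8_sketch</code> and the function <code>to_utf8/1</code> are made up for this example, and it assumes an OTP release that ships the standard <code>unicode</code> module.</div>
<pre>
-module(utf8_sketch).   %% hypothetical module name, for illustration only
-export([to_utf8/1]).

%% Re-encode a string (a list of Unicode code points, as produced by
%% the parser) into a UTF-8 encoded binary.
%% Assumption: the standard 'unicode' module is available.
to_utf8(CodePoints) when is_list(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).
</pre>
<div>For example, <code>utf8_sketch:to_utf8([104, 233])</code> (the string "h&eacute;") returns <code>&lt;&lt;104,195,169&gt;&gt;</code>: the code point 233 does not fit in one UTF-8 byte, so it becomes the two bytes 195 and 169.</div>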
<div> </div>
<div>Regards,</div>
<div>Willem</div>
<div><br> </div>
<div class="gmail_quote">On Feb 7, 2008 2:21 AM, Zvi &lt;<a href="mailto:exta7@walla.com">exta7@walla.com</a>&gt; wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid"><br>Hi Willem,<br><br>I using erlsom and it's much easier to use, than xmerl.<br>The added benefit, that I do not need to patch my Erlang instalation with<br>
'windows-1252' encoding support.<br>My problem with erlsom is,that parsed strings are lists. Is there are any<br>option in erlsom so xs:string will be mapped to the Erlang binary?<br>Also in XML Data Binding product I was using with C++, you can specify<br>
mappings from XSD datatypes to the C++ datatypes and even create mappings<br>for custom datatypes.<br>In my schema I have integers and floats, besides strings, but they all<br>mapped to strings (more exactly to Erlang lists of ASCII codes :).<br>
Some C++ XML Data Binding products even handle enums (which is much bigger<br>problem in C++ than in Erlang - no atoms). For example I can map XSD<br>datatype xs:date to my custom class CDate and just provide two methods:<br>
fromString and toString.<br>If erlsom will map at least standard XSD datatypes to standard Erlang<br>datatypes it will be also usefull.<br><br>Thanks for the usefull tool.<br>Zvi<br><br><br><br>Willem de Jong-2 wrote:<br>
><br>> Hi,<br>><br>> Similar to what Bertil suggested for Xmerl, you can achieve this in Erlsom<br>> by adding a clause<br>><br>> "windows-1252" -> 'iso-8859-1'; %% note: this is actually introducing a<br>
> bug<br>><br>> %% in order to work around a problem!<br>><br>> to the case statement in encoding_type() in erlsom_lib.erl.<br>><br>> I would be interested to know why you think it will be necessary to<br>
> replace<br>> it by a C++ port. It seems to me that it will be complicating things<br>> considerably. What are the requirements that make this necessary? What<br>> properties should an Erlang XML parser have?<br>
><br>> Regards,<br>> Willem<br>><br>><br>> On 1/7/08, Zvi <<a href="mailto:exta7@walla.com">exta7@walla.com</a>> wrote:<br>>><br>>><br>>> XML generated by closed-source 3rd party Windows server (if it was<br>
>> generated<br>>> by me, then it was encoded in utf-8).<br>>> I asking here questions from Erlang domain, not the obvious & ugly common<br>>> sence solutions, like reading the entire file into memory, changing the<br>
>> encoding string and only then feeding it into xmerl. (the problem only<br>>> that<br>>> this XML can be quite big, like 0.5 MB and more).<br>>> Maybe xmerl has some option for forcing encoding, other than specified in<br>
>> the <?xml?> PI?<br>>> Maybe there is some other XML parser like erlsom or expat driver, which<br>>> supports windows-1252 encoding?<br>>> Anyway I using xmerl just for prototyping, the long term solution will be<br>
>> to<br>>> write C++ port, which will be doing all the XML processing and return<br>>> Erlang<br>>> terms in either text or binary form, which can be read either by<br>>> file:consult or binary_to_term on the Erlang side.<br>
>><br>>> ZVi<br>>><br>>><br>>> Christian S wrote:<br>>> ><br>>> > Why not ask yourself how to change your xml so it says iso-8859-1 as<br>>> you<br>>> > say<br>
>> > it should be doing?<br>>> ><br>>> > <a href="http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out" target="_blank">http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out</a><br>>> ><br>
>> > On Jan 7, 2008 5:22 PM, Zvi <<a href="mailto:exta7@walla.com">exta7@walla.com</a>> wrote:<br>>> >><br>>> >> Bertil,<br>>> >><br>>> >> thanks for the reply.<br>
>> >> Actually the charcter set used is always latin-1, but for some reason<br>>> 3rd<br>>> >> party software call it windows-1252 . So if you can tell me, what I<br>>> >> should<br>
>> >> change in xmerl, so it will threat windows-1252 as Latin-1 .<br>>> > _______________________________________________<br>>> > erlang-questions mailing list<br>>> > <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
>> > <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>>> ><br>>> ><br>>><br>>> --<br>>> View this message in context:<br>
>> <a href="http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p14674437.html" target="_blank">http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p14674437.html</a><br>
>> Sent from the Erlang Questions mailing list archive at <a href="http://nabble.com/" target="_blank">Nabble.com</a>.<br>>><br>>> _______________________________________________<br>>> erlang-questions mailing list<br>
>> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>>> <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>
>><br>><br>> _______________________________________________<br>> erlang-questions mailing list<br>> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>> <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>
><br><font color="#888888"><br>--<br>View this message in context: <a href="http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p15325643.html" target="_blank">http://www.nabble.com/Exception-in-xmerl%2C-when-pasing-XML-with-non-UTF8-character-set-tp14588326p15325643.html</a><br>
Sent from the Erlang Questions mailing list archive at <a href="http://nabble.com/" target="_blank">Nabble.com</a>.<br><br>_______________________________________________<br>erlang-questions mailing list<br><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br></font></blockquote></div><br>