<p dir="ltr">Hello,<br>

Since recent releases fully support UTF-8, it should probably be the default output for xmerl, but the Erlang core developers may have their reasons not to do so.<br>

Regards <br>

</p>
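
<p dir="ltr">Until then, the text returned by xmerl_scan is a list of Unicode code points (233 for é), so it can be encoded as UTF-8 after parsing rather than before. A minimal sketch, assuming a helper of my own naming (the module xmerl_utf8_sketch and the function text_to_utf8/1 are hypothetical, not part of xmerl); the work is done by unicode:characters_to_binary/3 from the standard library:<br></p>
<pre>
-module(xmerl_utf8_sketch).
-export([text_to_utf8/1]).

%% Hypothetical helper (module and function names are assumptions,
%% not part of xmerl). Chars is the value field of an #xmlText{} record,
%% i.e. a list of Unicode code points such as [82,101,110,233] for "René".
%% Returns the same text as a UTF-8 encoded binary.
text_to_utf8(Chars) when is_list(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).

%% Example call:
%%   binary_to_list(xmerl_utf8_sketch:text_to_utf8([82,101,110,233])).
%% Here é (233) is encoded as the two bytes 195,169.
</pre>
<p dir="ltr">Converting the parsed value this way leaves the input untouched, so xmerl still decodes the document according to its own encoding declaration.<br></p>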

<div class="gmail_quote">On 15 Aug 2015 at 09:30, Hynek Vychodil <vychodil.hynek@gmail.com> wrote:<br type='attribution'><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">R18 gives the same result, and it is the correct one. The letter é has Unicode code point 233 (U+00E9); see <a href="http://unicode-table.com/en/#00E9">http://unicode-table.com/en/#00E9</a></div><div><br /><div class="elided-text">On Fri, Aug 14, 2015 at 6:23 PM, Éric Pailleau <span dir="ltr"><<a href="mailto:eric.pailleau@wanadoo.fr">eric.pailleau@wanadoo.fr</a>></span> wrote:<br /><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br />


Please specify which Erlang release you are using. UTF-8 support arrived only recently in Erlang.<br />


Regards<br />


<div><div><br />


On 14 Aug 2015 at 13:49, Alexander Turkin <<a href="mailto:snowwlex@gmail.com">snowwlex@gmail.com</a>> wrote:<br />


><br />


> Dear list,<br />


><br />


><br />


> I have a problem with Unicode and the xmerl library.<br />


><br />


> The input data for xmerl is UTF-8 encoded XML, but the result I get is encoded as Latin-1. I need UTF-8!<br />


><br />


><br />


> EXAMPLES<br />


><br />


> Body = <<"<?xml version=\"1.0\" encoding=\"UTF-8\"?><response><value>René</value></response>"/utf8>>.<br />


><br />


> For the sake of portability, here is term_to_binary(Body):<br />


><br />


> <<131,109,0,0,0,79,60,63,120,109,108,32,118,101,114,115,<br />


>   105,111,110,61,34,49,46,48,34,32,101,110,99,111,100,105,<br />


>   110,103,61,34,85,84,70,45,56,34,63,62,60,114,101,115,<br />


>   112,111,110,115,101,62,60,118,97,108,117,101,62,82,101,<br />


>   110,195,169,60,47,118,97,108,117,101,62,60,47,114,101,<br />


>   115,112,111,110,115,101,62>><br />


><br />


><br />


><br />


> (1):<br />


><br />


> When I do <br />


><br />


> xmerl_scan:string(binary_to_list(Body)).<br />


><br />


> it returns <br />


><br />


> {#xmlElement{name = response,expanded_name = response,<br />


>              nsinfo = [],<br />


>              namespace = #xmlNamespace{default = [],nodes = []},<br />


>              parents = [],pos = 1,attributes = [],<br />


>              content = [#xmlElement{name = value,expanded_name = value,<br />


>                                     nsinfo = [],<br />


>                                     namespace = #xmlNamespace{default = [],nodes = []},<br />


>                                     parents = [{response,1}],<br />


>                                     pos = 1,attributes = [],<br />


>                                     content = [#xmlText{parents = [{value,1},{response,1}],<br />


>                                                         pos = 1,language = [],<br />


><br />


><br />


>                                                         value = "René",<br />


><br />


><br />


>                                                         type = text}],<br />


>                                     language = [],xmlbase = "/Users/aturkin/ws/",<br />


>                                     elementdef = undeclared}],<br />


>              language = [],xmlbase = "/Users/aturkin/ws/",<br />


>              elementdef = undeclared},<br />


>  []}<br />


><br />


><br />


> Note the `value = "René"` string: it uses the character [233], which is Latin-1.<br />


><br />


><br />


><br />


><br />


> (2):<br />


><br />


> xmerl_scan:string(xmerl_ucs:to_utf8(binary_to_list(Body)))<br />


><br />


> returns <br />


><br />


> {#xmlElement{name = response,expanded_name = response,<br />


>              nsinfo = [],<br />


>              namespace = #xmlNamespace{default = [],nodes = []},<br />


>              parents = [],pos = 1,attributes = [],<br />


>              content = [#xmlElement{name = value,expanded_name = value,<br />


>                                     nsinfo = [],<br />


>                                     namespace = #xmlNamespace{default = [],nodes = []},<br />


>                                     parents = [{response,1}],<br />


>                                     pos = 1,attributes = [],<br />


>                                     content = [#xmlText{parents = [{value,1},{response,1}],<br />


>                                                         pos = 1,language = [],<br />


><br />


><br />


>                                                         value = "RenÃ©",<br />


><br />


><br />


>                                                         type = text}],<br />


>                                     language = [],xmlbase = "/Users/aturkin/ws/",<br />


>                                     elementdef = undeclared}],<br />


>              language = [],xmlbase = "/Users/aturkin/ws/",<br />


>              elementdef = undeclared},<br />


>  []}<br />


><br />


> Now `value = "RenÃ©"`, so two bytes are used to encode this character, which is UTF-8.<br />


><br />


> So in (2) I get what I need, but why do I need to force that conversion for xmerl?<br />


><br />


><br />


><br />


><br />


> QUESTIONS<br />


><br />


> 1. I don't understand why xmerl_scan lets you set the input encoding while there seems to be no way to set the output encoding. Is there any way to make xmerl_scan return UTF-8 instead of Latin-1?<br />


><br />


> 2. How does it happen that in (1) it converts UTF-8 -> Latin-1, while in (2) the result is UTF-8?<br />


><br />


><br />


><br />


><br />


> --<br />


> Best Regards,<br />


> Alex Turkin<br />


</div></div>_______________________________________________<br />


erlang-questions mailing list<br />


<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br />


<a href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a><br />


</blockquote></div><br /></div>


</blockquote></div>