<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Even then the reversal is not guaranteed.<br>

    <br>

    The character 'é' can be represented, for example, in two ways:<br>

    <br>

    é =


    U+00E9<br>

    e+ ́ = U+0065 + U+0301<br>

    <br>

    The first form is a single code point (the precomposed character),

    but the second is a multi-code-point 'grapheme cluster', a sequence

    of code points representing a single grapheme, a single unit of

    text. Grapheme clusters can contain more than two code points, and

    as far as I know, you cannot reverse the code points inside one. The

    cluster should ideally keep its internal order in the reversed string:<br>

    <br>

    2> io:format("~ts~n",[[16#0065,16#0301]]).<br>

    é<br>

    ok<br>

    3> io:format("~ts~n",[[16#0301,16#0065]]). <br>

     ́e<br>

    ok<br>

    <br>

    This is fine with your plan -- if I force a single code point

    representation, this is a non-issue.<br>
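    <br>

    A minimal sketch of that approach, assuming a later OTP release (20

    or newer) where unicode:characters_to_nfc_list/1 and

    string:reverse/1 are available. The module and function names are

    made up for illustration:<br>

```erlang
-module(nfc_reverse_sketch).
-export([reverse_nfc/1, reverse_clusters/1]).

%% Hypothetical helper: compose to NFC first (so e + U+0301 becomes
%% U+00E9 where a precomposed form exists), then reverse code points.
reverse_nfc(Chars) ->
    lists:reverse(unicode:characters_to_nfc_list(Chars)).

%% Hypothetical helper: reverse by grapheme cluster instead.
%% string:reverse/1 returns multi-code-point clusters as nested lists,
%% so the result is flattened back into a plain code point list.
reverse_clusters(Chars) ->
    unicode:characters_to_list(string:reverse(Chars)).
```

    With the input [$a, 16#0065, 16#0301], reverse_nfc/1 should give

    [16#00E9, $a] and reverse_clusters/1 should give

    [16#0065, 16#0301, $a]: in both cases the accent stays on the 'e'.<br>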

    <br>

    The tricky thing is that if I enter a string containing " ́e" in my

    module and later reverse it, I will get "é" and not "e ́" as the final

    result. What was initially [16#0301,16#0065] gets reversed into

    [16#0065,16#0301], which is not the same as the correct visual

    representation "e ́" (that is, [16#0065, $ , 16#0301], with an

    implicit space in there).<br>

    <br>

    It works one way (starting in the right order, then reversing), but

    unless you are very careful, it can break when going the other way

    (starting with two non-combined code points that get assembled into

    the same cluster when reversed).<br>

    <br>

    Just changing to single code point representations isn't enough to

    make sure nothing is broken.<br>
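    <br>

    The failure case can be sketched as follows, again assuming OTP 20 or

    newer for unicode:characters_to_nfc_list/1. The module name and

    demo/0 are hypothetical:<br>

```erlang
-module(reverse_pitfall_sketch).
-export([demo/0]).

%% Starts from a lone combining acute followed by 'e', normalizes,
%% reverses, and normalizes again to show where the cluster appears.
demo() ->
    Original = [16#0301, 16#0065],
    %% NFC has nothing to compose here: the accent has no base before it.
    Normalized = unicode:characters_to_nfc_list(Original),
    %% A plain code point reversal now puts the accent right after 'e'.
    Reversed = lists:reverse(Normalized),
    %% Normalizing the reversed list composes the pair into U+00E9.
    Recomposed = unicode:characters_to_nfc_list(Reversed),
    {Normalized, Reversed, Recomposed}.
```

    Calling reverse_pitfall_sketch:demo() should return

    {[16#0301,16#0065], [16#0065,16#0301], [16#00E9]}: the input was

    already in single code point form, yet the reversal still produced a

    new cluster.<br>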

    <br>

    <div class="moz-cite-prefix">On 12-07-31 10:04 AM, Richard Carlsson

      wrote:<br>

    </div>

    <blockquote cite="mid:5017E5D5.2030508@gmail.com" type="cite">No,

      you're confusing Unicode (a sequence of code points) with specific

      encodings such as UTF-8 and UTF-16. The first is downwards

      compatible with Latin-1: the values from 128 to 255 are the same.

      In UTF-8 they're not. At runtime, Erlang's strings are just plain

      sequences of Unicode code points (you can think of it as UTF-32 if

      you like). Whether the source code is encoded in UTF-8 or Latin-1

      or any other encoding is irrelevant as long as the compiler knows

      how to transform the input to the single-codepoint representation.

      <br>

      <br>

      For example, reversing a Unicode string is a bad idea anyway

      because it could contain combining characters, and reversing the

      order of the codepoints in that case will create an illegal

      string. But an expression like lists:reverse("a∞b") will be

      working on the list [97, 8734, 98] (once the compiler has been

      extended to accept other encodings than Latin-1), not the list

      [97,226,136,158,98], so it will produce the intended "b∞a". This

      string might then become encoded as UTF-8 on its way to your

      terminal, but that's another story.

      <br>

      <br>

          /Richard

      <br>

      <br>

      _______________________________________________

      <br>

      erlang-questions mailing list

      <br>

      <a class="moz-txt-link-abbreviated" href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a>

      <br>

      <a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a>

      <br>

    </blockquote>

    <br>

  </body>

</html>