[erlang-questions] correct terminology for referring to strings

CGS cgsmcmlxxv@REDACTED
Thu Aug 2 15:46:06 CEST 2012


Hi Joe,

Regarding the clarity, you can see from the length of this thread how clear
your definition is. :)

Regarding the correctness, your definition is a bit tricky in my opinion
(a non-expert opinion, though): it is arguable once you take into account
the difference between Unicode code points and Unicode encoding schemes.
Even when the UTF-8 encoding scheme is used, for example, Erlang knows
nothing about the correlation between the elements of the list, so the
sequence can be interpreted as code points in the Latin-1 range even if
those code points make no real sense in Latin-1 when replaced with the
characters they index (especially in the range 128 - 255). For clarity,
your famous "a∞b" is [97,8734,98] in Unicode code points (this format may
break the code), while in the UTF-8 encoding scheme it reads
[97,226,136,158,98]. The Erlang compiler has no idea that the sequence
[226,136,158] should be built back into 8734 before passing it back to the
environment, so strange symbols may appear if the environment interprets
the integers as Unicode code points, which it usually does. Once UTF-8
support is available in Erlang, I suppose strings will also be accepted
internally as Unicode code points for the range U+0080 - U+10FFFF, but
until then the accepted integers represent a disconnected sequence of
UTF-8 bytes. It is still the user's job to transform them back into
Unicode code points for the environment to display the symbols correctly
(e.g., io:format("~ts~n",[[97,8734,98]]) will reproduce the correct string
in a UTF-8 environment).
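
To make the two representations concrete, here is a minimal sketch that
converts between them with the standard unicode module. The module name
string_demo and the function demo/0 are made up for illustration; only
the unicode:characters_to_binary/1 and unicode:characters_to_list/2 calls
come from OTP.

```erlang
-module(string_demo).
-export([demo/0]).

%% Returns {Utf8Bytes, CodePoints} for the string "a∞b".
demo() ->
    %% The string as a list of Unicode code points.
    CodePoints = [97, 8734, 98],
    %% Encode the code points as a UTF-8 binary.
    Utf8 = unicode:characters_to_binary(CodePoints),
    %% The same data seen as a plain list of bytes: nothing in the
    %% list says that 226,136,158 belong together.
    Bytes = binary_to_list(Utf8),
    %% Decode the UTF-8 bytes back into code points.
    Decoded = unicode:characters_to_list(Utf8, utf8),
    {Bytes, Decoded}.
```

Calling string_demo:demo() should give back the byte list
[97,226,136,158,98] together with the code point list [97,8734,98], which
are the two forms discussed above.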

This is my 2c opinion (I hope I offended no expert).

CGS


On Tue, Jul 31, 2012 at 11:24 AM, Joe Armstrong <erlang@REDACTED> wrote:

> I'm working on a 2'nd edition of my book, and have got to strings :-)
> Strings confuse everybody, including me so I have a few questions:
>
> To start with Erlang doesn't have strings - it has lists (not strings)
> and it has string literals.
>
> I want to define a string - is this correct:
>
> << A "string" is a list of integers where the integers
>       represent Unicode codepoints. >>
>
> Questions:
>     Is the sentence inside << .. >> using the correct terminology?
>     If not what should it say?
>
>     Is the sentence inside << ... >> widely understood, do you think this
>     would confuse a lot of people?
>
>     Is the phrase "string literal" widely understood?
>
>
> Cheers
>
> /Joe