[erlang-questions] byte() vs. char() use in documentation

Steve Davis steven.charles.davis@REDACTED
Mon May 9 14:53:38 CEST 2011


The problem with "strings" and "chars" is that they are not full
specifications. It is not enough to say that strings are "just lists
of integers", which has been a convenient shorthand; one could argue,
fully and equivalently, that *any data* is just a list of integers. The
shorthand used historically in Erlang is that strings are lists of
integers that *implicitly represent a Latin-1 mapping to glyphs*.

The issue, once other character sets are "allowed", becomes: which
mapping does a given char() value represent?

Saying char() :: 0..16#10ffff is a type may allow for a Unicode
mapping, but does not specify it. The value could be mapped to any
glyph, since there is no longer the implicit constraint that a
char() has a Latin-1 mapping, so any user-defined mapping becomes
possible. This makes it impossible to tell which mapping is in use
for a particular instance.
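To make the ambiguity concrete, here is a minimal sketch (the module name mapping_demo is hypothetical, not from this thread) that decodes the same two bytes under two assumed encodings using the standard library's unicode:characters_to_list/2. Both results are valid lists of char(), yet they denote different glyphs.

```erlang
%% Hypothetical module, for illustration only.
-module(mapping_demo).
-export([decode_both/1]).

%% Decode one binary twice: once assuming Latin-1 (each byte is a
%% code point) and once assuming UTF-8. Note that with utf8 the call
%% can return {error, _, _} or {incomplete, _, _} on invalid input.
decode_both(Bytes) when is_binary(Bytes) ->
    AsLatin1 = unicode:characters_to_list(Bytes, latin1),
    AsUtf8 = unicode:characters_to_list(Bytes, utf8),
    {AsLatin1, AsUtf8}.
```

With <<195,169>> this yields {[195,169],[233]}: read with a Latin-1 glyph map the first list renders as "Ã©", while the second is "é". Nothing in the type [char()] records which reading was intended.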

People say that Erlang is "bad at string handling", but I believe
Erlang (or rather its functional nature) is instead showing that
strings are simply a bad idea. Most languages choose, and impose,
their own internal implicit mapping for this crazy "type" we call
"string"; e.g. Java says that all strings are Unicode mapped.

If you drop the constraint that chars and strings are implicitly
glyph-mapped by Latin-1, you must specify which encoding is in force
in each context. I would therefore argue that string() and char()
are no longer useful or meaningful types. If you wish to define text
representations in a type specification, you had better provide not
just the data but also a specification of the glyph map in force, so
that a function can determine how to decode that data; this is the
situation the string() and char() notation implies must be possible.

/s

On Apr 28, 11:26 am, Kostis Sagonas <kos...@REDACTED> wrote:
> In the Erlang documentation, the language of types and specs makes a
> clear distinction between the following two types:
>
>      byte() :: 0..255
>      char() :: 0..16#10ffff
