[erlang-questions] byte() vs. char() use in documentation

Mon May 9 05:45:41 CEST 2011

On 6/05/2011, at 6:49 PM, Masklinn wrote:

> On 2011-05-06, at 07:34 , Richard O'Keefe wrote:
>>> To come back to the point, we have to define what we mean with the Erlang
>>> char() type:
>>> - if it's an individual character then it can naturally be represented as
>>> a single integer for its code point
>>> - if it's a logical character then it has to be a list of integers
>> Since we cannot know what a logical character is, and since we need *some*
>> representation of Unicode code points, I recommend that char()=code point.
> Why pick code points rather than grapheme cluster?

For so many, many reasons that I haven't the patience to list them all.
A.  Because it is the simplest thing that could possibly work.
B.  Because grapheme clusters aren't any better a fit to the user's
    perception of a "character" than code points.
C.  Because Unicode properties are defined for characters, not
    grapheme clusters.
D.  Because there is a finite and not *hopelessly* large set of
    code points, but the set of grapheme clusters is unbounded
...
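
As a rough illustration of points C and D, here is a small sketch. It is
in Python rather than Erlang purely for convenience, and the variable
names are made up for the example. It assumes nothing beyond Python's
standard `unicodedata` and `sys` modules.

```python
import sys
import unicodedata

# "é" written as a base letter followed by a combining acute accent:
# one user-perceived character, but two code points.
decomposed = "e\u0301"

# Each element of the string is a single code point (an integer).
code_points = [ord(c) for c in decomposed]

# Properties such as General_Category are looked up per code point,
# not per cluster.
categories = [unicodedata.category(c) for c in decomposed]

# The code point space is fixed: U+0000 through U+10FFFF.
code_space_size = sys.maxunicode + 1

print(code_points)      # [101, 769]
print(categories)       # ['Ll', 'Mn']
print(code_space_size)  # 1114112
```

The letter and the accent each get their own property lookup; there is
no single category to ask for on the pair as a unit.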

> 
>>> In any case, the language must provide specific functions to work on strings
>>> and characters. For instance, a logical character comparison must take into
>>> account the Unicode equivalence.
>> What do you mean "THE" equivalence?
> I would guess he means what he linked: unicode equivalence (as per unicode),
> likely compatible (in order to equate "ﬃ" with "ffi" for instance)

Yes, but there are *several* notions of equivalence in the Unicode
standard, which was my point.  Which of them is "THE" equivalence?
(The one with arguably the strongest claim does NOT deal with
compatibility mappings.)
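
To make the difference concrete, here is a minimal sketch, again in
Python for convenience. The helper names `canonically_equal` and
`compatibility_equal` are hypothetical, invented for this example; only
`unicodedata.normalize` comes from the standard library. It compares
strings after normalizing to NFD (canonical decomposition) or to NFKD
(compatibility decomposition).

```python
import unicodedata

ligature = "\ufb03"     # LATIN SMALL LIGATURE FFI
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare under canonical equivalence (no compatibility mappings)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equal(a, b):
    """Compare under compatibility equivalence (folds ligatures etc.)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(canonically_equal(precomposed, decomposed))  # True
print(canonically_equal(ligature, "ffi"))          # False
print(compatibility_equal(ligature, "ffi"))        # True
```

So whether the ligature "equals" "ffi" depends entirely on which
equivalence is chosen.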