[erlang-questions] byte() vs. char() use in documentation

Richard O'Keefe
Mon May 9 05:45:41 CEST 2011


On 6/05/2011, at 6:49 PM, Masklinn wrote:

> On 2011-05-06, at 07:34 , Richard O'Keefe wrote:
>>> To come back to the point, we have to define what we mean with the Erlang
>>> char() type:
>>> - if it's an individual character then it can naturally be represented as
>>> a single integer for its code point
>>> - if it's a logical character then it has to be a list of integers
>> Since we cannot know what a logical character is, and since we need *some*
>> representation of Unicode code points, I recommend that char()=code point.
> Why pick code points rather than grapheme cluster?

For so many many reasons I haven't the patience to list them all.
A.  Because it is the simplest thing that could possibly work.
B.  Because grapheme clusters aren't any better a fit to the user's
    perception of a "character" than code points.
C.  Because Unicode properties are defined for characters, not
    grapheme clusters.
D.  Because there is a finite and not *hopelessly* large set of
    code points, but the set of grapheme clusters is unbounded
...
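To make reasons C and D concrete, here is a minimal sketch. It is written in Python rather than Erlang only because Python's standard unicodedata module makes it short. The variable names are illustrative, and treating a string as a sequence of code points is Python's convention, not a statement about any Erlang API.

```python
import sys
import unicodedata

# The same user-perceived "character" written two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Counting code points gives different lengths for what a reader
# would call the same character.
lengths = (len(precomposed), len(decomposed))

# Reason C: properties such as the general category are looked up
# per code point, not per cluster.
categories = [unicodedata.category(c) for c in decomposed]

# Reason D: the code point space is finite (U+0000 through U+10FFFF).
code_point_count = sys.maxunicode + 1

print(lengths)                 # (1, 2)
print(categories)              # ['Ll', 'Mn']
print(hex(sys.maxunicode))     # 0x10ffff
```

A code point fits in a single integer, so char() = code point keeps the type trivially representable. A cluster type would need a list, with no upper bound on its length.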

> 
>>> In any case, the language must provide specific functions to work on strings
>>> and characters. For instance, a logical character comparison must take into
>>> account the Unicode equivalence.
>> What do you mean "THE" equivalence?
> I would guess he means what he linked: Unicode equivalence (as per Unicode),
> likely compatible (in order to equate "ﬃ" with "ffi", for instance)

Yes, but there are *several* notions of equivalence in the Unicode
standard, which was my point.  Which of them is "THE" equivalence?
(The one with arguably the strongest claim does NOT deal with
compatibility mappings.)
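As a hedged illustration of why the choice matters, the sketch below compares two of the Unicode equivalences, canonical and compatibility, using normalization forms NFD and NFKD. It is in Python because its standard unicodedata module exposes both. The helper names canonically_equal and compatibility_equal are made up for this example.

```python
import unicodedata

ligature = "\ufb03"      # LATIN SMALL LIGATURE FFI (one code point)
plain = "ffi"            # three ordinary letters
precomposed = "\u00e9"   # e with acute, one code point
decomposed = "e\u0301"   # "e" plus COMBINING ACUTE ACCENT

def canonically_equal(a, b):
    """Equal under canonical equivalence (compare NFD forms)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equal(a, b):
    """Equal under compatibility equivalence (compare NFKD forms)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

# Both equivalences identify the two spellings of "é".
print(canonically_equal(precomposed, decomposed))    # True
print(compatibility_equal(precomposed, decomposed))  # True

# Only compatibility equivalence identifies the ligature with "ffi".
print(canonically_equal(ligature, plain))            # False
print(compatibility_equal(ligature, plain))          # True
```

So a comparison that "takes Unicode equivalence into account" gives different answers for the ligature depending on which equivalence is meant.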




