[erlang-questions] byte() vs. char() use in documentation

Masklinn <>
Thu May 5 16:36:56 CEST 2011


On 2011-05-05, at 16:12 , David Mercer wrote:
> In the past few days, various people wrote:
> 
>> [Various stuff debating Unicode, characters, glyphs, codepoints, code
>> units, bits, bytes, strings, iolists, etc. etc.]
> 
> I think most programmers are content treating each Unicode codepoint as a
> "character," regardless of whether that is strictly correct or not.  It is
> the unit that is strung together to make strings.  A list of Unicode
> codepoints seems the reasonable canonical way of representing strings.
> Thus:
> 
> 	char() :: 0..16#10ffff
> 	string() :: [char()]

It's the only way, but you cannot manipulate a Unicode string as a plain
list, because doing so is *broken*: list operations act on individual
codepoints, not on what a reader sees as a character. Sure, you don't
notice it if you're an English-speaking developer working only with
English speakers. But that does not make it not-broken.
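
A minimal sketch of the kind of breakage meant here, assuming a decomposed
"é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT. The module
and function names are hypothetical and exist only for illustration:

```erlang
-module(codepoint_demo).
-export([decomposed/0, naive_length/0, naive_reverse/0]).

%% "e" followed by U+0301 COMBINING ACUTE ACCENT: one character to the
%% reader, but two codepoints in the list.
decomposed() -> [$e, 16#0301].

%% length/1 counts codepoints, so this returns 2 for what the reader
%% sees as a single character.
naive_length() -> length(decomposed()).

%% lists:reverse/1 also works per codepoint: the accent is moved in
%% front of the "e" it belonged to and no longer combines with it.
%% Returns [16#0301, $e, $x].
naive_reverse() -> lists:reverse("x" ++ decomposed()).
```

Nothing here fails or crashes, which is the trouble: the list operations
succeed and quietly return text that is no longer what the user wrote.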

And being what "most developers" are content with has never been very
high praise. You'd think a dweller of the Erlang mailing list would be
the first to know it: most programmers are also content using threads
and locks, regardless of whether that's strictly correct or not.


