[erlang-questions] byte() vs. char() use in documentation
Masklinn
masklinn@REDACTED
Thu May 5 16:36:56 CEST 2011
On 2011-05-05, at 16:12 , David Mercer wrote:
> In the past few days, various people wrote:
>
>> [Various stuff debating Unicode, characters, glyphs, codepoints, code
>> units, bits, bytes, strings, iolists, etc. etc.]
>
> I think most programmers are content treating each Unicode codepoint as a
> "character," regardless of whether that is strictly correct or not. It is
> the unit that is strung together to make strings. A list of Unicode
> codepoints seems the reasonable canonical way of representing strings.
> Thus:
>
> char() :: 0..16#10ffff
> string() :: [char()]
It's the only way, but you cannot manipulate a Unicode string as a list,
because doing so is *broken*. Sure, you don't realize it if you're an
English-speaking developer working only with English speakers. But that
does not make it not-broken.
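A minimal Erlang sketch of the kind of breakage meant here, assuming the
word "noël" is stored with its diaeresis as a separate combining code point
(U+0308). The module and function names are made up for illustration; only
length/1 and lists:reverse/1 are standard.

```erlang
-module(codepoint_list_demo).
-export([run/0]).

%% "noël" with the diaeresis as a separate combining code point (U+0308).
%% Hypothetical helper, for illustration only.
decomposed() -> [$n, $o, $e, 16#0308, $l].

%% The same text using the precomposed code point U+00EB.
%% Hypothetical helper, for illustration only.
precomposed() -> [$n, $o, 16#00EB, $l].

run() ->
    D = decomposed(),
    P = precomposed(),
    %% length/1 counts code points, not what a reader sees as characters.
    5 = length(D),
    4 = length(P),
    %% Plain list equality treats the two spellings as different strings.
    false = (D =:= P),
    %% lists:reverse/1 moves the combining mark so that it now follows "l"
    %% instead of "e".
    [$l, 16#0308, $e, $o, $n] = lists:reverse(D),
    ok.
```

Calling codepoint_list_demo:run() returns ok only if every match above
holds. Each list operation does what it says on code points, and each
result is wrong for the text a user actually sees.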
And what "most developers" are content with has never been very high
praise. You'd think a dweller of the Erlang mailing list would be the
first to know it: most programmers are also content using threads and
locks, regardless of whether that's strictly correct or not.