[erlang-questions] byte() vs. char() use in documentation
Anthony Shipman
als@REDACTED
Thu May 5 18:44:44 CEST 2011
On Fri, 6 May 2011 12:36:56 am Masklinn wrote:
> > char() :: 0..16#10ffff
> > string() :: [char()]
>
> It's the only way, but you can not manipulate a unicode string as a list
> because it's *broken*. Sure, you don't realize it if you're an
> english-speaking developer working only with english speakers. But that
> does not make it not-broken.
>
> And what "most developers" are content with has never been very high
> praises. You'd think a dweller of the Erlang mailing list would be the
> first to know it: most programmers are also content using threads and
> locks, regardless of whether that's strictly correct or not.

I imagine an API providing for:

  - iterating over the string, returning a sequence of bytes (e.g. UTF-8);
  - iterating over the string, returning a sequence of code points;
  - iterating over the string, returning a sequence of normalised composite
    characters, each perhaps in the form of a binary.

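
A rough sketch of those three views, not a proposal for the real API. It
assumes OTP 20 or later, where unicode:characters_to_nfc_list/1 and
string:to_graphemes/1 are available; the module and function names are made
up for illustration. It returns plain lists rather than true iterators, and
it takes "normalised composite character" to mean an NFC-normalised grapheme
cluster.

```erlang
-module(string_views).
-export([bytes/1, code_points/1, graphemes/1]).

%% Hypothetical helper: the UTF-8 bytes of the string as integers 0..255.
%% Invalid input makes characters_to_binary/1 return an error tuple, so
%% binary_to_list/1 then fails with badarg; a real API would handle that.
bytes(Chardata) ->
    binary_to_list(unicode:characters_to_binary(Chardata)).

%% Hypothetical helper: the Unicode code points of the string.
code_points(Chardata) ->
    unicode:characters_to_list(Chardata).

%% Hypothetical helper: NFC-normalised grapheme clusters, each returned
%% as a UTF-8 binary.
graphemes(Chardata) ->
    Nfc = unicode:characters_to_nfc_list(Chardata),
    [unicode:characters_to_binary([G]) || G <- string:to_graphemes(Nfc)].
```

For "e" followed by a combining acute accent, [$e, 16#301], bytes/1 gives
[101,204,129], code_points/1 gives [101,769], and graphemes/1 gives
[<<195,169>>], a single composed character. For stepping through a string
one element at a time instead of building a list, string:next_codepoint/1
and string:next_grapheme/1 may be closer to an iterator.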
The input to the iterators could be a deep list, but what would its parts be?
We could decide that an integer in the list is a code point and a binary is a
UTF-8 sequence. Would it be saner to require that each binary decode to whole
composite characters? There could be a variety of functions for different
ways of interpreting the input.
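
To make the question concrete, here is a minimal sketch under the "integer
is a code point, binary is UTF-8" reading, which is what the unicode module
accepts as chardata. The module and function names are hypothetical, and
string:to_graphemes/1 assumes OTP 20 or later. The second function is one
possible way to test whether every part holds only whole composite
characters: it compares the grapheme clusters of the joined input with those
of each part taken separately.

```erlang
-module(chardata_parts).
-export([code_points/1, parts_are_whole/1]).

%% Flatten a deep list of code points and UTF-8 binaries into code points.
code_points(DeepList) ->
    unicode:characters_to_list(DeepList).

%% Parts is assumed to be a FLAT list of integers and UTF-8 binaries.
%% Returns true when no grapheme cluster straddles a boundary between
%% parts. Each integer counts as its own part, so a base letter and a
%% combining mark given as two separate integers also yield false.
parts_are_whole(Parts) ->
    Joined = string:to_graphemes(Parts),
    PerPart = lists:append([string:to_graphemes([P]) || P <- Parts]),
    Joined =:= PerPart.
```

For example, code_points([$a, <<"bc">>, [16#E9, <<"d">>]]) gives
[97,98,99,233,100]. parts_are_whole([<<$e, 16#301/utf8>>, $x]) is true,
while parts_are_whole([$e, <<16#301/utf8>>]) is false, because the combining
accent in the binary belongs to the preceding "e".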
--
Anthony Shipman                    Mamas don't let your babies
als@REDACTED                       grow up to be outsourced.