[erlang-questions] byte() vs. char() use in documentation

Anthony Shipman als@REDACTED
Thu May 5 18:44:44 CEST 2011


On Fri, 6 May 2011 12:36:56 am Masklinn wrote:
> >       char() :: 0..16#10ffff
> >       string() :: [char()]
>
> It's the only way, but you can not manipulate a unicode string as a list
> because it's *broken*. Sure, you don't realize it if you're an
> english-speaking developer working only with english speakers. But that
> does not make it not-broken.
>
> And what "most developers" are content with has never been very high
> praises. You'd think a dweller of the Erlang mailing list would be the
> first to know it: most programmers are also content using threads and
> locks, regardless of whether that's strictly correct or not.

I imagine an API providing for:
	iterating over the string returning a sequence of bytes (e.g. UTF-8);

	iterating over the string returning a sequence of code points;

	iterating over the string returning a sequence of normalised composite
	characters, each perhaps in the form of a binary.
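
To make the three views concrete, here is a minimal sketch. The module and
function names (ustr_views, bytes/1, codepoints/1, graphemes/1) are
hypothetical. It also assumes OTP 20 or later, because
unicode:characters_to_nfc_list/1 and string:to_graphemes/1 were not available
when this thread was written. It returns whole lists rather than true
iterators, and it does no error handling.

```erlang
-module(ustr_views).
-export([bytes/1, codepoints/1, graphemes/1]).

%% Sketch only: names are hypothetical, not an existing OTP API.
%% Assumes OTP 20+ and well-formed input; malformed input makes the
%% unicode functions return an error tuple, which is not handled here.

%% View 1: the string as a list of UTF-8 bytes.
bytes(Chardata) ->
    binary_to_list(unicode:characters_to_binary(Chardata)).

%% View 2: the string as a flat list of Unicode code points.
codepoints(Chardata) ->
    unicode:characters_to_list(Chardata).

%% View 3: NFC-normalised grapheme clusters, each as a UTF-8 binary.
%% string:to_graphemes/1 yields either a single code point or a list
%% of code points per cluster, so each one is wrapped in a list before
%% being encoded.
graphemes(Chardata) ->
    Nfc = unicode:characters_to_nfc_list(Chardata),
    [unicode:characters_to_binary([G]) || G <- string:to_graphemes(Nfc)].
```

For the input [$e, 16#301] (an "e" followed by a combining acute accent),
bytes/1 gives [101,204,129], codepoints/1 gives [101,769], and graphemes/1
gives the single composed character [<<195,169>>].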

The input to the iterators could be a deep list, but what would the parts be?
We could decide that an integer in the list is a code point and a binary is a
UTF-8 sequence. Would it be saner to require that each binary decode to whole
composite characters? There could be a variety of functions for different
ways of interpreting the input.
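
The "integer is a code point, binary is a UTF-8 sequence" reading of a deep
list is, as far as I can tell, what the stdlib unicode module already accepts.
A small sketch follows. The wrapper module ustr_input and its decode/1 are
hypothetical names; only unicode:characters_to_list/1 is a real OTP call.
Note that this checks for whole code points only, not whole composite
characters.

```erlang
-module(ustr_input).
-export([decode/1]).

%% Hypothetical wrapper around unicode:characters_to_list/1.
%% Input may be a deep list mixing integers (code points) and
%% binaries (UTF-8 encoded), or a single binary.
decode(Chardata) ->
    case unicode:characters_to_list(Chardata) of
        Codepoints when is_list(Codepoints) ->
            {ok, Codepoints};
        {incomplete, Decoded, Rest} ->
            %% The input ended in the middle of a UTF-8 sequence.
            {incomplete, Decoded, Rest};
        {error, Decoded, Rest} ->
            %% The input contained something that is not valid.
            {error, Decoded, Rest}
    end.
```

With this, ustr_input:decode([$h, <<195,169>>, [$l]]) returns {ok, [104,233,108]},
while a binary cut off part-way through a character, such as <<$a, 195>>,
comes back tagged as incomplete. A rule requiring whole composite characters
per binary would need an extra check on top of this, for example by looking
at grapheme cluster boundaries.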

-- 
Anthony Shipman                    Mamas don't let your babies 
als@REDACTED                   grow up to be outsourced.


