[erlang-questions] byte() vs. char() use in documentation

Richard O'Keefe ok@REDACTED
Tue May 3 04:23:55 CEST 2011


On 3/05/2011, at 7:43 AM, James Churchman wrote:
> Strings can be UTF-8, UTF-16 or UTF-32,

No.  The model for strings is "one list element = one unicode character",
and both UTF-8 and UTF-16 violate that.

A list of ASCII code-points is both a (Unicode) string and an iolist.

Of course, nothing stops you holding an abstract string as a list of
octets using UTF-8 (or for that matter, UTF-EBCDIC) or as a list of
16-bit units using UTF-16.  It's just that if you do so, what you
have doesn't count as an Erlang string any more (outside ASCII).
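To make the distinction concrete, here is a minimal Erlang sketch. The module name string_model is hypothetical. It compares the two-character text "λx" held as an Erlang string, with one element per code point, against the same text held as a list of UTF-8 octets. It uses only the standard unicode module.

```erlang
-module(string_model).
-export([demo/0]).

%% Hypothetical module: contrasts a code-point list (an Erlang string)
%% with a list of UTF-8 octets encoding the same text.
demo() ->
    %% "λx" as an Erlang string: one list element per Unicode character.
    Chars = [16#3BB, $x],                       % [955, 120]
    2 = length(Chars),
    %% The same text as UTF-8 octets: 'λ' needs two octets,
    %% so the list has three elements for two characters.
    Octets = binary_to_list(unicode:characters_to_binary(Chars)),
    [206, 187, 120] = Octets,
    3 = length(Octets),
    %% Decoding the octets gives back the code-point list.
    Chars = unicode:characters_to_list(list_to_binary(Octets)),
    {Chars, Octets}.
```

The octet list is a perfectly good way to hold the text, but length/1 and element-wise processing on it count octets, not characters, which is why it no longer counts as a string.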

> Also, there does seem to be a needed distinction between char() and byte(), as they are not the same at all. But the documentation is wrong: at the moment iolists can in fact only contain byte(), not char().

Yes.
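A similar sketch shows the byte() restriction. The module name iolist_bytes is again hypothetical. iolist_to_binary/1 accepts an ASCII string and any integer element in 0..255, but rejects the code point 955 with badarg. A string containing such characters therefore has to be encoded first, here with unicode:characters_to_binary/1, before it can be used as an iolist.

```erlang
-module(iolist_bytes).
-export([demo/0]).

%% Hypothetical module: shows that iolist elements must be bytes (0..255).
demo() ->
    %% A list of ASCII code points is both a string and an iolist.
    <<"abc">> = iolist_to_binary("abc"),
    %% Any element in 0..255 is accepted, as a single byte.
    <<233>> = iolist_to_binary([233]),
    %% 955 is a valid string element (char()) but not a byte(),
    %% so iolist_to_binary/1 raises badarg.
    Rejected = try iolist_to_binary([955]) of
                   _ -> false
               catch
                   error:badarg -> true
               end,
    true = Rejected,
    %% Encoding to UTF-8 first yields bytes that are valid I/O data.
    <<206, 187>> = unicode:characters_to_binary([955]),
    ok.
```

Note that [233] is accepted only as the single byte 233, not as the UTF-8 encoding of "é", so outside ASCII a string that happens to be a valid iolist still does not mean the same thing in both roles.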
