[erlang-questions] unicode:characters_to_list

Masklinn masklinn@REDACTED
Thu Mar 22 17:17:31 CET 2012


On 2012-03-22, at 16:48 , Michael Uvarov wrote:
> Also, in utf-8 each code point can be encoded using from 1 to 6 bytes.

1 to 4: Unicode is defined from U+0000 to U+10FFFF, and code points
beyond this range are to be considered ill-formed. UTF-8 can encode
U+10FFFF in 4 bytes (with room to spare), and its definition was
restricted to the same range as Unicode in RFC 3629 (the original
definition did indeed allow for encoding 31 bits in up to 6 bytes).
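
To make the byte counts concrete, here is a minimal sketch in Python (not Erlang, purely for illustration). The helper name utf8_length and the constant FOUR_BYTE_MAX are made up for this example, and the surrogate range D800..DFFF is deliberately ignored to keep it short:

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one code point in UTF-8 as restricted by RFC 3629.

    Simplification (assumption of this sketch): surrogates D800..DFFF are
    not rejected here, although well-formed UTF-8 never encodes them.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range 0..10FFFF")
    if code_point <= 0x7F:      # 0xxxxxxx
        return 1
    if code_point <= 0x7FF:     # 110xxxxx 10xxxxxx
        return 2
    if code_point <= 0xFFFF:    # 1110xxxx 10xxxxxx 10xxxxxx
        return 3
    return 4                    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx


# A 4-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits, so it could
# in principle reach 1FFFFF, which is the "room to spare" above 10FFFF.
FOUR_BYTE_MAX = (1 << 21) - 1

# Cross-check against Python's own encoder for the highest code point.
assert len(chr(0x10FFFF).encode("utf-8")) == utf8_length(0x10FFFF) == 4
```

The 5- and 6-byte forms from the original definition are simply absent here, since nothing in 0..10FFFF needs them.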


