Strings (was: Re: are Mnesia tables immutable?)
Andrew Lentvorski
bsder@REDACTED
Wed Jun 28 22:54:59 CEST 2006
Romain Lenglet wrote:
> The most efficient is still most often to use an official 8-bit
> encoding for strings. E.g. for Thai, TIS-620 is the most
> efficient, for Japanese, ISO-2022 (or others) is the most
> efficient, etc.
Really? ISO-2022? How does that beat UTF-16?
IIRC, UTF-16 manages to cover all of the Joyou Kanji as well as the
kana in two bytes each. In a single-byte encoding, the kana alone take
close to 90 entries off the top, and only the upper 128 byte values are
available for them and the Kanji.
That isn't much.
In addition, Japanese mixes Kanji, Kana, and Roman characters fairly
fluidly on the web.
I find it very difficult to believe that any "byte"-based encoding beats
UTF-16 by very much for any of the languages which use Kanji.
-a