Strings (was: Re: are Mnesia tables immutable?)

Andrew Lentvorski bsder@REDACTED
Wed Jun 28 22:54:59 CEST 2006


Romain Lenglet wrote:

> The most efficient is still most often to use an official 8-bit 
> encoding for strings. E.g. for Thai, TIS-620 is the most 
> efficient, for Japanese, ISO-2022 (or others) is the most 
> efficient, etc.

Really?  ISO-2022?  How does that beat UTF-16?

IIRC, UTF-16 manages to encode all of the Joyou Kanji as well as the
kana in two bytes each. Given that the kana take close to 90 entries
off the top, a single-byte encoding is left with at most the upper 128
byte values for Kanji.

That isn't much.

In addition, Japanese mixes Kanji, Kana, and Roman characters fairly 
fluidly on the web.

I find it very difficult to believe that any "byte"-based encoding beats 
UTF-16 by very much for any of the languages which use Kanji.

-a
