Strings (was: Re: are Mnesia tables immutable?)

Richard A. O'Keefe ok@REDACTED
Thu Jun 29 08:49:39 CEST 2006


I suggested that people should look at SCSU before attacking it.

Andrew Lentvorski <bsder@REDACTED> wrote:
	So, the only language this benefits is basically Japanese.

It would be more accurate to say that the only language it DOESN'T
benefit is Japanese (and the Chinese languages).  Remember, the
main point of it is to get the *alphabetic* scripts (like the
Indic scripts used in India and much of Asia) down to one byte per
character.  And that's a 50% compression compared with two bytes per
character, well worth having.
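
To put rough numbers on that, here is a small Python sketch comparing
encoded sizes for a short Devanagari string.  The UTF-8 and UTF-16 sizes
come from Python's own codecs.  The SCSU figure is only an estimate, not
a real encoder: the helper scsu_size_estimate is hypothetical, and it
assumes the whole text fits in ASCII plus a single 128-code-point window
that one tag byte can select.

```python
def scsu_size_estimate(text: str, window_base: int = 0x0900) -> int:
    """Estimate the SCSU size of text, in bytes.

    Assumption: every character is either printable ASCII or lies in one
    128-code-point window starting at window_base, and selecting that
    window costs a single tag byte.  This is not a real SCSU encoder.
    """
    size = 1  # one tag byte to select the window (assumed)
    for ch in text:
        cp = ord(ch)
        in_ascii = 0x20 <= cp < 0x80
        in_window = window_base <= cp < window_base + 0x80
        if not (in_ascii or in_window):
            raise ValueError(f"U+{cp:04X} is outside the assumed window")
        size += 1  # one byte per character in single-byte mode
    return size


# "namaste duniya" in Devanagari: 12 letters/signs plus one space.
sample = ("\u0928\u092e\u0938\u094d\u0924\u0947 "
          "\u0926\u0941\u0928\u093f\u092f\u093e")

utf16_size = len(sample.encode("utf-16-le"))  # two bytes per character here
utf8_size = len(sample.encode("utf-8"))       # three bytes per Devanagari char
scsu_size = scsu_size_estimate(sample)

print(utf16_size, utf8_size, scsu_size)
```

Under those assumptions the estimate comes out at about half the UTF-16
size, and the gap against UTF-8 is larger still for this script.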

Remember, I was not talking about *processing* characters in SCSU.
Processing characters in anything other than one-program-thingy-equals-
one-codepoint is pretty silly.  I was talking about SCSU *FOR USE IN
THE EXTERNAL TERM REPRESENTATION*, where the "annoyance" is confined
exclusively to external term representation encoders and decoders.
(Having written an SCSU encoder and decoder, I have to say that the
annoyance is minimal; it's not _that_ much harder than encoding or
decoding UTF-8, especially if you have to cope with not-really-UTF-8
generated by Java.)
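
For readers who have not met the Java variant: as far as I know it
differs from standard UTF-8 in two ways.  U+0000 is written as the two
bytes C0 80, and characters outside the BMP are written as a surrogate
pair with each surrogate encoded separately in three bytes.  A minimal
Python sketch of a tolerant decoder follows; the function name
decode_java_modified_utf8 is made up for illustration, and the sketch
leans on Python's "surrogatepass" error handler rather than a
hand-written state machine.

```python
def decode_java_modified_utf8(data: bytes) -> str:
    """Decode Java-style "modified UTF-8" into an ordinary Python string.

    Sketch only.  It assumes the input is otherwise well formed.
    """
    # C0 80 is the overlong form Java uses for U+0000.  C0 is never a
    # valid lead or continuation byte otherwise, so a plain byte
    # replacement cannot hit the middle of another sequence.
    data = data.replace(b"\xc0\x80", b"\x00")

    # Let the three-byte encodings of surrogates through as lone
    # surrogate code points instead of raising an error.
    text = data.decode("utf-8", errors="surrogatepass")

    # Round-trip through UTF-16 so that each high/low surrogate pair is
    # fused into the single supplementary code point it stands for.
    # An unpaired surrogate makes the final decode raise an error.
    return text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")


# "A", then NUL as C0 80, then U+1F600 as two three-byte surrogates.
java_bytes = b"A\xc0\x80\xed\xa0\xbd\xed\xb8\x80"
decoded = decode_java_modified_utf8(java_bytes)
print([hex(ord(c)) for c in decoded])
```

A strict UTF-8 decoder would reject both the C0 80 and the encoded
surrogates, which is the extra case a decoder has to cope with.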


