Strings (was: Re: are Mnesia tables immutable?)
Thu Jun 29 08:42:05 CEST 2006
Richard A. O'Keefe wrote:
> I _did_ think that people would actually _look_ at SCSU before attacking
> it. How naive of me.
I did. I just don't see much advantage to the added annoyance.
"Switch to Unicode mode for uncompressible text.
SCSU does not provide for window definitions for the main Han and Hangul
character ranges, which are too large for effective use of dynamic
windows. The Unicode mode should also be used for large scripts using
supplementary code points."
So, the only language this benefits is basically Japanese. And, even
then, the true benefit is suspect.
If you look at the Japanese example, the difference is 178 bytes vs. 232
bytes. That's not a great compression ratio given how regular the code
points are, and the sample text is heavily biased toward kana
(compressible) rather than kanji (uncompressible).
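
To put a number on that, here is a quick sketch of the arithmetic. It
assumes the 232-byte figure is the uncompressed size of the sample and
178 bytes is the SCSU output; the variable names are only illustrative.

```python
# Byte counts quoted above for the Japanese sample.
scsu_bytes = 178          # size after SCSU
uncompressed_bytes = 232  # assumption: size before compression

ratio = scsu_bytes / uncompressed_bytes  # fraction of the original size kept
savings = 1 - ratio                      # fraction of the original size saved

print(f"ratio: {ratio:.3f}, savings: {savings:.1%}")
```

That is less than a quarter saved, on a sample that favors the scheme.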
This is the standard problem with trying to "compress" text.
Compressing text at the character level almost always loses. Even a
crummy LZW would pack the 0x30 (the high byte of every kana code point)
into a single bit.
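
For anyone who wants to poke at this, below is a rough sketch of a
textbook LZW run over kana encoded as UTF-16BE. It is not a measurement
of any real codec: the sample string, the function names, and the
fixed-width bit estimate are all assumptions made up for illustration.
It only shows that the repeated 0x30 high byte gets absorbed into
multi-byte dictionary entries instead of being paid for every time.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: return the sequence of dictionary codes for data."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit the longest known match
            table[candidate] = next_code     # learn the new sequence
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Inverse of lzw_compress, rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]  # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# Hypothetical sample: hiragana only, so every character is U+30xx.
sample = "これはひらがなのれいです" * 10
data = sample.encode("utf-16-be")

codes = lzw_compress(data)
width = max(codes).bit_length()   # crude fixed-width code size, in bits
print(len(data), "input bytes ->", len(codes), "codes of", width, "bits")
```

The sample is deliberately repetitive, so treat the printed numbers as a
toy result rather than evidence about real Japanese text.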