Language change proposal

Richard A. O'Keefe ok@REDACTED
Wed Nov 5 02:46:35 CET 2003


Eric Merritt <cyberlync@REDACTED> replied:
	Sure the IBM machines support Unicode but at the
	cost of doubling the size required to store your
	character based data.

This claim is quite untrue.

First off, for the people who really REALLY need Unicode,
they were going to be using 16 bits per character anyway.
Their storage costs don't go up at all.  As I believe I've
mentioned, IBM have supported "DBCS" (Double-Byte Character
Sets) for decades.

Second, in addition to UTF-8, which is good for ASCII, there is
Unicode Technical Report 6, which describes a compressed storage
format for Unicode which can handle Latin 1 with *no* expansion,
several other 8-bit schemes with 1 byte of overhead, and CJK
strings also with 1 byte of overhead, no matter what the length
of the string.
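
To make the size argument concrete, here is a rough sketch that counts
bytes for the same text under a few encodings. The sample strings and the
helper name encoded_sizes are my own illustrative assumptions. The
compressed scheme of Technical Report 6 is not in Python's standard
library, so it is not shown; only UTF-8, UTF-16 and Latin 1 are compared.

```python
# Compare storage sizes of the same text under several encodings.
# The sample strings below are illustrative assumptions.

def encoded_sizes(text):
    """Return a dict mapping encoding name to byte count for `text`.

    An encoding that cannot represent the text maps to None.
    """
    sizes = {}
    for name in ("utf-8", "utf-16-be", "latin-1"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes

samples = {
    "ascii": "hello world",
    "latin1": "d\u00e9j\u00e0 vu",        # two accented Latin 1 letters
    "cjk": "\u65e5\u672c\u8a9e",          # three CJK ideographs
}

for label, text in samples.items():
    print(label, len(text), encoded_sizes(text))
```

For ASCII text, UTF-8 costs one byte per character and UTF-16 costs two.
For the CJK sample the relationship reverses: UTF-16 takes two bytes per
character and UTF-8 takes three.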

Typically what you do is store text in some compressed form on
disc, unpack it if and only if you are going to do some processing,
and then repack on the way out.
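
A minimal sketch of that unpack, process, repack pattern follows. It
assumes UTF-8 stands in for the compact on-disc form, since the
Technical Report 6 scheme has no codec in Python's standard library.
The function name process_packed and the sample text are hypothetical.

```python
# Unpack stored bytes only for processing, then repack the result.
# UTF-8 is an assumed stand-in for whatever compact form is used on disc.

def process_packed(packed, transform):
    """Decode `packed` bytes, apply `transform` to the text, re-encode."""
    text = packed.decode("utf-8")      # unpack to full Unicode text
    result = transform(text)           # do the actual processing
    return result.encode("utf-8")      # repack on the way out

stored = "na\u00efve caf\u00e9".encode("utf-8")   # compact stored form
repacked = process_packed(stored, str.upper)
print(len(stored), repacked.decode("utf-8"))
```

Text that is merely copied or passed through never needs to be unpacked
at all, so the wider in-memory form is paid for only while processing.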

	390s and 400s are not dead architectures
	by any stretch of the imagination.
	
Someone who knows that the current 64-bit "360" architecture is
called z/Architecture clearly *knows* that; as does someone who
has read the z/Architecture Principles of Operation closely enough
to know about the Unicode support instructions.



