Language change proposal
Richard A. O'Keefe
ok@REDACTED
Wed Nov 5 02:46:35 CET 2003
Eric Merritt <cyberlync@REDACTED> replied:
Sure, the IBM machines support Unicode, but at the
cost of doubling the size required to store your
character-based data.
This claim is quite untrue.
First off, for the people who really REALLY need Unicode,
they were going to be using 16 bits per character anyway.
Their storage costs don't go up at all. As I believe I've
mentioned, IBM have supported "DBCS" (Double-Byte Character
Sets) for decades.
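To make the storage arithmetic concrete, the short Python sketch below encodes the same Japanese text three ways. Shift_JIS is used here only as a convenient stand-in for a double-byte character set (an assumption: IBM's own EBCDIC-based DBCS code pages differ in detail, but they also spend 2 bytes per ideograph). The sample string and the `sizes` name are illustrative, not taken from any real system.

```python
# Compare the storage cost of the same text in a double-byte encoding
# (Shift_JIS, standing in for a DBCS) and in two Unicode encodings.
sample = "日本語のテキスト"  # 8 characters: kanji, hiragana, full-width katakana

sizes = {enc: len(sample.encode(enc)) for enc in ("shift_jis", "utf-16-be", "utf-8")}

for enc, n in sizes.items():
    # bytes per character = total encoded bytes / number of characters
    print(f"{enc:10} {n:3} bytes  ({n / len(sample):.1f} bytes/char)")
```

For this kind of text the DBCS and UTF-16 sizes are identical (2 bytes per character), so a user who already needed a double-byte set pays nothing extra for Unicode; it is UTF-8 that costs more here (3 bytes per character).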
Second, in addition to UTF-8, which is good for ASCII, there is
Unicode Technical Report 6 (the Standard Compression Scheme for
Unicode, SCSU), which describes a compressed storage
format for Unicode which can handle Latin 1 with *no* expansion,
several other 8-bit schemes with 1 byte of overhead, and CJK
strings also with 1 byte of overhead, no matter what the length
of the string.
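A minimal sketch of how SCSU (Unicode Technical Report 6, later Unicode Technical Standard #6) achieves those overheads is below. It is only an encoder for a small subset, under these assumptions: it picks one mode for the whole string, uses only the default dynamic windows (window 0 at U+0080 for Latin 1, window 2 at U+0400 for Cyrillic), never redefines windows, and is not an optimal encoder. The function name `scsu_encode` is hypothetical; the tag byte values follow the SCSU specification.

```python
# Subset SCSU encoder (sketch). Single-byte mode starts with dynamic
# window 0 active, whose default offset is U+0080, so Latin 1 bytes
# 0x80..0xFF decode to U+0080..U+00FF unchanged.

# Bytes that pass through as ASCII in single-byte mode; the other C0
# control bytes are reserved as SCSU tags.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))

SC2 = 0x12              # tag: change to dynamic window 2
CYRILLIC_WINDOW = 0x0400  # default offset of dynamic window 2
SCU = 0x0F              # tag: change to Unicode (UTF-16BE) mode
UQU = 0xF0              # Unicode-mode tag: quote the next code unit


def scsu_encode(text: str) -> bytes:
    cps = [ord(c) for c in text]

    def is_ascii(cp: int) -> bool:
        return cp in PASS_THROUGH

    # Latin 1: no tag needed at all, so no expansion.
    if all(is_ascii(cp) or 0x80 <= cp <= 0xFF for cp in cps):
        return bytes(cps)

    # Cyrillic in the default window: one SC2 tag, then 1 byte per character.
    if all(is_ascii(cp) or CYRILLIC_WINDOW <= cp < CYRILLIC_WINDOW + 0x80
           for cp in cps):
        out = bytearray([SC2])
        for cp in cps:
            out.append(cp if cp < 0x80 else cp - CYRILLIC_WINDOW + 0x80)
        return bytes(out)

    # Everything else (e.g. CJK): one SCU tag, then UTF-16BE code units.
    out = bytearray([SCU])
    units = text.encode("utf-16-be")
    for i in range(0, len(units), 2):
        # High bytes 0xE0..0xF2 are tags in Unicode mode, so such code
        # units (private-use area) must be quoted with UQU.
        if 0xE0 <= units[i] <= 0xF2:
            out.append(UQU)
        out += units[i:i + 2]
    return bytes(out)
```

In each case the overhead is a single tag byte at most, regardless of string length; a real encoder would also switch modes mid-string.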
Typically what you do is store text in some compressed form on
disc, unpack it if and only if you are going to do some processing,
and then repack on the way out.
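The pack/unpack discipline can be sketched in Python as follows. UTF-8 is assumed as the stored form purely because it ships with Python; SCSU or any other compact encoding could take its place. The helper names `rewrite_text_file` and `copy_text_file` are hypothetical.

```python
import os
import tempfile


def rewrite_text_file(path, transform, encoding="utf-8"):
    """Unpack a stored text file, transform it, and repack it.

    Returns (bytes before, bytes after)."""
    with open(path, "rb") as f:
        packed = f.read()                          # compact form as stored on disc
    text = packed.decode(encoding)                 # unpack: we need the characters
    repacked = transform(text).encode(encoding)    # repack on the way out
    with open(path, "wb") as f:
        f.write(repacked)
    return len(packed), len(repacked)


def copy_text_file(src, dst):
    """Copying is not processing, so the bytes are never unpacked."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


# Usage: upper-case a small file in place, then copy it untouched.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "note.txt")
    dst = os.path.join(d, "copy.txt")
    with open(src, "wb") as f:
        f.write("naïve café".encode("utf-8"))
    before, after = rewrite_text_file(src, str.upper)
    copy_text_file(src, dst)
    with open(dst, encoding="utf-8") as f:
        result = f.read()
```

Only `rewrite_text_file` pays the cost of decoding; code that merely moves text around can stay with the packed bytes.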
390s and 400s are not dead architectures
by any stretch of the imagination.
Someone who knows that the current 64-bit "360" architecture is
called z/Architecture clearly *knows* that, as does someone who
has read the z/Architecture Principles of Operation closely enough
to know about the Unicode support instructions.
More information about the erlang-questions mailing list