Language change proposal
Joachim Durchholz
joachim.durchholz@REDACTED
Tue Nov 4 19:05:27 CET 2003
Michael Hobbs wrote:
> Joachim Durchholz said:
>
>>Michael Hobbs wrote:
>>
>>>That line seems to imply that if an entity contains an encoding
>>>declaration, then the whole entity must be encoded with that encoding.
>>>This presents a chicken-or-egg problem in that how is an XML processor
>>>to process an encoding declaration before it knows what the encoding
>>>is?
>>
>>The first byte of an entity is always a specific character (probably "<"
>>for XML).
>>Assuming the entity is correct, the XML processor can infer at least a
>>first estimate of what encoding was used, and later check it against the
>>encoding declarations.
>
>
> Okay, before you wrote this, I hadn't realized that every character
> encoding (besides UTF-16 and EBCDIC) is a superset of ASCII.
Oh, there /are/ character sets that vary wildly. The Leibniz Computer
Center in Munich had several CDC computers, which sported non-standard
word sizes (48-bit words), non-standard character sets (6-bit; in octal,
A-Z are codes 01-32 and 0-9 are 33-44), and a raw computing power that exceeded the
best IBM machines by a factor of ten, and stayed the fastest machine on
the market until the mid-1970s (when it was outdone by its own former chief
engineer, who had founded his own company, Cray Research *g*).
You can find this and other CDC-related character sets at
http://www.informatik.uni-hamburg.de/RZ/software/gnu/utilities/recode_9.html
if you're interested :-)
Actually, the "recode" tool still understands these character sets,
supposedly because recode originated on a CDC :-)
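To show how far such a character set is from ASCII, here is a minimal sketch of a decoder for just the two ranges mentioned above. It assumes the code ranges are meant as octal (26 letters only fit into 1-32 if that is octal), and the function name display_code_to_ascii is made up for this illustration. Every other code in the 6-bit set is deliberately left out.

```python
def display_code_to_ascii(code: int) -> str:
    """Map one 6-bit CDC-style code to its ASCII character.

    Only the two ranges named in the message are handled, read as octal:
    01-32 for A-Z and 33-44 for 0-9. Anything else is rejected.
    """
    if 0o01 <= code <= 0o32:
        # Letters: octal 01 is 'A', octal 32 (decimal 26) is 'Z'.
        return chr(ord("A") + (code - 0o01))
    if 0o33 <= code <= 0o44:
        # Digits: octal 33 is '0', octal 44 (decimal 36) is '9'.
        return chr(ord("0") + (code - 0o33))
    raise ValueError(f"code {code:o} (octal) is outside the ranges covered here")


def decode(codes):
    """Decode a sequence of 6-bit codes into an ASCII string."""
    return "".join(display_code_to_ascii(c) for c in codes)
```

Note that the letter "A" is code 1 here and 65 in ASCII, so not even the first byte of a text would look familiar to an ASCII-based tool.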
These encodings are more of historical interest than anything else,
though. I'm pretty sure that only a few machines with truly non-ASCII
non-EBCDIC encodings exist, and that even fewer would call for an Erlang
port...
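For what it's worth, the "first estimate, then check against the declaration" idea quoted at the top can be sketched roughly as follows. This is only an illustration under stated assumptions: the entity is assumed to start with "<?xml" (optionally preceded by a byte-order mark), the byte patterns are the well-known ones for that prefix, and the names SIGNATURES, PROVISIONAL_CODEC, guess_encoding_family and declared_encoding are invented here. Using cp037 as the EBCDIC stand-in and latin-1 as the ASCII-compatible stand-in is likewise my own choice.

```python
import re

# (leading bytes, encoding family), checked in order. Longer and more
# specific patterns come first so that a UTF-32 BOM is not mistaken
# for a UTF-16 one.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),   # byte-order marks
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "ASCII-compatible"),  # UTF-8 BOM
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\x00\x00\x00<", "UTF-32BE"),      # no BOM: look at how "<?xm" is laid out
    (b"<\x00\x00\x00", "UTF-32LE"),
    (b"\x00<\x00?", "UTF-16BE"),
    (b"<\x00?\x00", "UTF-16LE"),
    (b"<?xm", "ASCII-compatible"),       # UTF-8, ISO 8859-x, and friends
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),     # "<?xm" in EBCDIC
]

# Codec that is good enough to read the declaration itself, per family.
PROVISIONAL_CODEC = {
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "ASCII-compatible": "latin-1",  # never fails; the declaration is ASCII anyway
    "EBCDIC": "cp037",
}


def guess_encoding_family(data: bytes):
    """First estimate: which family of encodings do the leading bytes suggest?"""
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            return family
    return None


def declared_encoding(data: bytes):
    """Second step: read the encoding declaration using the first estimate."""
    family = guess_encoding_family(data)
    if family is None:
        return None
    # Only the start of the entity is needed to find the declaration.
    head = data[:200].decode(PROVISIONAL_CODEC[family], errors="replace")
    match = re.search(r'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', head)
    return match.group(1) if match else None
```

The point is that the first step only needs to narrow things down to a family; the declaration then picks the exact encoding within it, and a processor can complain if the two disagree.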
Regards,
Jo