Language change proposal

Tue Nov 4 19:05:27 CET 2003

Michael Hobbs wrote:

> Joachim Durchholz said:
> 
>>Michael Hobbs wrote:
>>
>>>That line seems to imply that if an entity contains an encoding
>>>declaration, then the whole entity must be encoded with that encoding.
>>>This presents a chicken-or-egg problem in that how is an XML processor
>>>to process an encoding declaration before it knows what the encoding
>>>is?
>>
>>The first byte of an entity is always a specific character (probably "<"
>> for XML).
>>Assuming the entity is correct, the XML processor can infer at least a
>>first estimate of what encoding was used, and later check it against the
>> encoding declarations.
> 
> 
> Okay, before you wrote this, I hadn't realized that every character
> encoding (besides UTF-16 and EBCDIC) is a superset of ASCII.

Oh, there /are/ character sets that vary wildly. The Leibniz Computer 
Center in Munich had several CDC computers, which sported non-standard 
word sizes (48-bit words), non-standard character sets (6-bit; A-Z are 
octal codes 01-32, 0-9 are octal 33-44), and a raw computing power that exceeded the 
best IBM machines by a factor of ten. CDC machines stayed the fastest on 
the market until the mid-1970s (when they were outdone by their own chief 
engineer, who had left to found his own company, Cray Research *g*).
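
To make the 6-bit layout concrete, here is a minimal sketch in Python. It assumes the code ranges above are octal (as in CDC display code) and covers only the letters and digits mentioned here, not the full character set. The names DISPLAY_CODE, to_display_code and pack_6bit are made up for this illustration.

```python
import string

# Partial table, assumption: letters A-Z map to octal 01..32 (decimal 1..26)
# and digits 0-9 map to octal 33..44 (decimal 27..36). Punctuation is omitted.
DISPLAY_CODE = {
    ch: code
    for code, ch in enumerate(string.ascii_uppercase + string.digits, start=1)
}

def to_display_code(text):
    """Translate a string of A-Z / 0-9 into a list of 6-bit codes."""
    return [DISPLAY_CODE[ch] for ch in text]

def pack_6bit(codes):
    """Pack 6-bit codes into one integer, first character in the high bits."""
    word = 0
    for code in codes:
        word = (word << 6) | code
    return word

if __name__ == "__main__":
    codes = to_display_code("CDC6600")
    print([oct(c) for c in codes])
    print(oct(pack_6bit(codes)))
```

None of these codes coincide with the ASCII values for the same characters, which is the point: a processor that assumes an ASCII superset has nothing to hold on to here.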

You can find this and other CDC-related character sets at 
http://www.informatik.uni-hamburg.de/RZ/software/gnu/utilities/recode_9.html 
if you're interested :-)
Actually, the "recode" tool still understands these character sets, 
supposedly because recode originated on a CDC :-)

These encodings are more of historical interest than anything else, 
though. I'm pretty sure that only a few machines with truly non-ASCII, 
non-EBCDIC encodings exist, and that even fewer would call for an Erlang 
port...
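
Coming back to the first-bytes inference quoted at the top: for the encodings that do matter in practice, it can be sketched roughly as follows. This is only an illustration in Python, not a complete detector. The function name guess_encoding_family is made up, the byte patterns are the well-known ones for "<?xm" and the byte order marks, and UCS-4 is left out for brevity. The result is just a first estimate that still has to be checked against the encoding declaration.

```python
def guess_encoding_family(data):
    """Guess the encoding family of an XML entity from its first bytes.

    Hypothetical helper for illustration; returns a descriptive string.
    """
    # Byte order marks, if present, settle the question first.
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (BOM)"
    if data.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian (BOM)"
    if data.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian (BOM)"
    # No BOM: look at how the leading "<?" is laid out in memory.
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16 big-endian"
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16 little-endian"
    # "<?xm" in EBCDIC is 4C 6F A7 94.
    if data.startswith(b"\x4c\x6f\xa7\x94"):
        return "EBCDIC family"
    # "<?xm" as plain bytes: some ASCII superset (UTF-8, ISO 8859-x, ...).
    if data.startswith(b"<?xm"):
        return "ASCII-compatible"
    return "unknown"

if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    for codec in ("ascii", "utf-16-be", "utf-16-le", "cp037"):
        print(codec, "->", guess_encoding_family(sample.encode(codec)))
```

Once the family is known, the processor can read far enough to reach the encoding declaration and compare it with the guess.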

Regards,
Jo