Language change proposal

Michael Hobbs michael@REDACTED
Tue Nov 4 17:22:19 CET 2003


Joachim Durchholz said:
> Michael Hobbs wrote:
>> That line seems to imply that if an entity contains an encoding
>> declaration, then the whole entity must be encoded with that encoding.
>> This presents a chicken-or-egg problem in that how is an XML processor
>> to process an encoding declaration before it knows what the encoding
>> is?
>
> The first byte of an entity is always a specific character (probably "<"
> for XML).
> Assuming the entity is correct, the XML processor can infer at least a
> first estimate of what encoding was used, and later check it against the
> encoding declarations.

Okay, before you wrote this, I hadn't realized that every character
encoding (besides UTF-16 and EBCDIC) is a superset of ASCII. I had
ass-u-me-d that there are some character encodings that have a wildly
different binary representation, like ASCII vs. EBCDIC. After doing some
searching though, I have discovered that every character encoding that I
could find uses the same standard Latin letters and symbols for the bytes
between 0x20 and 0x7E.
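
As a rough illustration of the inference Joachim describes, here is a minimal
Python sketch. It assumes the entity begins with the text "<?xml" and carries
no byte order mark. The function name guess_family and the family labels are
made up for this example. The byte patterns follow the autodetection appendix
of the XML 1.0 recommendation, and UTF-32 is left out for brevity.

```python
# Hypothetical helper: guess an encoding family from the first four bytes
# of an XML entity, assuming it starts with "<?xml" and has no BOM.

# Encodings assumed (for this sketch) to agree with ASCII on "<?xml".
ASCII_COMPATIBLE = ["ascii", "latin-1", "utf-8", "cp1252", "koi8-r"]


def guess_family(data: bytes) -> str:
    """Return a first estimate of the encoding family, or 'unknown'."""
    if data.startswith(b"\x3c\x3f\x78\x6d"):   # "<?xm" in any ASCII superset
        return "ascii-compatible"
    if data.startswith(b"\x3c\x00\x3f\x00"):   # "<?" as UTF-16, little-endian
        return "utf-16-le"
    if data.startswith(b"\x00\x3c\x00\x3f"):   # "<?" as UTF-16, big-endian
        return "utf-16-be"
    if data.startswith(b"\x4c\x6f\xa7\x94"):   # "<?xm" in EBCDIC (e.g. cp037)
        return "ebcdic"
    return "unknown"


if __name__ == "__main__":
    # Every ASCII-compatible encoding yields identical bytes for "<?xml".
    for name in ASCII_COMPATIBLE:
        print(name, guess_family("<?xml".encode(name)))
    # UTF-16 and EBCDIC are the odd ones out.
    for name in ["utf-16-le", "utf-16-be", "cp037"]:
        print(name, guess_family("<?xml".encode(name)))
```

Once the family is known, the processor can read the ASCII-range characters of
the encoding declaration and then check the declared name against this first
estimate.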

The world is a little less chaotic than I had thought,
- Michael Hobbs

More information about the erlang-questions mailing list