[eeps] EEP 35 "Binary string modules"

Richard O'Keefe <>
Wed Nov 24 04:10:19 CET 2010

On 24/11/2010, at 3:47 PM, Kenji Rikitake wrote:

> Richard:
> Accepting overlong sequences is well known to enable cross-site scripting
> and unwanted sequence injection
> (e.g., ASCII 0x21 "!" -> 0xC0 0xA1, or "." 0x2E -> 0xC0 0xAE, which is extensively used to smuggle path characters).

But that's precisely the "bad magic" I was talking about.
You have to
(1) ensure that the decoding process cannot overflow buffers
(2) check for bad magic *after* decoding.
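To make the two rules concrete, here is a minimal Python sketch. The decoder `lenient_decode` and the check `path_is_safe` are hypothetical names invented for illustration. The decoder deliberately accepts overlong forms, but it never reads past the end of its input (rule 1), and the ".." check runs on the decoded text rather than the raw bytes (rule 2). A check on the raw bytes would miss the overlong-encoded dots entirely.

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical decoder that ACCEPTS overlong forms (illustration only).

    Rule (1): every byte index is bounds-checked, so a truncated sequence
    raises ValueError instead of reading past the end of the input.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            cp, extra = b, 0
        elif 0xC0 <= b < 0xE0:
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b < 0xF0:
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b < 0xF8:
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp))  # chr() raises ValueError above 0x10FFFF
        i += extra + 1
    return "".join(out)


def path_is_safe(path: str) -> bool:
    """Rule (2): look for the 'bad magic' (a '..' component) AFTER decoding."""
    return ".." not in path.split("/")


raw = b"\xc0\xae\xc0\xae/etc/passwd"   # overlong-encoded "../etc/passwd"
print(b".." in raw)                     # a check on raw bytes sees nothing
decoded = lenient_decode(raw)
print(decoded, path_is_safe(decoded))   # the post-decoding check catches it
```

For comparison, Python's built-in `bytes.decode("utf-8")` is strict and rejects 0xC0 and 0xC1 as lead bytes outright, so it never produces "." from an overlong form in the first place.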

> At least when encoding to UTF-8, 0x21 should be 0x21 and must not be 0xC0 0xA1.

I mentioned IGOR.  Yes, output should be strict.
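What "strict output" means for an encoder can be shown with a minimal Python sketch. The function name `encode_utf8_strict` is hypothetical. It always picks the shortest form for each code point, following RFC 3629, and it refuses surrogates and values above U+10FFFF. As a result, 0x21 can only ever come out as the single byte 0x21.

```python
def encode_utf8_strict(cp: int) -> bytes:
    """Hypothetical strict encoder: shortest-form UTF-8 for one code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point cannot be encoded: {cp:#x}")
    if cp < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Because each branch is taken only when the code point does not fit the shorter form, an overlong sequence such as 0xC0 0xA1 can never be produced.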

There are other illegal sequences as well.  For example, UTF-8
quite clearly says that characters outside the BMP should be
transmitted as *single* multibyte sequences, not first converted
to a surrogate pair and then transmitted as *two* multibyte
sequences.  But that's what Java's "UTF-8" does (or did).
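The difference can be seen byte for byte in a minimal Python sketch. The helper names are hypothetical. `two_sequence_encode` mimics the surrogate-pair approach, which is the form Unicode Technical Report #26 calls CESU-8. It produces two 3-byte sequences for U+1F600, while real UTF-8 uses a single 4-byte sequence. Python's strict decoder is used here as the checker, and it rejects the two-sequence form.

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into a UTF-16 high/low surrogate pair."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def encode_3byte(u):
    """Raw 3-byte pattern with NO surrogate check (this is the bad behavior)."""
    return bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])


def two_sequence_encode(cp):
    """Hypothetical: encode a non-BMP code point as two 3-byte sequences."""
    hi, lo = to_surrogate_pair(cp)
    return encode_3byte(hi) + encode_3byte(lo)


def is_valid_utf8(data):
    """Python's UTF-8 decoder rejects encoded surrogates and overlong forms."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


bad = two_sequence_encode(0x1F600)    # ED A0 BD ED B8 80
good = chr(0x1F600).encode("utf-8")   # F0 9F 98 80
print(bad.hex(" "), is_valid_utf8(bad))
print(good.hex(" "), is_valid_utf8(good))
```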
