[eeps] EEP 35 "Binary string modules"
Richard O'Keefe
ok@REDACTED
Wed Nov 24 04:10:19 CET 2010
On 24/11/2010, at 3:47 PM, Kenji Rikitake wrote:
> Richard:
>
> Accepting overlong sequences has been well-known to cause cross-site scripting
> and unwanted sequence injection
> (e.g., ASCII 0x21 -> 0xC0 0xA1, ".", extensively used for path characters).
But that's precisely the "bad magic" I was talking about.
You have to
(1) ensure that the decoding process cannot overflow buffers, and
(2) check for bad magic *after* decoding.
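
As a rough illustration of those two steps, here is a minimal Python sketch. The names decode_lenient and is_safe_path are hypothetical and not part of EEP 35 or any library, and the "bad magic" rule used (reject NUL, out-of-range code points, and ".." path segments) is only an assumed example policy. The decoder tolerates overlong forms but never reads past the end of its input, and the safety check runs on the decoded code points rather than on the raw bytes.

```python
def decode_lenient(data):
    """Decode UTF-8-like bytes to a list of code points.

    Overlong forms are tolerated (assumed lenient-input policy), but every
    continuation byte is bounds-checked before it is read.
    """
    cps = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            cp, extra = lead, 0
        elif 0xC0 <= lead <= 0xDF:
            cp, extra = lead & 0x1F, 1
        elif 0xE0 <= lead <= 0xEF:
            cp, extra = lead & 0x0F, 2
        elif 0xF0 <= lead <= 0xF7:
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError("bad lead byte at offset %d" % i)
        # Step (1): refuse to run off the end of the buffer.
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:
                raise ValueError("bad continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (cont & 0x3F)
        cps.append(cp)
        i += extra + 1
    return cps


def is_safe_path(cps):
    """Step (2): look for bad magic in the *decoded* code points."""
    if any(cp == 0 or cp > 0x10FFFF for cp in cps):
        return False
    text = "".join(chr(cp) for cp in cps)
    return ".." not in text.split("/")


raw = b"\xc0\xae\xc0\xae/etc/passwd"   # overlong encoding of "../etc/passwd"
looks_clean_as_bytes = b".." not in raw  # a check on the raw bytes is fooled
safe_after_decoding = is_safe_path(decode_lenient(raw))
```

A check on the raw bytes finds no ".." in this input, while the same check after decoding does, which is why the order of the two steps matters.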
> At least for encoding to UTF-8 0x21 should be 0x21 and must not be 0xc0 0xa1.
I mentioned IGOR. Yes, output should be strict.
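
To make "strict output" concrete, here is a minimal Python sketch of an encoder that always emits the shortest form and refuses surrogates and out-of-range values. The name encode_strict is hypothetical; in practice one would normally rely on the platform's own encoder.

```python
def encode_strict(cp):
    """Encode one code point as shortest-form UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Because the branch is chosen by the size of the code point, 0x21 can only ever come out as the single byte 0x21, never as 0xC0 0xA1.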
There are other illegal sequences as well. For example, UTF-8
quite clearly says that characters outside the BMP should be
transmitted as *single* multibyte sequences, not first converted
to a surrogate pair and then transmitted as *two* multibyte
sequences. But that's what Java's "UTF-8" does (or did).
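
The difference can be seen in a small Python sketch. The helper surrogate_pair_bytes is hypothetical and only mimics the surrogate-pair style of output (the form usually called CESU-8, which Java's "modified UTF-8" resembles for supplementary characters); U+10400 is an arbitrary character outside the BMP chosen for illustration.

```python
def surrogate_pair_bytes(ch):
    """Encode a non-BMP character the wrong way: as two 3-byte sequences."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is inside the BMP")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)     # high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # low (trail) surrogate
    # "surrogatepass" lets Python's codec emit bytes for lone surrogates.
    return b"".join(chr(u).encode("utf-8", "surrogatepass")
                    for u in (high, low))


ch = "\U00010400"
proper = ch.encode("utf-8")          # one 4-byte sequence
paired = surrogate_pair_bytes(ch)    # two 3-byte sequences

try:
    paired.decode("utf-8")           # a strict decoder rejects the pair form
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
```

Here proper is the four bytes F0 90 90 80, while paired is the six bytes ED A0 81 ED B0 80, which Python's strict UTF-8 decoder refuses.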