[eeps] EEP 35 "Binary string modules"
Tue Nov 23 13:37:08 CET 2010
Some thoughts on EEP35:
* The use of UTF-8 (RFC3629) in the "utf-8" encoded binaries must
be explicitly addressed in the EEP. Just using the word "Unicode"
does not sufficiently address the details, because in the current
implementation of Erlang, the lists representing character strings hold
Unicode *character numbers* (code points), while the binaries hold
UTF-8 encoded byte sequences.
This may affect EEP10 also, because it does not specifically mention
the use of character numbers (max 16#10FFFF, as in RFC3629) in
the Erlang lists representing character strings.
* Issues of overlong encoding (RFC3629 Section 3) must be explicitly
addressed in the EEP also.
From RFC3629 Section 3:
"Implementations of the decoding algorithm above MUST protect against
decoding invalid sequences. For instance, a naive implementation may
decode the overlong UTF-8 sequence C0 80 into the character U+0000,
or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding
invalid sequences may have security consequences or cause other
problems. See Security Considerations (Section 10) below."
* BOM (Byte Order Mark) issues should also be addressed. I suggest
Erlang/OTP follow the usage recommended in RFC3629.
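
To make the three points above concrete, here is a minimal sketch. It is
written in Python rather than Erlang, purely for illustration. The helper
names code_points, strict_decode and strip_bom are hypothetical, and
dropping a leading BOM is only one possible policy, assumed here for the
example; it is not something this note or the EEP prescribes.

```python
# Illustration only: Python stands in for Erlang here.
# All helper names below are hypothetical.

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def code_points(text):
    """Return the character numbers (code points) of a string,
    analogous to an Erlang list representing a character string."""
    return [ord(ch) for ch in text]


def strict_decode(data):
    """Decode UTF-8 strictly; return None for invalid input such as
    overlong forms or encoded surrogates."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def strip_bom(data):
    """Assumed policy: drop a single leading UTF-8 signature if present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


text = "\u3042"  # HIRAGANA LETTER A
print(code_points(text))            # one character number
print(list(text.encode("utf-8")))   # three encoded bytes

# The two invalid sequences quoted from RFC3629 Section 3.
print(strict_decode(b"\xc0\x80"))
print(strict_decode(b"\xed\xa1\x8c\xed\xbe\xb4"))

print(strip_bom(UTF8_BOM + b"abc"))
```

The first two prints show why "Unicode" alone is ambiguous: the same
character is one integer in the list form and three bytes in the binary
form. The strict decoder refuses both invalid sequences instead of
mapping them to U+0000 or U+233B4.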