[erlang-questions] Unicode noncharacter inconsistency

Alisdair Sullivan alisdairsullivan@REDACTED
Wed Jul 27 16:06:28 CEST 2011


The binary Unicode type specifiers /utf8, /utf16, etc. raise an error when asked to encode U+FFFE or U+FFFF (which are noncharacters, making this a defensible and perhaps sensible position), but not when asked to encode the corresponding noncharacters in the other planes: U+1FFFE, U+1FFFF, U+2FFFE, ..., U+10FFFF. I considered submitting a bug report, but since noncharacters are permitted for internal use and are only forbidden in interchange, it's arguable that the current behaviour does adhere to the spec.

Is there some rationale behind this inconsistency, or is it simply an oversight?

The reserved noncharacters U+FDD0 through U+FDEF should probably also behave like U+FFFE and U+FFFF, as they are in the same category.
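
For reference, Unicode designates 66 noncharacters in total: the 32 code points U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes. The following is a minimal sketch for checking how a given installation treats each of them. The module name `nonchar` and its functions are hypothetical helpers written for this illustration, not part of OTP, and no particular result from `probe/1` is assumed, since it may differ between releases.

```erlang
-module(nonchar).
-export([is_noncharacter/1, all/0, probe/1]).

%% Hypothetical helper: true for the 66 code points that Unicode
%% designates as noncharacters, false for anything else.
is_noncharacter(CP) when is_integer(CP), CP >= 16#FDD0, CP =< 16#FDEF ->
    true;
is_noncharacter(CP) when is_integer(CP), CP >= 0, CP =< 16#10FFFF,
                         (CP band 16#FFFE) =:= 16#FFFE ->
    %% Last two code points of a plane: U+nFFFE and U+nFFFF.
    true;
is_noncharacter(_) ->
    false.

%% All noncharacters in ascending order.
all() ->
    lists:seq(16#FDD0, 16#FDEF) ++
        [Plane * 16#10000 + Low || Plane <- lists:seq(0, 16),
                                   Low <- [16#FFFE, 16#FFFF]].

%% Reports how the running emulator treats CP under the /utf8 type.
%% Encoding failures in binary construction surface as badarg errors.
probe(CP) ->
    try <<CP/utf8>> of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

Evaluating `[{CP, nonchar:probe(CP)} || CP <- nonchar:all()].` in the shell then shows, for each noncharacter, whether the /utf8 encoder accepts or rejects it.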

