[erlang-questions] Unicode noncharacter inconsistency

Wed Aug 17 15:00:23 CEST 2011

On Wed, Jul 27, 2011 at 4:06 PM, Alisdair Sullivan
<alisdairsullivan@REDACTED> wrote:
> The binary Unicode 'switches' /utf8, /utf16, etc. raise an error when asked to encode U+FFFE or U+FFFF (which are noncharacters, making this a defensible and perhaps sensible position), but not when asked to encode U+1FFFE, U+1FFFF, U+2FFFE, ..., U+10FFFF. I considered filing a bug report, but since these code points are permitted in internal processing and are only forbidden during interchange, it is arguable that this behaviour does adhere to the spec.
>
> Is there some rationale behind this inconsistency, or is it simply an oversight?

An oversight.

> The reserved noncharacters U+FDD0..U+FDEF should probably also behave like U+FFFE/U+FFFF, as they are in the same category.

We think that it is better to allow U+FFFE and U+FFFF, since
noncharacters can be useful for internal processing. This change will
make the implementation consistent with RFC 3629. (Another advantage
is that the conversion will be slightly faster, since there is one
fewer test to perform.)

We will probably make this change in the R15 release.
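With the encoder accepting all noncharacters, an application that wants to keep them out of interchanged data has to check for them itself. Below is a minimal sketch of such a check. The module and function names (`noncharacters`, `is_noncharacter/1`, `encode_utf8_strict/1`) are hypothetical and not part of OTP. The sketch assumes a release in which `/utf8` accepts noncharacters, and it uses `lists:search/2`, which is available from OTP 21. It covers the full set of 66 noncharacters: U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in each of the 17 planes.

```erlang
%% Hypothetical helper module (not part of OTP): application-level
%% filtering of Unicode noncharacters before data is interchanged.
-module(noncharacters).
-export([is_noncharacter/1, encode_utf8_strict/1]).

%% Unicode defines 66 noncharacters:
%%   * U+FDD0..U+FDEF (32 code points), and
%%   * U+nFFFE and U+nFFFF for every plane n = 0..16 (34 code points).
-spec is_noncharacter(integer()) -> boolean().
is_noncharacter(C) when C >= 16#FDD0, C =< 16#FDEF ->
    true;
is_noncharacter(C) when C >= 0, C =< 16#10FFFF ->
    %% True when the low 16 bits are FFFE or FFFF, in any plane.
    (C band 16#FFFE) =:= 16#FFFE;
is_noncharacter(_) ->
    false.

%% Encode a list of code points as UTF-8, refusing noncharacters.
%% Other invalid input (e.g. surrogates) is still left to the /utf8
%% segment, which raises badarg for it.
-spec encode_utf8_strict([non_neg_integer()]) ->
          {ok, binary()} | {error, {noncharacter, non_neg_integer()}}.
encode_utf8_strict(CodePoints) ->
    case lists:search(fun is_noncharacter/1, CodePoints) of
        {value, C} ->
            {error, {noncharacter, C}};
        false ->
            {ok, << <<Cp/utf8>> || Cp <- CodePoints >>}
    end.
```

For example, `noncharacters:encode_utf8_strict([$a, 16#1FFFE])` returns `{error,{noncharacter,131070}}`, while `noncharacters:encode_utf8_strict("abc")` returns `{ok,<<"abc">>}`. Keeping this check outside the encoder matches the reasoning in the reply: internal code can still use noncharacters freely, and only the interchange boundary pays for the extra test.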

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB