[eeps] Revision of the JSON EEP
Richard A. O'Keefe
ok@REDACTED
Tue Jul 29 04:12:18 CEST 2008
On 29 Jul 2008, at 11:41 am, David-Sarah Hopwood wrote:
> Richard A. O'Keefe wrote:
>> I should have waited until I'd read Saturday's mail,
>> Sunday's mail, and Monday's mail as well. Sigh.
>> Here is the THIRD draft.
>>
>> ------------------------------------------------------------------------
>
> A number is converted to an Erlang float if
> - it contains a decimal point, or
> - it contains an exponent, or
> - the option {float,true} was passed.
> A JSON number that looks like an integer will be converted to
> an Erlang integer unless {float,true} was provided.
>
> This represents equivalent JavaScript numbers (for example, "1e1"
> and "10") as different Erlang terms, i.e. which Erlang term is used
> can depend on the JSON encoder. Also, it cannot round-trip "-0", as
> distinct from "0".
Where does it say anywhere that "1e1" and "10" are equivalent
***in JSON***? I defy anyone to find anything to that effect in
RFC 4627 section 2.4 "Numbers", or for that matter at www.json.org.
On the contrary, www.json.org says
"A number is very much like a C or Java number"
and I need hardly remind readers that
- 1e1 and 10 are not equivalent in C or Java
- -0 is identical to 0 in C and Java
It so happens that Javascript identifies 1e1 and 10,
for the simple reason that Javascript does not have integers,
only "numbers".
There is no rule that says other JSON partners cannot make
this distinction. It is perfectly OK for a JSON library in
some language (such as Java, say) to distinguish between
numbers that Javascript identifies. For round trips, all
that matters is that numbers that Javascript *distinguishes*
are not mapped to the same representation. So
10 -> 10 -> 10
1e1 -> 10.0 -> 10.0
and as far as Javascript is concerned, 10 and 10.0 are the same.
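To make the round trip concrete, here is a small sketch using Python's
standard json module. Python is only standing in for "some other JSON
partner" here; it is not part of the EEP, and round_trip is a name made
up for this illustration. Python's reader happens to keep integer syntax
and float syntax apart, which is all that is needed:

```python
import json

def round_trip(text):
    """Decode one JSON number, then encode the resulting value again."""
    value = json.loads(text)
    return value, json.dumps(value)

for text in ["10", "1e1"]:
    value, again = round_trip(text)
    # Prints "10 -> 10 -> 10" and then "1e1 -> 10.0 -> 10.0".
    print(text, "->", repr(value), "->", again)
```

Javascript reading either output gets the same number back, so nothing
that Javascript distinguishes has been merged.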
As for -0, I've just double-checked in js, and
js> -0 == 0;
true
js> -0 === 0;
true
js> -0;
0
js> (-0).toString();
0
So it seems that as far as Javascript is concerned, -0 and 0
are the same, or at any rate, that as far as Javascript is
concerned, it is acceptable to print -0 as "0".
However, I do agree that in IEEE arithmetic, -0.0 is not the
same as 0.0, and it is imaginable that someone with a -0.0
might print it as -0. But we can handle that by requiring
-0 to be read as -0.0.
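A minimal sketch of that rule, again in Python rather than Erlang, and
assuming only the literal text "-0" needs special treatment. The names
read_int and loads_keeping_negative_zero are hypothetical; parse_int is
the json module's real hook for integer-looking numbers:

```python
import json
import math

def read_int(text):
    """Hypothetical hook: read the JSON text "-0" as the float -0.0."""
    if text == "-0":
        return -0.0
    return int(text)

def loads_keeping_negative_zero(s):
    # parse_int receives the text of every integer-looking JSON number.
    return json.loads(s, parse_int=read_int)

z = loads_keeping_negative_zero("-0")
print(math.copysign(1.0, z))   # the sign survives: -1.0
print(json.dumps(z))           # written back out as -0.0
```

Every other integer-looking number still comes back as an integer, so
the only cost is that "-0" returns to the sender spelled "-0.0".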
A general point is that www.json.org is very explicit:
JSON "is based on a subset of the JavaScript Programming
Language, Standard ECMA-262 3rd Edition - December 1999."
But it "is a text format that is completely language independent".
That is, the semantics of JSON are not tied to the semantics of
Javascript. (Which is fortunate, otherwise we'd never be able to
implement it in anything _but_ Javascript.) Javascript is not the
only language using JSON.
>
> This would be more deterministic:
>
> A number is converted to an Erlang integer if its mathematical
> value is exactly representable as an integer, and it is not
> negative zero, and the option {float,true} was not passed.
> (A negative zero is "-" followed by any representation of
> zero.)
> Otherwise, it is converted to the nearest representable Erlang
> float using IEEE 754 round-to-even.
It is NOT TOLERABLE to convert numbers with float syntax to integers,
because that would prevent round-tripping between Erlang and other
JSON partners (Lisp, Scheme, Prolog, Smalltalk, Python, &c).
For example, in Python,
>>> repr(0);
'0'
>>> repr(0.0);
'0.0'
If a Python program sends an Erlang program a number using JSON, we
want to be able to send the SAME number back.
Note also that converting "if its mathematical value is exactly
representable as an integer" means converting 1e308 to a bignum,
which is probably not intended.
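To see how large that bignum would be, here is a sketch in Python of the
"exactly representable as an integer" test. exact_integer is a
hypothetical helper written for this message, not anything proposed in
the EEP:

```python
from decimal import Decimal

def exact_integer(text):
    """Return the exact integer value of a JSON number,
    or None if the number has a fractional part."""
    d = Decimal(text)
    if d == d.to_integral_value():
        return int(d)
    return None

big = exact_integer("1e308")
print(len(str(big)))          # 309 decimal digits
print(exact_integer("1.5"))   # None: not an integer
```

A sender who wrote 1e308 almost certainly meant a float near the top of
the double range, not a 309-digit integer.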
>
>
> There's another round-trip issue besides the four listed in the
> Rationale:
> JSON strings can include Unicode escapes that cause the string not
> to be well-formed (for example, "\uDC00"). These will not round-trip,
> because they will not convert to UTF-8. I don't think this is
> necessarily much of a problem as long as it is documented.
Such JSON strings are not well-formed JSON.
www.json.org is not explicit about this, but the RFC comes
close: \u may only be used to present "a character" that
"is in the Basic Multilingual Plane", and technically,
the surrogates are not characters. A surrogate MAY
appear in such a way, but only as one of a pair of
surrogates: "To escape an extended character that is not
in the Basic Multilingual Plane, the character is represented
as a twelve-character sequence, encoding the UTF-16 surrogate pair."
Any other occurrence of \uDC00, for example, is just as much an
error as an occurrence of hex FF (which is not legal in UTF-8).
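The difference between a surrogate pair and a lone surrogate shows up as
soon as the string has to become UTF-8. The sketch below is in Python,
whose json module is lenient and accepts the lone escape on input; that
leniency is a property of that library, not something the RFC requires.
utf8_or_none and is_valid_utf8 are hypothetical helper names:

```python
import json

def utf8_or_none(json_text):
    """Decode a JSON string literal and encode the result as UTF-8.
    Return None if the result cannot be encoded."""
    s = json.loads(json_text)
    try:
        return s.encode("utf-8")
    except UnicodeEncodeError:
        return None

def is_valid_utf8(data):
    """Report whether a byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(utf8_or_none(r'"\uD834\uDD1E"'))  # a proper pair: four UTF-8 bytes
print(utf8_or_none(r'"\uDC00"'))        # a lone surrogate: None
print(is_valid_utf8(b"\xff"))           # False
```

A reader that is stricter than Python's could reject the lone \uDC00 at
parse time, just as it would reject the FF byte.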