[eeps] JSON

Tue Jul 5 02:09:40 CEST 2011

I'm a little confused.
RFC 4627 says explicitly in section 4:

	An implementation may set limits on the range of numbers.

Considering that the other JSON implementations I'm aware of (JavaScript,
Python, R) limit themselves to IEEE doubles (or, in the case of Python,
to bignums for things that look like integers), I don't see any need
for Erlang to handle floats any way other than as native Erlang floats.
For an implementation in pure Erlang, I don't see any need to do
anything other than use list_to_integer/1 or list_to_float/1 as
appropriate.
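
To make that concrete, here is a minimal sketch of the whole conversion
step. It is written in Python only because it is easy to run; int() and
float() stand in for list_to_integer/1 and list_to_float/1, and the name
parse_json_number is made up for illustration.

```python
import re

# JSON number grammar as given in RFC 4627 section 2.4.
_NUMBER = re.compile(r'-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?')

def parse_json_number(token):
    """Convert one JSON number token to a native int or float.

    Hypothetical helper: no fraction and no exponent means "looks like
    an integer"; everything else becomes a native double.
    """
    m = _NUMBER.fullmatch(token)
    if m is None:
        raise ValueError("not a JSON number: %r" % token)
    if m.group(2) is None and m.group(3) is None:
        return int(token)      # bignum if needed
    return float(token)        # IEEE double
```

Nothing more elaborate than this seems to be needed for in-range input.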

> This is not what I meant. Your example is within the range of IEEE 754 doubles.
> Rounding should be expected within the range +/- 5.0e-324 to 1.7976...e308.
> Values outside of this are currently not representable in Erlang.
> These values are representable as strings, however.
> In the interests of fidelity, these JSON numbers should not automatically result in errors.

You cannot represent JSON numbers as strings if you want to claim "fidelity".

Python's json module, for comparison, quietly turns an out-of-range
number into infinity:

>>> json.loads('1.2e500')
inf

The problem here is that Erlang can't do that, because (at least in
R14B) it flatly refuses to treat inf as a number.

1> 1.2e300*3.4e300.
** exception error: bad argument in an arithmetic expression
     in operator  */2
        called as 1.2e300 * 3.4e300

4> 1.2e500.
* 1: illegal float

4> list_to_float("1.2e500").
** exception error: bad argument
     in function  list_to_float/1
        called as list_to_float("1.2e500")

I repeat:

   An implementation may set limits on the range of numbers.

and from section 2.4:

   Numeric values that cannot be represented as sequences of digits
   (such as Infinity and NaN) are not permitted.

Now this is as muddled as you might expect the JSON RFC to be, but
the intent seems clear enough:  go outside the range of what
JavaScript can handle as a finite number and it's no use whining
when you get the load of trouble you just asked for.

I do not think it is necessary for Erlang to go to heroic extremes
to accept arbitrarily bad JSON.
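
As a rough sketch of what "setting limits on the range of numbers" can
look like in practice, here is a Python decoder that refuses anything
that would overflow a double instead of returning inf. It uses the
parse_float and parse_constant hooks of the standard json module; the
names strict_loads, _finite_float and _reject_constant are made up for
illustration, and the sketch only catches overflow, not underflow to
zero.

```python
import json
import math

def _finite_float(text):
    # float() overflows silently to inf; treat that as an error instead.
    value = float(text)
    if math.isinf(value):
        raise ValueError("JSON number out of range: %s" % text)
    return value

def _reject_constant(name):
    # Called by the json module for NaN, Infinity and -Infinity,
    # none of which RFC 4627 section 2.4 permits.
    raise ValueError("not permitted in JSON: %s" % name)

def strict_loads(document):
    """Decode JSON, rejecting numbers a finite double cannot hold."""
    return json.loads(document,
                      parse_float=_finite_float,
                      parse_constant=_reject_constant)
```

With this, strict_loads('1.2e500') raises ValueError, which is roughly
the behaviour list_to_float/1 gives for free.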

>>> Another possible approach I failed to mention is to replace invalid or
>>> problematic escape sequences with the Unicode replacement codepoint,
>>> U+FFFD. This follows the JSON and Unicode spec recommendations, but
>>> loses potentially significant user data.
>>> 
>> 
>> Yeah, my point of view is from a database author, so my stance is
>> towards "give back what was given" as much as possible so I never
>> considered the replacement scheme there. Adding it as an option seems
>> sane, though I could see there being a couple corner cases with
>> combining characters (i.e., two high byte combining characters in a row,
>> is that one or two U+FFFD insertions?).
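
On that corner case, one possible answer is "one U+FFFD per offending
escape". The Python sketch below assumes that the problematic escapes
in question are unpaired UTF-16 surrogates (\uD800 to \uDFFF), which is
my reading of the quoted text and not something it states. The helper
name replace_lone_surrogates is made up; it relies on Python's json
module already having combined valid surrogate pairs, so any surrogate
left in the decoded string is unpaired.

```python
import json
import re

# Any surrogate code point still present after decoding is unpaired.
_LONE_SURROGATE = re.compile(r'[\ud800-\udfff]')

def replace_lone_surrogates(text):
    """Replace each unpaired surrogate with U+FFFD (one for one)."""
    return _LONE_SURROGATE.sub('\ufffd', text)

# Two high surrogates in a row become two replacement characters.
decoded = json.loads(r'"a\ud800\ud800b"')
cleaned = replace_lone_surrogates(decoded)
```

Collapsing a run of bad escapes into a single U+FFFD would be an equally
defensible policy; either way the original data is gone.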

If you want to '"give back what was given" as much as possible' then
you have no alternative but to store the actual byte stream verbatim.
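
A short Python sketch of why: decoding and re-encoding even perfectly
valid JSON does not give back the bytes that came in, so anything that
wants to return them has to keep them. The record layout and the name
store are assumptions for illustration only.

```python
import json

def store(raw_bytes):
    """Keep the verbatim bytes next to the decoded value."""
    return {"raw": raw_bytes, "parsed": json.loads(raw_bytes)}

original = b'{"n" : 1.0e2}'
record = store(original)

# Re-encoding the parsed value normalises spacing and number spelling.
reencoded = json.dumps(record["parsed"]).encode("utf-8")
```

Here reencoded is b'{"n": 100.0}', which is the same value but not the
same byte stream; only record["raw"] is what was given.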