[eeps] JSON

Richard O'Keefe ok@REDACTED
Fri Jul 1 07:12:23 CEST 2011


As the author of EEP 18 I'd like to respond to this.

On 1/07/2011, at 1:28 PM, Paul Davis wrote:

> 
> Firstly, the statement in 4627, "The names within an object SHOULD be
> unique." is referenced and the EEP falls on the side of rejecting JSON
> that has repeated names. Personally I would prefer to accept repeated
> names because that allows us to capture the lowest common denominator
> JSON implementation.

The problem here is that there are lots of possibilities:
 - take the first binding for a name
 - take the last binding for a name
 - combine the bindings using a user-specified function
 - do something weirder
 - return a list of {Name,Value} pairs in left to right order
   [not an option if the user wants a dictionary]
 - return a list of {Name,Value} pairs in right to left order
 - return a list of {Name,Value} pairs using some other order
 ...

> The other important bit of this is that other
> languages don't complain on repeated keys. JavaScript, Ruby, and
> Python all just end up using the last defined value.

In the case of JavaScript, this really does seem to be a property
of the *language*:
js> var y = {a: 1, a: 2, b: 3, b: 4};
js> y.a;
2
js> y.b;
4
In the case of Ruby and Python, it's a property of a library, not
of the language.  The 'json' module that comes with Python does
this.  But it also does things that I regard as undesirable:
	json.loads('{a:1}')
dies horribly with a somewhat misleading error message.  Nothing
stops there being other JSON parsers for Python (and there are),
and nothing stops them making another choice.
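For concreteness, here is the stock Python 'json' module doing both of the things described above: quietly keeping the last binding for a repeated name, and rejecting unquoted names outright.  Plain CPython, nothing else assumed:

```python
import json

# Repeated names: the stock 'json' module quietly keeps the last binding.
doc = json.loads('{"a": 1, "a": 2, "b": 3, "b": 4}')
print(doc)  # {'a': 2, 'b': 4}

# Unquoted names are rejected, even though JavaScript itself accepts them.
try:
    json.loads('{a: 1}')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```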

People using JSON to ship around data that originated as Erlang
property lists would surely be expecting the _first_ value for a
repeated key to be taken.

I can see two ways around this.

(1) Follow the herd, and quietly take the last value for a repeated key.
    I have to admit that the JSON parser I wrote in Smalltalk does
    exactly this.
(2) Add another option {repeats,first|last|error} with default last.
    This would be my preference.
> 
> The EEP discusses the difference between an integer and a float in
> multiple places. The language seemed to revolve around being able to
> detect the difference between a float and an integer. I know that it's
> technically true that Erlang can detect the difference and JSON can't,
> but when working with JSON in Erlang it's never been an issue in my
> experience.

JSON can't detect anything.  The basic problem is that *Javascript*
has only one type for all numbers.

The trouble here is that if you see [1,1.0] in JSON you do not know
whether the sender intended them to be the same or different.  Indeed,
if you see [123456712345671234567, 123456712345671234568] you do not
know whether the sender intended the numbers to be the same or different.
It's even the case that if you see [1.0] you do not know whether the
sender intended that number to be usable as a subscript.

If you are just holding onto the data and then giving it back, no real
problem.  The problem arises if you do calculations.

Some possibilities include
 - always read JSON numbers as IEEE doubles; that way you certainly
   get the same kind and value of number as Javascript would get
 - read things that look like integers as integers, things that look
   like floats as floats
   -- this is what the json module that comes with Python does;
      it is also what my Smalltalk library does.
 - read things that have integral values (whether written with an
   exponent or not, whether with a ".0" or not) as integers, things
   with a fractional part as floats

I agree completely that "read things that look like integers as integers
and things that look like floats as floats" seems to be the best
default.  It's just not the _only_ _sensible_ thing to offer.
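Python's parser hooks make it easy to see two of these policies side by side.  This is the stock 'json' module, nothing assumed beyond it; note how the "always IEEE doubles" policy silently merges the two big integers, exactly as Javascript would:

```python
import json

text = '[1, 1.0, 123456712345671234567, 123456712345671234568]'

# Default policy: integer-looking numbers become int, float-looking float.
default = json.loads(text)
print([type(v).__name__ for v in default])  # ['int', 'float', 'int', 'int']

# "Always read JSON numbers as IEEE doubles" -- what Javascript would get.
doubles = json.loads(text, parse_int=float)
print(doubles[2] == doubles[3])  # True: the distinction is lost
```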

> 
> The most controversial aspect of the EEP in terms of something not
> caused by RFC 4627 is the representation of an empty object.

The *right* answer of course is to use frames (Joe Armstrong's
"proper structs"), so it *should* be <{}>.

> Everything in the EEP about encodings should be removed and replaced
> with "we only support well formed UTF-8."

There are two possible situations.
(1) You are dealing with *text* data.  In that case, encoding or decoding
    is somebody else's problem.  A JSON writer needs to know what characters
    have to be escaped in the output, but that's it.
(2) You are dealing with *binary* data.  In that case, encoding or
    decoding is the JSON library's problem.

When the EEP was written, things were a little bit fuzzy.  It is now clear
that iolists are *binary* data, not text data.  There is now explicit
Unicode support elsewhere in Erlang, so that conversion to/from UTF-8 can
be done in a separate step, if needed.  At any rate, UTF-8 is *required* as
the default.  Other options could be added later if anyone really cares.
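For comparison, Python's 'json' module exposes exactly this split: escaping is the writer's job, and the encoding to bytes is a separate, explicit step.  Just an illustration of the two situations above; no claim that the Erlang API should look like this:

```python
import json

s = 'naïve "text"'

# (1) Text: the writer only escapes what JSON requires -- '"', '\\'
#     and control characters.  Non-ASCII passes through untouched.
text_out = json.dumps(s, ensure_ascii=False)
print(text_out)  # "naïve \"text\""

# (2) Binary: the UTF-8 encoding is somebody else's (separate) step.
utf8_out = text_out.encode('utf-8')
```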
> 
> Second, the EEP uses the function names term_to_json and json_to_term.
> I would comment that these infer that the conversion is lossless.

Apart from the fact that it should be "imply", not "infer",
I deny this.  They come from the Prolog naming tradition and
simply imply what the input is and what the output is and make
*no* claim about losslessness.  (If the conversions were lossless,
the "to_" part would not be present in the name.)

Note, for example, that Erlang already has binary_to_term/1,
which does NOT convert every possible binary to a term.

However, there is nothing wrong with json:encode and json:decode
as names either.

> For instance, consider pattern matching a term returned from
> json:decode. One of the proposals is to convert keys to atoms when
> they can be converted. This means your return value might have some
> keys as binaries and some as atoms.

And if you have an atom in a pattern, that key will be that atom.

> If you're writing a function to do
> some mutation to the returned term that touches the key, it is quite
> possible that you have to special case both term and atom return
> types. The other obvious argument (which is detailed in the EEP) is
> that it's an attack vector by malicious clients. It's possible to send
> many JSON texts with many different keys that eventually kills the
> Erlang VM. I'm all for "let it fail" but "here is how to kill me" is
> probably not good.

The fundamental problem here is the fixed atom table.
SWI Prolog hasn't had that problem for a long time.
The Logix implementation of Flat Concurrent Prolog faced it and fixed it.
It's a serious weakness in Erlang that can and should be fixed.
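Until then, the usual mitigation is to atomise only names that already exist (cf. Erlang's binary_to_existing_atom/2) and leave everything else as binaries, so a hostile client cannot grow the atom table at all.  A rough Python sketch of that policy, with bytes standing in for binaries and the KNOWN set purely hypothetical:

```python
import json

KNOWN = {"name", "age"}   # hypothetical pre-existing "atoms"

def label_hook(pairs):
    # Known names stay as str ("atoms"); unknown names are demoted to
    # bytes ("binaries") instead of creating anything new.
    return {(k if k in KNOWN else k.encode("utf-8")): v
            for k, v in pairs}

doc = json.loads('{"name": "x", "evil_key_0001": 1}',
                 object_pairs_hook=label_hook)
print(doc)  # {'name': 'x', b'evil_key_0001': 1}
```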
