[eeps] JSON

Wed Jul 6 04:03:23 CEST 2011

On Tue, Jul 5, 2011 at 7:59 PM, Richard O'Keefe <ok@REDACTED> wrote:
>
> On 5/07/2011, at 7:11 PM, Paul Davis wrote:
>>> You haven't answered the question.  You accused me of "twisting the spec".
>>> What spec, where, and twisted how?
>>
>> Obviously RFC 4627. You're arguing that repeated keys and unquoted
>> keys are equivalent violations of that RFC which is not true.
>
> It is also a total fabrication that I am arguing or ever have
> argued any such thing.
>
> Killfile time.
>
>

Quoting your earlier mail:

> Why strain out the gnat of unquoted keys and swallow the camel of
> duplicate keys?  If you want ambiguous illegal JSON quietly accepted,
> why not want unambiguous illegal JSON quietly extended too?

You are equating duplicate keys and unquoted keys by referring to both
of them as illegal JSON. Duplicate keys are expressly allowed by RFC
4627 whereas unquoted keys are specifically forbidden. These are not
equivalent cases and should not be treated as such.

The spec very clearly states "The names within an object SHOULD be
unique." which means that if we have reason and consider the issue
carefully, we are free to allow repeated keys. The reasons I have
cited previously are that the Erlang data structure that most aptly
fits an object is a proplist (universally accepted among all JSON
parsers I have seen) which explicitly allows repeated keys. It also
happens that the other major JSON implementations will accept repeated
keys as input. What exactly happens to repeated keys in
implementations (i.e., Python, Ruby, JavaScript's JSON.parse) that don't
support them is undefined behavior. Pragmatically speaking, none of
the other implementations complain and repeated keys are relatively
rare in practice.
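
As a quick illustration of "accepts repeated keys as input", here is a
minimal sketch using Python's standard json module. The input document
is made up for the example, and what it shows is only what this one
implementation happens to do, not something the RFC requires.

```python
import json

doc = '{"a": 1, "a": 2}'

# The default decoder accepts the repeated key without complaint;
# because it builds a dict, the last value wins.
as_dict = json.loads(doc)

# object_pairs_hook receives the members in document order, so passing
# `list` keeps both pairs, much like an Erlang proplist would.
as_pairs = json.loads(doc, object_pairs_hook=list)

print(as_dict)   # {'a': 2}
print(as_pairs)  # [('a', 1), ('a', 2)]
```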

On the other hand, detecting and reacting to repeated keys comes with
a cost: extra code. Because Erlang has no built-in hash/dict/set data
type, there is no C API available to do this check directly (recall
that this EEP is for BIFs, now NIFs, which is the angle I care
about). That leaves two options: naively scan the object members
testing for equality, which raises performance concerns, or implement
some sort of set membership function, with all of the testing and
validation that goes along with it.
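
To make the two options concrete, here is a rough sketch in Python. The
function names are hypothetical, and Python's built-in set stands in
for the set membership structure that a NIF would have to implement and
validate itself, so this understates the real cost of the second
option.

```python
def has_duplicate_naive(keys):
    """Compare every key against every later key: about N^2/2 comparisons."""
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                return True
    return False


def has_duplicate_set(keys):
    """Track the keys seen so far: one membership test per key."""
    seen = set()
    for key in keys:
        if key in seen:
            return True
        seen.add(key)
    return False


keys = ["a", "b", "a"]
print(has_duplicate_naive(keys), has_duplicate_set(keys))
```

Both give the same answer; the difference is only in how the work grows
with the number of members in the object.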

On the face of it, it's hard to make a choice between being
conservative and reacting to repeated keys, or instead treating them
as valid JSON. But if we look to the wider community it's fairly
obvious that *not* making the extra checks is working. Of all the
Erlang JSON parsers I know of, only one rejects repeated keys (doing
so with the naive (N^2)/2 loop). Furthermore, this was because
the author was intentionally trying to conform to EEP0018. In all
other cases I've read the code for, this check did not exist.

On the other hand, the spec very clearly states that keys are to be
JSON strings which are defined as being enclosed by the ASCII
character: ". To come to the conclusion that an Erlang JSON parser
should accept unquoted keys we must make use of RFC 4627's statement
"A JSON parser MAY accept non-JSON forms or extensions." So here we
must make a convincing argument that adding support for this non-JSON
is more beneficial than rejecting it.

The argument in favor of accepting non-JSON is that it's ubiquitous.
Here we can look to both the existing Erlang and even non-Erlang
parsers to see if there are common JSON parsers that accept some
superset that we should investigate supporting. Unfortunately, the
only parser I am aware of that attempts to do this is, once again, the
one Erlang parser that is attempting to conform to EEP0018. Most
Erlang parsers, Ruby, Python, and (in my opinion the closest thing we
have to a reference) JavaScript's own JSON.parse all reject unquoted
keys.
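
For the Python case this is easy to check. A small sketch, assuming the
standard json module; the `parses` helper is a made-up name for the
example:

```python
import json


def parses(text):
    """Return True if the standard json module accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses('{"key": 1}'))  # True
print(parses('{key: 1}'))    # False
```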

The reason unquoted keys often come up is that JavaScript
allows them, and hence there must be a lot of people trying
to paste JavaScript into something that will be consumed by a JSON
parser. I can only comment that if it is indeed so common, why does
everyone reject it?

Now, assume for a moment that we were to make the decision that it's
worth implementing unquoted JSON keys. Now we get to make up a scheme
for trying to parse them. The parser I know of that does this keeps it
simple by looking for strings of characters that are not interrupted
by control characters which is a reasonable approach. However there's
also another logical extension which is to adopt the actual JavaScript
parser EBNF. This of course seems quite reasonable. We're interested
in allowing valid JavaScript object keys. Unfortunately, the
JavaScript EBNF for unquoted keys is not simple. It defines acceptable
strings of characters as a series of characters that are members of a
given set of Unicode code point classes.
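
To give a feel for what that involves, here is a rough Python sketch of
such a check. The category sets are my reading of the ECMAScript 5
IdentifierStart and IdentifierPart productions and should be treated as
an assumption. All names are hypothetical, and the sketch ignores
\uXXXX escapes and numeric property names entirely.

```python
import unicodedata

# Assumption: Unicode general categories taken from the ES5
# IdentifierStart / IdentifierPart productions.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch):
    """A letter-like code point, '$' or '_'."""
    return ch in "$_" or unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch):
    """Anything allowed at the start, plus marks, digits, connector
    punctuation, and the zero-width (non-)joiners."""
    return (
        ch in "$_\u200c\u200d"
        or unicodedata.category(ch) in PART_CATEGORIES
    )


def is_unquoted_key(text):
    """True if `text` looks like a JavaScript identifier name."""
    if not text:
        return False
    return is_identifier_start(text[0]) and all(
        is_identifier_part(ch) for ch in text[1:]
    )


for candidate in ["foo", "$x_1", "1abc", "a-b"]:
    print(candidate, is_unquoted_key(candidate))
```

Python gets the "code point -> class" table for free from unicodedata.
An Erlang or C implementation would have to generate and ship that
table itself.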

This becomes hard enough that there was an extended discussion of how
to even generate an Erlang file that manages the lookup of "code point
-> class" relationships. After a winding conversation with the
author, he realized that no other major library supports this, promptly
abandoned the Unicode EBNF, and started considering removal of all
support for unquoted keys.

In the end you may not even read this which is upsetting because you
have very valuable input that I appreciate. However, if you feel that
my input is without value then I wish you the best.