[eeps] JSON

Paul Davis paul.joseph.davis@REDACTED
Tue Jul 5 09:11:58 CEST 2011


On Tue, Jul 5, 2011 at 12:15 AM, Richard O'Keefe <ok@REDACTED> wrote:
>
> On 5/07/2011, at 2:09 PM, Paul Davis wrote:
>>>
>>> What spec and where?  RFC 4627 says that keys SHOULD be unique.
>>> The author of the RFC says that it SHOULD read that they MUST be unique.
>>>
>>
>> I really can't say that I care what he was thinking when he wrote it.
>> He could've been thinking about squirrels playing in the tree.
>
> You haven't answered the question.  You accused me of "twisting the spec".
> What spec, where, and twisted how?

Obviously RFC 4627. You're arguing that repeated keys and unquoted
keys are equivalent violations of that RFC, which is not true. Repeated
keys are specifically allowed, while unquoted keys require people to
borrow something from a random language that happens to share two
letters of the acronym commonly used to refer to RFC 4627.

In other words, unquoted keys should be as likely to be implemented as
Lisp S-Expression extensions.
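
To put the contrast in concrete terms (both written as Erlang binaries;
only the first is grammatical per RFC 4627):

    %% Legal: the grammar permits repeated members; uniqueness is only
    %% a SHOULD.
    Dup = <<"{\"a\": 1, \"a\": 2}">>.

    %% Not legal: unquoted keys are a JavaScript-ism the RFC's grammar
    %% simply doesn't contain.
    Bad = <<"{a: 1}">>.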

>>>
>>
>> I'm not sure how you got here. The wording is quite clear that we are
>> within the limits of RFC 4627 to accept and generate duplicate keys.
>
> The wording is quite clear that we are within the limits of RFC 4627
> to *accept* duplicate keys, yes.
>
> That we are within the limits of RFC 4627 to *generate* duplicate
> keys, in a narrow, pedantic, and practically insignificant way, yes.
> But the plain fact of the matter, as things stand in the actual world,
> is that if you generate a JSON object with a repeated key you have *NO*
> idea what the receiver will do with it.  It might take the first binding.
> It might take the last binding.  It might take both but act in some
> other ways as if it had done something else.  It might raise an
> exception.  It might launch competition-buster missiles at your site.
>

You're making the argument that there's a difference between parsing
and generating. Aside from the occasional offhand references the RFC
makes to implementation details, what in the spec suggests that an
implementation is bound to behave differently in the two directions?

Not only can Erlang not know what the receiver will do with repeated
keys, no one else can know what Erlang will do.
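
As a sketch of the ambiguity, with a hypothetical decode/1 that returns
objects as {struct, Proplist} and keeps every pair:

    %% Both reads of the same proplist are defensible, and they disagree.
    {struct, Props} = decode(<<"{\"a\": 1, \"a\": 2}">>),
    First = proplists:get_value(<<"a">>, Props),                 %% 1
    Last  = proplists:get_value(<<"a">>, lists:reverse(Props)).  %% 2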

>> Arguing that Crockford wasn't thinking about squirrels and was
>> thinking unique keys doesn't provide a solid argument for adopting
>> this particular constraint. There may be other arguments to include
>> it, but right now I don't see a clear winner amongst any of the
>> possible behaviors that have been discussed other than supporting all
>> of them and making it configurable.
>
> Which has already been proposed.  Twice.
>

I reckon that's because it's reasonable. But why not just let people
plug in their own JSON parser?
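
A callback option would be enough; json_decode/2, json_default, and the
parser option here are all hypothetical, just to show the shape of it:

    %% The library only dispatches; the chosen module owns the semantics.
    json_decode(Bin, Opts) ->
        Mod = proplists:get_value(parser, Opts, json_default),
        Mod:decode(Bin).

Anyone who disagrees with the default's treatment of duplicate keys
then calls json_decode(Bin, [{parser, my_parser}]) and is done arguing.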

>>
>>>
>>> I note that Javascript and Python accept several things that are
>>> not legal according to the JSON RFC.  That's fine.  They are allowed
>>> to support extensions.
>>>
>>> I note also that Python's json module is perfectly happy to GENERATE
>>> illegal JSON:
>>>        >>> json.dumps(12)
>>>        '12'
>>> not legal JSON
>>>        >>> json.dumps([1.2e500])
>>>        '[Infinity]'
>>> not legal JSON
>>>
>>
>> You're missing a key qualifier.
>>
>> json.dumps(12) is an illegal JSON text but is a valid JSON value. Do
>> you want a parser that can't parse all JSON values?
>
> Yes.  Of b----y course!  That's like saying "You say you want a
> C compiler.  So why are you whining that the compiler accepts
> 'errno++;' as a legal program?  Statements are C, aren't they?"
>
> I want a JSON parser that parses JSON values *AS PARTS OF JSON texts*
> and only as parts of JSON texts.  Or more precisely, if there is also
> a parser for arbitrary JSON values, I want that clearly distinguished
> from the parser for JSON texts.
>

Why would you want this? A JSON text *is* a JSON value by definition.
Rejecting a JSON value that isn't a JSON text is just silly. The only
sane explanation for why JSON text even exists is to facilitate the
"encoding detection" fairy tale that was also "specified" in 4627.

> This *is* a weakness of EEP 18.  I completely failed to make that
> distinction clear, and for that I apologise.  Clearly a revision is
> needed, but I don't have time to do that today.
>
> But if you read what I wrote before again, you'll see that I was not
> criticising Python for *accepting* JSON values that are not JSON
> texts but for *generating* them without warning.

No, I understood that quite plainly. I was specifically pointing out
that "12" is a valid JSON value, and there's no reason, other than
RFC 4627's author thinking about squirrels in trees, that we shouldn't
produce valid JSON values.

>
>>> It is OK to ACCEPT such extensions; it is not OK to GENERATE them.
>
>>> Tell you what.  Why not fish around in some CouchDB stores and see just
>>> how many repeated keys there are?
>>>
>>
>> Probably very few. But seeing as the spec allows this behavior, who cares?
>
> There is at least no doubt that the spec strongly discourages such
> behaviour.  And if you don't care, why the heck are you arguing about it?
>

Strongly discourages, as in specifically allows.

> Do I have to remind readers that existing JSON parsers do *different*
> things with repeated keys?  So that JSON objects with repeated keys are
> *not* safe for inter-operation?  So that silently allowing them does
> nobody any kindness?

Define safe. I haven't seen an implementation that barfs on them. Some
do weird things. There's no consensus. The spec is wonky. Why concern
ourselves with it?
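
For the record, "barfing" would mean bolting a uniqueness check onto
object assembly, something like this hypothetical make_object/1:

    make_object(Pairs) ->
        Keys = [K || {K, _V} <- Pairs],
        case length(Keys) =:= length(lists:usort(Keys)) of
            true  -> {struct, Pairs};
            false -> erlang:error(duplicate_key)
        end.

Nobody seems to bother writing that clause.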

>
>>> And they will say "Then why in the name of common human decency didn't
>>> you tell me LAST year when I entered the data with repeated keys?  How
>>> am I supposed to fix it NOW?"
>>>
>>
>> "Do it the same way you did it last year," would be the obvious
>> solution I suppose.
>
> The situation we were discussing is one where doing it the same way you
> did it last year is precisely what your hypothetical victims *are* doing
> and being punished for.
>

I apologize for under-specifying the hypothetical. I was considering
that they were sending the same object with duplicate keys, but with
the keys in a different sort order.
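
That is, something like these two payloads, which a first-wins parser
and a last-wins parser will happily read as different objects:

    A = <<"{\"a\": 1, \"a\": 2}">>,  %% last year's payload
    B = <<"{\"a\": 2, \"a\": 1}">>.  %% "the same", pairs reordered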

>>
>> I prefaced this very specifically with "in my experience" because I
>> was trying to say "I thought this, it is not something I would find
>> terribly surprising for other people to think, perhaps we should
>> change it in an attempt to avoid such confusion."
>
> But the *only* thing in your experience that supports this wild flight
> of generalisation is term_to_binary/1, which _does_ accept all terms.
> Other "_to_" functions in Erlang just do not provide grounds for a
> belief that "_to_" functions are, were, or ever should be total.

Perhaps. I did spend 10 minutes Googling for Erlang JSON parsers that
used term_to_json/json_to_term instead of encode/decode. I found one.
It doesn't appear to be used and the last commit was three years ago.
All of the other major libraries appear to have standardized on
encode/decode. FWIW.

>>
>>
>> I definitely agree that fixing the atom table would be even better. If
>> that were to come to pass I would be wholeheartedly in favor of keys
>> to atoms, but until then I'm still in favor of binaries because of the
>> opportunity for errors, especially the ones that implicitly depend on
>> system state.
>
> I wonder if you noticed the fine print in EEP 18 which makes
> turning keys to binaries the default?

It's like the fine print that says "by default this oven is safe, but
if you turn the knob 45 degrees clockwise it may result in nuclear
meltdown". Ok, perhaps that's a bit much. More like "by default this
oven is safe, but if you turn the knob 45 degrees clockwise something
might break, and it might depend on the phase of the moon and who
controls the Senate and what you put into the oven; you'll never know
because you'll never be able to reproduce it".
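
Concretely, the system state in question is the atom table. A
conversion like the hypothetical to_key/1 below succeeds or falls back
depending on which atoms other code in the node has already happened
to create:

    to_key(Bin) ->
        try
            erlang:binary_to_existing_atom(Bin, utf8)
        catch
            error:badarg -> Bin  %% atom not interned yet; keep the binary
        end.

Whether a key comes back as an atom or a binary then depends on what
ran earlier in the node, which is exactly the irreproducibility the
oven is about.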


