[eeps] JSON

Tue Jul 5 10:12:22 CEST 2011

On Tue, Jul 5, 2011 at 3:40 AM, Masklinn <masklinn@REDACTED> wrote:
> On 2011-07-05, at 09:11 , Paul Davis wrote:
>>>> I'm not sure how you got here. The wording is quite clear that we are
>>>> within the limits of RFC 4627 to accept and generate duplicate keys.
>>>
>>> The wording is quite clear that we are within the limits of RFC 4627
>>> to *accept* duplicate keys, yes.
>>>
>>> That we are within the limits of RFC 4627 to *generate* duplicate
>>> keys, in a narrow, pedantic, and practically insignificant way, yes.
>>> But the plain fact of the matter as things stand in the actual world,
>>> if you generate a JSON object with a repeated key you have *NO* idea
>>> what the receiver will do with it.  It might take the first binding.
>>> It might take the last binding.  It might take both but act in some
>>> other ways as if it had done something else.  It might raise an
>>> exception.  It might launch competition-buster missiles at your site.
>> You're making the argument that there's a difference between parsing
>> and generating. Minus the occasional offhand references the RFC makes
>> to implementation details, what in the spec suggests that an
>> implementation is bound to do different things in either direction?
> I believe it's a pretty well accepted principle of software
> engineering, usually referred to as "Postel's Law" or "The Robustness
> Principle" (RFC 760, 791, 793), that protocol implementations should
> try their best to interpret what is given to them, but be as strict
> as possible in what they produce.
>
> Furthermore, considering the bane that are silent undefined behaviors
> in C and C++ (and what you're advocating is basically UB), being as
> restrictive as possible in the serialized output is by far the best
> course of action, because it's the least surprising and the one which
> will yield most compatibility with consumers.
>

I would say it's often quoted and leads to very bad ends. A bit of
Wikipedia browsing out of curiosity even led me to an RFC, aptly named
"On the Design of Application Protocols", that supports the position
of not relying on Postel's Law:

    http://tools.ietf.org/html/rfc3117#section-4.5

The cautionary tale for Postel's Law is HTML. Anyone who's ever done
web development and tried to make a wide range of browsers behave
anywhere near the same understands this pain.

Granted, the obvious response is "but most implementations don't repeat
keys". To which I would respond that no, they don't, but neither do
they fail to accept that input, nor is it against the spec.
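
As an illustration of one receiver accepting repeated keys, here is a
minimal sketch assuming Python 3's standard json module. The variable
names are made up for the example, and other implementations may well
behave differently.

```python
import json

doc = '{"a": 1, "b": 2, "a": 3}'

# Default decoding builds a dict, so the last binding for "a" silently wins.
as_dict = json.loads(doc)

# object_pairs_hook is called with the (key, value) pairs in document order;
# passing list keeps every binding, repeats included.
as_pairs = json.loads(doc, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

Neither call raises an error. The caller chooses between "last binding
wins" and "keep everything" by choosing the hook.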

>> Not only can Erlang not know what the receiver will do with repeated
>> keys, no one else can know what Erlang will do.
> Right, so Erlang should do as little interpretation as possible and
> strive for the most compatible output possible. Multidicts are not
> widely spread data structures, and apart from that it means the
> interpretation of repeated keys is — as Richard noted repeatedly —
> entirely implementation-dependent and not something which should be
> risked by Erlang's standard library.

Here we're working under different definitions of "compatible". I
lean towards "receiver doesn't launch missiles." To me, compatible
means "user gave X, I transformed it to Y, and most everyone I know
accepts Y".

>>
>>>> Arguing that Crockford wasn't thinking about squirrels and was
>>>> thinking unique keys doesn't provide a solid argument for adopting
>>>> this particular constraint. There may be other arguments to include
>>>> it, but right now I don't see a clear winner amongst any of the
>>>> possible behaviors that have been discussed other than supporting all
>>>> of them and making it configurable.
>>> Which has already been proposed.  Twice.
>> I reckon 'cause it's reasonable. But why not just let people plug in
>> their own JSON parser?
> Time constraints I would expect. Creating a JSON parser (and serializer)
> is simple enough, creating a good API for a pluggable parser (and
> serializer) less so.
>
>>>>> I note that Javascript and Python accept several things that are
>>>>> not legal according to the JSON RFC.  That's fine.  They are allowed
>>>>> to support extensions.
>>>>>
>>>>> I note also that Python's json module is perfectly happy to GENERATE
>>>>> illegal JSON:
>>>>>        >>> json.dumps(12)
>>>>>        '12'
>>>>> not legal JSON
>>>>>        >>> json.dumps([1.2e500])
>>>>>        '[Infinity]'
>>>>> not legal JSON
>>>>>
>>>>
>>>> You're missing a key qualifier.
>>>>
>>>> json.dumps(12) is an illegal JSON text but is a valid JSON value. Do
>>>> you want a parser that can't parse all JSON values?
>>>
>>> Yes.  Of b----y course!  That's like saying "You say you want a
>>> C compiler.  So why are you whining that the compiler accepts
>>> 'errno++;' as a legal program?  Statements are C, aren't they?"
>>>
>>> I want a JSON parser that parses JSON values *AS PARTS OF JSON texts*
>>> and only as parts of JSON texts.  Or more precisely, if there is also
>>> a parser for arbitrary JSON values, I want that clearly distinguished
>>> from the parser for JSON texts.
>>>
>> Why would you want this? A JSON text *is* a JSON value by definition.
> But not all JSON values are JSON texts.
>
> That happens to be one of the things which is actually well-defined
> in the JSON grammar: a json text is exactly this
>
>    JSON-text = object | array
>
> Not Number, not String, not null, not true and not false.
>
>> Rejecting a JSON value that isn't a JSON text is just silly. The only
>> sane explanation for why JSON text even exists is to facilitate the
>> "encoding detection" fairy tale that was also "specified" in 4627.
> No, json texts are the top-level elements of the JSON grammar.
>
> A JSON text is basically a JSON document. Not all JSON values are
> JSON documents.

I understand that this is a common reading of RFC 4627. I'm
specifically stating that RFC 4627 is nutty in making this distinction
in an attempt to "support encodings" with its harebrained ideas for
introspecting byte streams.
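
For reference, the introspection in question is the null-byte table in
RFC 4627 section 3, which assumes the first two characters of a JSON
text are ASCII. A rough sketch of it follows, with a hypothetical
function name and an "unknown" result that is this example's own choice
(the RFC does not say what to do when no pattern matches):

```python
def guess_encoding(first4: bytes) -> str:
    """Guess the encoding from the null-byte pattern of the first four
    octets, following the table in RFC 4627 section 3."""
    if len(first4) < 4:
        raise ValueError("need at least four octets")
    nulls = tuple(b == 0 for b in first4[:4])
    table = {
        (True, True, True, False): "utf-32-be",     # 00 00 00 xx
        (True, False, True, False): "utf-16-be",    # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",     # xx 00 00 00
        (False, True, False, True): "utf-16-le",    # xx 00 xx 00
        (False, False, False, False): "utf-8",      # xx xx xx xx
    }
    # "unknown" is an assumption of this sketch, not something the RFC defines.
    return table.get(nulls, "unknown")


# A JSON text (array) is detected in each encoding.
array_le = guess_encoding("[1]".encode("utf-16-le"))
array_be32 = guess_encoding("[1]".encode("utf-32-be"))

# A bare string value whose second character is not ASCII matches no row.
bare_string = guess_encoding('"\u4e2d"'.encode("utf-16-le"))
```

The last case shows why the table only works if the top level is
limited to objects and arrays: a bare string can put a non-ASCII
character in second position.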

>
>>> This *is* a weakness of EEP 18.  I completely failed to make that
>>> distinction clear, and for that I apologise.  Clearly a revision is
>>> needed, but I don't have time to do that today.
>>>
>>> But if you read what I wrote before again, you'll see that I was not
>>> criticising Python for *accepting* JSON values that are not JSON
>>> texts but for *generating* them without warning.
>>
>> No, I understood that quite plainly. I was specifically pointing out
>> that "12" is a valid JSON value and there's no reason other than the
>> RFC 4627's author thinking about squirrels in trees that we shouldn't
>> produce valid JSON values.
> The role of JSON parsers is to parse JSON documents, not JSON values.
>
> That's like saying an XML parser should accept "3" because "3" is a valid
> attribute value. It's nonsense.

I would say the proper XML analogy would be to omit the doctype. I'm
not saying that any arbitrary string of bytes is valid JSON. I'm just
saying that the JSON-text production in the ABNF was part of a silly
game to do encodings.
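
The distinction being argued over can be sketched as two entry points.
This assumes Python 3's json module; loads_text and loads_value are
hypothetical names, not part of any proposal in this thread. Note also
that json.loads accepts a few extensions such as NaN by default.

```python
import json


def loads_value(s):
    """Parse any JSON value, including bare numbers, strings and literals."""
    return json.loads(s)


def loads_text(s):
    """Parse s, accepting only an RFC 4627 JSON text:
    the top-level value must be an object or an array."""
    value = json.loads(s)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level value is not an object or array")
    return value
```

With these, loads_value('12') returns 12 while loads_text('12') raises
ValueError, and both accept '[12]'.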

>
>>>>> It is OK to ACCEPT such extensions; it is not OK to GENERATE them.
>>>
>>>>> Tell you what.  Why not fish around in some CouchDB stores and see just
>>>>> how many repeated keys there are?
>>>> Probably very few. But seeing as the spec allows this behavior, who cares?
>>> There is at least no doubt that the spec strongly discourages such
>>> behaviour.  And if you don't care, why the heck are you arguing about it?
>> Strongly discourages, as in specifically allows.
> Those terms are not incompatible, you can allow something and still
> discourage it. For instance, hammering nails in your hands is generally
> strongly discouraged, but not forbidden.
>
> That is the exact meaning of the word "SHOULD" in RFCs:
>
>> SHOULD   This word, or the adjective "RECOMMENDED", mean that there
>>   may exist valid reasons in particular circumstances to ignore a
>>   particular item, but the full implications must be understood and
>>   carefully weighed before choosing a different course.
>
>

And to me, "the language has no builtin standard definition of object,
and most of the time people use a list of two-tuples that naturally and
easily handles repeated keys" represents a "valid reason in particular
circumstances to ignore a particular item", and this is an understood
and carefully weighed point of view. Granted, making something a stdlib
function means that the larger community has to come to an agreement
on what "understood and carefully weighed" means.
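
To make the list-of-two-tuples point concrete, here is a minimal sketch
of an encoder with a configurable duplicate policy. It is written in
Python rather than Erlang purely for illustration; encode_pairs and the
policy names are hypothetical, keys are assumed to be strings, and only
one level of object is handled.

```python
import json


def encode_pairs(pairs, on_duplicate="keep"):
    """Encode a list of (key, value) pairs as a JSON object.

    on_duplicate: "keep" emits repeats exactly as given, "first" and
    "last" keep one binding per key, "error" raises ValueError.
    """
    if on_duplicate not in ("keep", "first", "last", "error"):
        raise ValueError("unknown policy: %r" % (on_duplicate,))
    if on_duplicate != "keep":
        seen = {}
        for key, value in pairs:
            if key in seen:
                if on_duplicate == "error":
                    raise ValueError("duplicate key: %r" % (key,))
                if on_duplicate == "first":
                    continue  # keep the earlier binding
            seen[key] = value  # for "last", overwrites in place
        pairs = list(seen.items())
    members = ", ".join(
        "%s: %s" % (json.dumps(k), json.dumps(v)) for k, v in pairs
    )
    return "{" + members + "}"


pairs = [("a", 1), ("b", 2), ("a", 3)]
kept = encode_pairs(pairs)
first = encode_pairs(pairs, on_duplicate="first")
last = encode_pairs(pairs, on_duplicate="last")
```

The "keep" policy is the one that passes the user's repeated keys
through untouched. The other three are the stricter behaviours that
have been proposed as alternatives.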