[eeps] JSON
Masklinn
masklinn@REDACTED
Tue Jul 5 09:40:42 CEST 2011
On 2011-07-05, at 09:11, Paul Davis wrote:
>>> I'm not sure how you got here. The wording is quite clear that we are
>>> within the limits of RFC 4627 to accept and generate duplicate keys.
>>
>> The wording is quite clear that we are within the limits of RFC 4627
>> to *accept* duplicate keys, yes.
>>
>> That we are within the limits of RFC 4627 to *generate* duplicate
>> keys, in a narrow, pedantic, and practically insignificant way, yes.
>> But the plain fact of the matter as things stand in the actual world,
>> if you generate a JSON object with a repeated key you have *NO* idea
>> what the receiver will do with it. It might take the first binding.
>> It might take the last binding. It might take both but act in some
>> other ways as if it had done something else. It might raise an
>> exception. It might launch competition-buster missiles at your site.
> You're making the argument that there's a difference between parsing
> and generating. Minus the occasional offhand references the RFC makes
> to implementation details, what in the spec suggests that an
> implementation is bound to do different things in either direction?
I believe it's a pretty well-accepted principle of software
engineering, usually referred to as "Postel's Law" or "The Robustness
Principle" (RFC 760, 791, 793), that protocol implementations should
try their best to interpret what is given to them, but be as strict
as possible in what they produce.
Furthermore, considering what a bane silent undefined behaviors are
in C and C++ (and what you're advocating is basically UB), being as
restrictive as possible in the serialized output is by far the best
course of action: it is the least surprising, and the one that will
yield the most compatibility with consumers.
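To make "strict in what you produce" concrete, here is a minimal Python sketch (Python only because its json module is already quoted in this thread). `encode_object_pairs` is a hypothetical helper, not part of any library: it takes an ordered list of (key, value) pairs, roughly like an Erlang proplist, and refuses to emit a repeated key or a non-finite float rather than passing the ambiguity on to the consumer.

```python
import json


def encode_object_pairs(pairs):
    """Serialize an ordered list of (key, value) pairs as a JSON object.

    Hypothetical helper. Unlike a Python dict, a list of pairs (like an
    Erlang proplist) can hold the same key twice; being strict on output,
    this refuses to generate such an object instead of emitting it.
    """
    seen = set()
    for key, _ in pairs:
        if not isinstance(key, str):
            raise TypeError(f"object keys must be strings, got {key!r}")
        if key in seen:
            raise ValueError(f"duplicate key {key!r}; refusing to generate it")
        seen.add(key)
    # json.dumps escapes each key and value correctly; allow_nan=False makes
    # it raise ValueError for Infinity/NaN, which are not legal JSON.
    members = (
        json.dumps(key) + ":" + json.dumps(value, allow_nan=False, separators=(",", ":"))
        for key, value in pairs
    )
    return "{" + ",".join(members) + "}"
```

For example, `encode_object_pairs([("a", 1), ("b", [True, None])])` produces `{"a":1,"b":[true,null]}`, while `[("a", 1), ("a", 2)]` raises ValueError.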
> Not only can Erlang not know what the receiver will do with repeated
> keys, no one else can know what Erlang will do.
Right, so Erlang should do as little interpretation as possible and
strive for the most compatible output possible. Multidicts are not
widespread data structures, and beyond that, emitting repeated keys
means their interpretation is, as Richard noted repeatedly, entirely
implementation-dependent: not something Erlang's standard library
should risk.
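As a small illustration of how implementation-dependent this is, Python's standard json module alone gives two different readings of the same text depending on a single argument (`object_pairs_hook` is a real parameter of `json.loads`):

```python
import json

text = '{"a": 1, "a": 2}'

# Default decoding builds a dict, so the later binding silently wins.
last_wins = json.loads(text)

# The same text decoded with object_pairs_hook=list keeps both bindings,
# in document order, as a list of (key, value) tuples.
all_pairs = json.loads(text, object_pairs_hook=list)

print(last_wins)  # {'a': 2}
print(all_pairs)  # [('a', 1), ('a', 2)]
```

Neither reading is wrong per RFC 4627, which is exactly why a producer cannot know which one its consumer will pick.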
>
>>> Arguing that Crockford wasn't thinking about squirrels and was
>>> thinking about unique keys doesn't provide a solid argument for adopting
>>> this particular constraint. There may be other arguments to include
>>> it, but right now I don't see a clear winner amongst any of the
>>> possible behaviors that have been discussed other than supporting all
>>> of them and making it configurable.
>> Which has already been proposed. Twice.
> I reckon 'cause it's reasonable. But why not just let people plug in
> their own JSON parser?
Time constraints, I would expect. Creating a JSON parser (and serializer)
is simple enough; creating a good API for a pluggable parser (and
serializer) is less so.
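A configurable approach, as proposed earlier in the thread, does not need much API surface. The sketch below is hypothetical (the names `make_pairs_hook`, `decode` and the policy strings are invented for illustration) and rides on the real `object_pairs_hook` parameter of Python's `json.loads`:

```python
import json

POLICIES = ("first", "last", "error", "multi")


def make_pairs_hook(policy):
    """Build an object_pairs_hook implementing one repeated-key policy.

    Hypothetical API: "first" and "last" keep a single binding, "error"
    rejects the object, "multi" keeps every value in a list per key.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}")

    def hook(pairs):
        obj = {}
        for key, value in pairs:
            if policy == "multi":
                obj.setdefault(key, []).append(value)
            elif key not in obj or policy == "last":
                obj[key] = value
            elif policy == "error":
                raise ValueError(f"duplicate key {key!r}")
            # policy == "first": the existing binding is kept as-is
        return obj

    return hook


def decode(text, duplicate_keys="error"):
    """Decode text, resolving repeated keys in every object by policy."""
    return json.loads(text, object_pairs_hook=make_pairs_hook(duplicate_keys))
```

Defaulting to "error" follows the strict reading argued above; note that "multi" wraps every value in a list, duplicated or not, so callers always get a uniform shape.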
>>>> I note that Javascript and Python accept several things that are
>>>> not legal according to the JSON RFC. That's fine. They are allowed
>>>> to support extensions.
>>>>
>>>> I note also that Python's json module is perfectly happy to GENERATE
>>>> illegal JSON:
>>>> >>> json.dumps(12)
>>>> '12'
>>>> not legal JSON
>>>> >>> json.dumps([1.2e500])
>>>> '[Infinity]'
>>>> not legal JSON
>>>>
>>>
>>> You're missing a key qualifier.
>>>
>>> json.dumps(12) is an illegal JSON text but is a valid JSON value. Do
>>> you want a parser that can't parse all JSON values?
>>
>> Yes. Of b----y course! That's like saying "You say you want a
>> C compiler. So why are you whining that the compiler accepts
>> 'errno++;' as a legal program? Statements are C, aren't they?"
>>
>> I want a JSON parser that parses JSON values *AS PARTS OF JSON texts*
>> and only as parts of JSON texts. Or more precisely, if there is also
>> a parser for arbitrary JSON values, I want that clearly distinguished
>> from the parser for JSON texts.
>>
> Why would you want this? A JSON text *is* a JSON value by definition.
But not all JSON values are JSON texts.
That happens to be one of the things that is actually well-defined
in the JSON grammar (RFC 4627, section 2): a JSON text is exactly
    JSON-text = object / array
Not a number, not a string, not null, not true and not false.
> Rejecting a JSON value that isn't a JSON text is just silly. The only
> sane explanation for why JSON text even exists is to facilitate the
> "encoding detection" fairy tale that was also "specified" in 4627.
No, JSON texts are the top-level elements of the JSON grammar.
A JSON text is basically a JSON document. Not all JSON values are
JSON documents.
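Distinguishing the two on input is cheap. A minimal sketch, assuming Python's json module, whose `json.loads` accepts any JSON value at the top level plus the non-standard NaN and Infinity constants: the hypothetical `loads_text` only returns a value if the document is an RFC 4627 JSON text.

```python
import json


def _reject_constant(name):
    # json.loads calls parse_constant for '-Infinity', 'Infinity' and 'NaN',
    # none of which are legal JSON.
    raise ValueError(f"{name} is not legal JSON")


def loads_text(s):
    """Parse s as an RFC 4627 JSON text (hypothetical helper).

    The top-level value must be an object or an array; a bare number,
    string, true, false or null is rejected.
    """
    value = json.loads(s, parse_constant=_reject_constant)
    if not isinstance(value, (dict, list)):
        raise ValueError(
            "not a JSON text: top-level value is "
            f"{type(value).__name__}, expected object or array"
        )
    return value
```

So `loads_text('[1, 2]')` returns `[1, 2]`, while `loads_text('12')` raises ValueError even though `json.loads('12')` happily returns 12.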
>> This *is* a weakness of EEP 18. I completely failed to make that
>> distinction clear, and for that I apologise. Clearly a revision is
>> needed, but I don't have time to do that today.
>>
>> But if you read what I wrote before again, you'll see that I was not
>> criticising Python for *accepting* JSON values that are not JSON
>> texts but for *generating* them without warning.
>
> No, I understood that quite plainly. I was specifically pointing out
> that "12" is a valid JSON value and there's no reason other than
> RFC 4627's author thinking about squirrels in trees that we shouldn't
> produce valid JSON values.
The role of JSON parsers is to parse JSON documents, not JSON values.
That's like saying an XML parser should accept "3" because "3" is a valid
attribute value. It's nonsense.
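The same distinction applies on output, which was the actual complaint about Python. A hypothetical `dumps_text` wrapper (not a standard function) that only generates RFC 4627 JSON texts might look like this; `allow_nan=False` is a real `json.dumps` parameter that makes it raise ValueError instead of writing Infinity or NaN:

```python
import json


def dumps_text(value):
    """Serialize value only if the result is an RFC 4627 JSON text.

    Hypothetical helper. Refuses bare scalars at the top level (which
    json.dumps emits, e.g. '12') and non-finite floats (which json.dumps
    emits as Infinity/NaN unless allow_nan=False).
    """
    # json.dumps writes dicts as objects and lists/tuples as arrays.
    if not isinstance(value, (dict, list, tuple)):
        raise TypeError(
            "top-level value must be an object or an array, "
            f"got {type(value).__name__}"
        )
    return json.dumps(value, allow_nan=False)
```

With it, `dumps_text(12)` raises TypeError where `json.dumps(12)` returns '12', and `dumps_text([1.2e500])` raises ValueError where `json.dumps` returns '[Infinity]'.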
>>>> It is OK to ACCEPT such extensions; it is not OK to GENERATE them.
>>
>>>> Tell you what. Why not fish around in some CouchDB stores and see just
>>>> how many repeated keys there are?
>>> Probably very few. But seeing as the spec allows this behavior, who cares?
>> There is at least no doubt that the spec strongly discourages such
>> behaviour. And if you don't care, why the heck are you arguing about it?
> Strongly discourages, as in specifically allows.
Those terms are not incompatible; you can allow something and still
discourage it. For instance, hammering nails into your hands is
generally strongly discouraged, but not forbidden.
That is the exact meaning of the word "SHOULD" in RFCs (RFC 2119), and
it is the word RFC 4627 uses when it says "The names within an object
SHOULD be unique":
> SHOULD  This word, or the adjective "RECOMMENDED", mean that there
> may exist valid reasons in particular circumstances to ignore a
> particular item, but the full implications must be understood and
> carefully weighed before choosing a different course.