[eeps] JSON

Paul Davis paul.joseph.davis@REDACTED
Sun Jul 3 06:43:42 CEST 2011


A continuation email to get Alisdair in on this thread.

On Thu, Jun 30, 2011 at 9:28 PM, Paul Davis <paul.joseph.davis@REDACTED> wrote:
> Thanks for the thread continuation email, Joe. I wasn't subscribed
> when this thread started.
>
> Having implemented a couple of versions of functions that are similar
> to the definition in EEP0018, yet diverge from it greatly, I thought I
> might share some thoughts on the EEP as it's defined and as it might
> be used.
>
> Firstly, I would like to say that the EEP does a pretty good job of
> identifying areas where RFC 4627 has some issues. The various details
> are fairly well presented on the edge cases. I'll make a few notes in
> order of increasing importance.
>
> Firstly, the statement in 4627, "The names within an object SHOULD be
> unique." is referenced and the EEP falls on the side of rejecting JSON
> that has repeated names. Personally I would prefer to accept repeated
> names because that allows us to capture the lowest common denominator
> JSON implementation. The other important bit of this is that other
> languages don't complain on repeated keys. JavaScript, Ruby, and
> Python all just end up using the last defined value.
>
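
The "last defined value" behaviour described above can be sketched in a few lines. This is only an illustration, not anything the EEP specifies: the module name json_dupes and the function last_wins/1 are made up, and the input is assumed to be the {Key, Value} pairs of one already-decoded object, in document order.

```erlang
-module(json_dupes).
-export([last_wins/1]).

%% Members is the list of {Key, Value} pairs of one JSON object, in
%% document order. A later pair replaces an earlier pair that has the
%% same key, so the last defined value is the one that survives.
last_wins(Members) ->
    lists:foldl(
      fun({Key, _Value} = Member, Acc) ->
              %% lists:keystore/4 replaces the first tuple whose first
              %% element equals Key, or appends Member if there is none.
              lists:keystore(Key, 1, Acc, Member)
      end, [], Members).
```

With this sketch, last_wins([{<<"a">>, 1}, {<<"b">>, 2}, {<<"a">>, 3}]) keeps the key order of first appearance and the value of the last definition.
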
> The EEP discusses the difference between an integer and a float in
> multiple places. The language seemed to revolve around being able to
> detect the difference between a float and an integer. I know that it's
> technically true that Erlang can detect the difference and JSON can't,
> but when working with JSON in Erlang it's never been an issue in my
> experience. For instance, consider a quick snippet:
>
> 1> case 1.0 of 1 -> yay; 1.0 -> nay end.
> nay
> 2> case 1.0 of A when A == 1 -> yay; A when A == 1.0 -> nay end.
> yay
>
> You can't pattern match on the term, but the guard is fairly trivial.
> I would argue that "if it looks like an integer, it's probably an
> integer". From the point of view of writing a function that expects
> input from a JSON decoder, I would consider requiring the
> pattern-matched value to be of a specific type a buggy implementation
> (barring highly specific constraints to the contrary). In the end, I
> would basically say to relax the spec here. If someone needs to
> guarantee the type of a number they should do it in client code.
>
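
As a hedged illustration of "do it in client code": a caller that really needs an integer can normalise whatever number the decoder produced. The module json_num and the function to_integer/1 are hypothetical names, and returning {ok, N} or error is just one possible convention.

```erlang
-module(json_num).
-export([to_integer/1]).

%% Accept an integer as is, accept a float that has no fractional
%% part, and refuse everything else.
to_integer(N) when is_integer(N) ->
    {ok, N};
to_integer(N) when is_float(N), trunc(N) == N ->
    {ok, trunc(N)};
to_integer(_Other) ->
    error.
```

Both to_integer(1) and to_integer(1.0) give {ok, 1}, while to_integer(1.5) gives error, so the decoder itself never has to promise a particular numeric type.
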
> The second point on numbers in the EEP is the part about numeric
> ranges. Obviously we'd like to support as much as we possibly can. The
> EEP rightly points out that ranges are not specified (explicitly
> mentioned as unspecified) and that an implementation is free to do as
> it wants. This is a second vote in favor of "if it looks like an
> integer, treat it like one" because of the case for bignums. If a
> client needs to support some mutation of the output of the parser
> they're more than welcome to write a simple function to do the
> conversion.
>
> The most controversial aspect of the EEP in terms of something not
> caused by RFC 4627 is the representation of an empty object. Alisdair
> Sullivan (the author of jsx) and I have spent many hours arguing over
> this exact detail. If you find us bored on IRC at the same time, we
> generally agree to start this argument out of good nature just to
> amuse ourselves as a continuing joke. But the bottom line is that
> we're both familiar with the issue, the ups and downs of what we
> prefer and still cannot come to agreement on it. Mandating a specific
> one is going to upset some people and please others. Making it
> optional is a fine compromise, but then we'll argue over the default.
> I obviously can't offer much in the way of advice on this issue except
> to say that I am right and the one true object representation is {[]}.
> XD
>
> Now we get to the crazy.
>
> Everything in the EEP about encodings should be removed and replaced
> with "we only support well formed UTF-8." I'm aware that's a bit
> controversial so let me try and explain without too much of a rant
> against 4627.
>
> First, exhibit A, quoting RFC 4627:
>
>    "JSON text SHALL be encoded in Unicode."
>
> This is plainly a misunderstanding of what Unicode is. You can't
> "encode in Unicode". Unicode is not an encoding. Unicode is a large
> set of intertwined specifications on how to represent written language
> in a computer. It's about as sane as saying "JSON text SHALL be
> encoded in Salted Sardines".
>
> Now it's possible that we consider the greater context and try to parse
> this along the lines of "JSON text SHALL be represented as an encoding
> of Unicode characters" and we can start to make a bit more headway.
> The RFC goes into a bit of an explanation of how you can detect the
> encoding based on the first four bytes of the JSON text. Although, as
> pointed out in the EEP this is kinda nutty because it is
> unintentionally saying "JSON text SHALL be encoded in UTF-8, UTF-16BE,
> UTF-16LE, UTF-32BE and UTF-32LE". Which is a considerably different
> proposition.
>
> The bottom line is that JSON has no reliable inband method for
> determining character encoding. The only way to make it work sanely is
> to declare one and stick to it. Fortunately, most implementations seem
> to follow along only supporting UTF-8. jsx is the only library I know
> that attempts to support the spec point by point on this (disregarding
> languages that have a notion of Unicode strings as separate from
> normal strings).
>
> Now if that's not crazy enough we get to the \uHHHH escape sequence
> definition. RFC 4627 says that strings may contain a \u escape
> followed by four hex characters. The RFC has some language about the
> Basic Multilingual Plane and an example of surrogate pairs but fails to
> cover the various ways this might break. For instance what happens if
> one of a surrogate pair is present without an appropriate mate?
>
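
To make the surrogate question concrete, here is a rough sketch of how a decoder might combine the 16-bit values taken from consecutive \uHHHH escapes. The module json_surrogate and the function combine/1 are invented for illustration, and rejecting an unpaired surrogate is only one possible policy; RFC 4627 does not say what should happen in that case.

```erlang
-module(json_surrogate).
-export([combine/1]).

%% Units is the list of 16-bit values read from consecutive \uHHHH
%% escapes. Returns {ok, CodePoints} or {error, {lone_surrogate, U}}.
combine(Units) ->
    combine(Units, []).

%% A high surrogate followed by a low surrogate forms one code point
%% above the Basic Multilingual Plane.
combine([Hi, Lo | Rest], Acc)
  when Hi >= 16#D800, Hi =< 16#DBFF, Lo >= 16#DC00, Lo =< 16#DFFF ->
    CodePoint = 16#10000 + ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00),
    combine(Rest, [CodePoint | Acc]);
%% Any other value in the surrogate range has no appropriate mate.
combine([U | _Rest], _Acc) when U >= 16#D800, U =< 16#DFFF ->
    {error, {lone_surrogate, U}};
combine([U | Rest], Acc) ->
    combine(Rest, [U | Acc]);
combine([], Acc) ->
    {ok, lists:reverse(Acc)}.
```

The pair from the RFC's own example, \uD834\uDD1E, combines to the single code point 16#1D11E, while a high surrogate followed by an ordinary character is reported as an error.
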
> I was told specifically by Douglas Crockford in a thread on
> es5-discuss that implementations are expected to treat these escapes
> as bytes. There's no easy way to say it, this is just nuts. This
> requirement means that a conforming JSON parser must allow string data
> into Erlang terms that would raise exceptions when passed through the
> functions in the unicode module (this has bitten us in CouchDB
> multiple times).
>
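
One way a decoder could enforce "well formed UTF-8 only" is to round-trip each string through the unicode module and refuse anything that does not come back unchanged. This is a sketch under that assumption; the module json_utf8 and the function is_well_formed/1 are hypothetical names.

```erlang
-module(json_utf8).
-export([is_well_formed/1]).

%% unicode:characters_to_binary/3 returns the binary itself when the
%% input is valid UTF-8, and an {error, ...} or {incomplete, ...}
%% tuple otherwise.
is_well_formed(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Bin -> true;
        _NotValid -> false
    end.
```

A surrogate written out as raw bytes, such as <<16#ED, 16#A0, 16#80>>, fails this check, which is exactly the kind of data that the bytes interpretation of \u escapes would let through.
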
> </rant>
>
> Anyway, that's a long way of saying that JSON is weirder than it looks
> at first glance. A lot of these issues are the cause of patches that
> have been applied to mochijson2.
>
>
> Back to more concrete EEP related things:
>
> First, the current EEP makes a few comments about being a
> specification of how data would be converted as it passes through
> functions to convert to and from JSON. I'm a firm believer in this
> approach. As it mentions later on, there are two fairly fundamentally
> different APIs to make the conversion: value based and event based.
> My primary concern is value based, but there are very obvious
> scenarios where an event based parser might be preferable (CouchDB
> uses both). Out of pragmatism I might say that BIFs should be value
> based, because the event based approach opens up a lot more surface
> area to spec out without much prior art that I'm aware of, but that's
> not a major issue.
>
> Second, the EEP uses the function names term_to_json and json_to_term.
> I would comment that these imply that the conversion is lossless.
> There is some discussion in the EEP admitting that it isn't and that
> there's no way to make it so. I would suggest changing them to either
> json:encode and json:decode, or perhaps erlang:encode_json and
> erlang:decode_json, as the community sees fit. It's minor, but these
> names don't suggest that identity conversions are guaranteed, thus
> removing the need for a lot of text on why it's not lossless.
>
> Other places in the spec talk about JSON strings compared to binaries
> and lists. I'm pretty sure the EEP rules out converting from JSON *to*
> an Erlang string. This is good because other languages do not conflate
> [102, 111, 111] with "foo" and allowing a conversion there would lend
> itself to very confusing conversations with non-Erlangers.
>
> The discussions on when to convert atoms to JSON strings and JSON
> strings to atoms should probably be removed. In my experience, it is
> best if atoms can be converted to JSON strings because it allows
> people to write json:encode({[{foo, bar}]}). On the other hand, the
> ability to convert keys to atoms may look fine at first glance but in
> reality can cause lots of funny bugs.
>
> For instance, consider pattern matching a term returned from
> json:decode. One of the proposals is to convert keys to atoms when
> they can be converted. This means your return value might have some
> keys as binaries and some as atoms. If you're writing a function to do
> some mutation to the returned term that touches the key, it is quite
> possible that you have to special case both binary and atom keys.
> The other obvious argument (which is detailed in the EEP) is
> that it's an attack vector by malicious clients. It's possible to send
> many JSON texts with many different keys that eventually kills the
> Erlang VM. I'm all for "let it fail" but "here is how to kill me" is
> probably not good.
>
> Another fun one that I learned from a not-JSON example is the
> to_existing_atom option. This could lead to different results based on
> when the JSON is parsed and what code is loaded. Basically, the output
> of the json:decode function would depend on what the current atom
> table is. When you get into code reloading and other areas, this can
> get a bit wonky. While it may prevent the attack vector, it introduces
> the possibility of very hard to track down bugs when people are
> iterating over a proplist only to discover that a key is a binary on
> one VM and an atom on the other.
>
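
The dependence on the atom table is easy to see in a sketch of what a to_existing_atom style conversion would have to do. The module json_keys and the function key_to_existing_atom/1 are invented names, and falling back to the binary is an assumption about how such an option would behave.

```erlang
-module(json_keys).
-export([key_to_existing_atom/1]).

%% Returns an atom if one with this name already exists in the running
%% VM, and the untouched binary otherwise. The result therefore
%% depends on which code has been loaded so far.
key_to_existing_atom(Key) when is_binary(Key) ->
    try
        binary_to_existing_atom(Key, utf8)
    catch
        error:badarg -> Key
    end.
```

The same JSON key can come back as an atom on a node where some loaded module mentions that atom and as a binary on a node where none does.
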
> If you say "check for both" we get to the conclusion "always use
> binaries" which is most sane. If users need to extract keys to be
> later used as atoms, they can do that quite easily.
>
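
A minimal sketch of doing the atom conversion in client code, assuming the decoder always returns binary keys. The module json_fields, the function atomize/1 and the field names are all hypothetical; the point is only that the set of atoms is fixed by the code, not by the input.

```erlang
-module(json_fields).
-export([atomize/1]).

%% The only keys this (hypothetical) client cares about.
known(<<"name">>)  -> {ok, name};
known(<<"email">>) -> {ok, email};
known(_Other)      -> error.

%% Keep the members whose key is known, with the key as an atom, and
%% drop everything else. No atom is ever created from input data.
atomize(Members) ->
    [{Atom, Value} || {Key, Value} <- Members, {ok, Atom} <- [known(Key)]].
```

For example, atomize([{<<"name">>, <<"Joe">>}, {<<"x">>, 1}]) yields [{name, <<"Joe">>}], and an unexpected key can never grow the atom table.
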
>
> With all that said, I would vote that Erlang doesn't adopt an
> "official" implementation for at least the time being. JSON is a very
> enchantingly simple specification, but when push comes to shove it's
> terribly complex to nail down into any sort of consensus. I like the
> libraries I've written. Alisdair's JSX is extremely well written. Yet
> we have wildly different ideas of "how things should be" in relation
> to JSON.
>
> In closing, the original motivation for this thread was Robert Virding
> musing, "The important thing is that it is *there* and that it is a
> good representation, otherwise we might end up with something bad just
> because that is all there is." I would argue that there are a number
> of quality JSON implementations and choosing one now would be to end
> innovation on the matter.
>
> Thanks,
> Paul J Davis
>
>
>
>
> On Thu, Jun 30, 2011 at 6:05 PM, Joe Williams <joe@REDACTED> wrote:
>> This is for Paul Davis.
>>
>> --
>> Name: Joseph A. Williams
>> Email: joe@REDACTED
>> Blog: http://www.joeandmotorboat.com/
>> Twitter: http://twitter.com/williamsjoe
>>
>> On Thursday, June 30, 2011 at 2:51 PM, Loïc Hoguin wrote:
>>
>> On 06/30/2011 11:39 PM, Robert Virding wrote:
>>
>> At the Erlang Factory in London after the EEPs run-through we had a
>> small very informal discussion. As a result of that and after a
>> discussion on erlang-questions I think it is very important that we
>> decide something about eep-18 and JSON. I think we should propose a
>> standard representation and write an OTP module which implements
>> encoding/decoding this. The first version doesn't have to be that
>> fast, mochijson2 which is being used apparently isn't fast, and it can
>> be improved later both with better erlang and NIFs. The important
>> thing is that it is *there* and that it is a good representation,
>> otherwise we might end up with something bad just because that is all
>> there is.
>>
>> jsx is already very good. It implements the EEP and is faster and more
>> convenient to use than mochijson2 IMHO.
>>
>> See https://github.com/talentdeficit/jsx
>>
>> --
>> Loïc Hoguin
>> Dev:Extend
>> _______________________________________________
>> eeps mailing list
>> eeps@REDACTED
>> http://erlang.org/mailman/listinfo/eeps
>>
>>
>>
>>
>


