[erlang-questions] Why EEP-0018 "JSON bifs" (and conforming libraries) are "wrong" about object encoding (i.e. `[{}]`)

Tue Aug 26 23:30:10 CEST 2014

    Today I've made a small attempt at finding a suitable JSON
encoding / decoding library that fits my needs.  (Until now I was using
`mochijson2` [1], which seems to be rather old...)  However the
current email is not about my requirements, or the libraries I found,
but about the direction the recent JSON libraries are heading
towards, namely how objects are encoded as `[{}] | list({string(),
json()})`.

    First of all a small survey about existing implementations:
    * `mochijson2` [1] uses (what I call) "mochi", i.e. `{struct,
list({string(), json()})}`;
    * `jsx` [2] uses EEP-0018, i.e. `[{}] | list({string(), json()})`;
    * `jiffy` [3] uses EEP-0018;
    * `ejson` [4] uses something in between "mochi" and EEP-0018,
namely `{list({string(), json()})}`;
    * `jsonx` [5] uses "mochi";
    * `kvc` [6] (although a querier) uses either "mochi" or EEP-0018;
    * `valijate` [7] (although only a validator) uses something
similar to "mochi";
    * others are either "mochi"-compliant or EEP-0018 compliant;
(many more seem to be EEP-0018 than "mochi";)

    Granted, EEP-0018 clearly states that a library could offer
the user the option to choose how an object is to be encoded as an
Erlang term (options A through F), and that it could know how to
correctly interpret each of them.  (Unfortunately none of the encoding
/ decoding libraries supports this choice.)


    At first glance there seems to be nothing wrong with either
approach (either "mochi" or EEP-0018).  Except maybe the fact that
when writing a multi-headed pattern-matching function, in case of
EEP-0018 one must match the object first, then the list, or else the
object might be misinterpreted as a list.
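
    As a minimal sketch of that ordering constraint (the module and
function names are made up for illustration, and it assumes strings
decode to binaries and `true` / `false` / `null` to atoms, which not
every library does):

```erlang
-module(eep18_kind).
-export([kind/1]).

%% Classify a decoded EEP-0018 term.
%% The two object clauses must come before the generic list clause,
%% because an EEP-0018 object is itself a list.
kind([{}]) -> object;                    % the empty object
kind([{_Key, _Value} | _]) -> object;    % a non-empty object
kind(List) when is_list(List) -> array;  % any other list, including []
kind(Bin) when is_binary(Bin) -> string; % assumption: strings are binaries
kind(Num) when is_number(Num) -> number;
kind(true) -> boolean;
kind(false) -> boolean;
kind(null) -> null.
```

    Moving the `is_list` clause to the top would make every object
come out as `array`.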

    I do have to say that "mochi" has a clearly unambiguous way to
detect the value type, which is almost impossible to get wrong.
Granted, deconstructing EEP-0018 compliant objects with plain Erlang
libraries (like `proplists`) is slightly more straightforward,
because one can use the object value as a list, whereas in "mochi"
one would need to first extract the list, which boils down to
`Object` vs. `element(2, Object)`.
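
    To make the difference concrete, here is a small sketch of an
attribute lookup in both representations (the module and function
names are hypothetical; the "mochi" clause assumes the `{struct,
Props}` shape from the survey above):

```erlang
-module(json_lookup).
-export([get_eep18/2, get_mochi/2]).

%% EEP-0018: the object is already a property list.
get_eep18(Key, Object) when is_list(Object) ->
    proplists:get_value(Key, Object).

%% "mochi": unwrap the tagged tuple first, then do the same lookup.
get_mochi(Key, {struct, Props}) when is_list(Props) ->
    proplists:get_value(Key, Props).
```

    Both return `undefined` when the key is missing, as
`proplists:get_value/2` does.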

    Therefore why do I say EEP-0018 (and conforming libraries) are
wrong?  (In fact so wrong that I felt the need to write such a lengthy
email?)  Because when one wants to extend the proposed JSON term
syntax, or perhaps use it for something else (but still related to
JSON), things start to crumble.


    Let's take for example `valijate` [7] which allows one to easily
validate (among others) JSON values with a simple schema.  For example
one can say `{array, string}` to denote a schema which matches any
list made only of strings.

    One could write the `validate` function for EEP-0018 as:

~~~~ (not tested)
validate (List, {array, ElementSchema}) ->
    case List of
        [{}] -> false;
        [Head | _] when is_tuple (Head) -> false;
        _ when is_list (List) ->
            lists:foldl(
                fun (true, A) -> A; (false, _) -> false end, true,
                lists:map (
                    fun (Element) -> validate(Element, ElementSchema) end,
                    List));
        _ -> false
    end;
~~~~

    Compared with the following for "mochi":

~~~~ (not tested)
validate (List, {array, ElementSchema}) ->
    if
        is_list (List) ->
            lists:foldl(
                fun (true, A) -> A; (false, _) -> false end, true,
                lists:map (
                    fun (Element) -> validate(Element, ElementSchema) end,
                    List));
        true -> false
    end;
~~~~

    No big issue so far, except a few extra matches (which could
become tiresome to write if we have more than one type of schema that
applies to lists).


    Let's move a little bit further.  Say we now want to be able to
write in `valijate` something like the following Erlang type:
`[list(string()), list(integer())]` (i.e. a list made of exactly two
elements, the first a list of strings, the second a list of integers).
Although I don't know how (or if) `valijate` is able to express this,
I would have expected something like this to work:

      [{array, string}, {array, integer}]

    Let's complicate it further and assume that we want to be able to
validate if a list is a set (i.e. non-repeating elements), and we
introduce a new schema type called `set`.  Let's see what a schema
would look like for a list made of exactly two elements, the first an
array of strings, the second a set of integers:

      [{array, string}, {set, integer}]

    However, before writing the implementation, let's imagine what a
schema for an object would look like.  Assuming we want to keep as
close as possible to the original EEP-0018 proposed term syntax, one
could imagine something like this:

    * for an object with any number of attributes, whose keys and
values must independently match the given schemas (i.e. the expected
object behaves like a dictionary):

      [{<schema-for-key>, <schema-for-value>}]

    * for an object with as many attributes as given by the tuples,
whose keys exactly match the given key literals (maybe in a different
order), and whose values match the given schemas (i.e. the expected
object behaves like a record):

      [{<literal-key-1>, <schema-for-value-1>}, {<literal-key-2>,
<schema-for-value-2>}, ...]

    Let's try to give two examples:
    * an object with any key, and a string as value (pick any):
      [{string, string}]
      [{any, string}]
    * an object with exactly one key named either "string", or "any",
whose value is a string:
      [{string, string}]
      [{any, string}]

    Let's also try to provide the schema for an object with exactly
two attributes, one named `array` and with a string value, the second
named `set` with an integer value:

      [{array, string}, {set, integer}]

    Darn...  I can't discern between the schema for a dictionary of
strings or the schema for a record with a single attribute named
"string" and a string value.  Similarly I can't discern between the
schema for a two element array (one element a list, the other a set)
or a two attribute object one named "array", the other "set".
(Granted I can start complicating the schema syntax, but that would
get further from a "simple" approach where the schema resembles
closely the actual value.)


    Let's see if "mochi" could do it:
    * for a dictionary:
      {object, {<schema-for-key>, <schema-for-value>}}
    * for a record:
      {object, [{<literal-key-1>, <schema-for-value-1>}, ...]}
    * (an actual matching JSON would be encoded as:)
      {object, [{<literal-key-1>, <actual-value-1>}, ...]}

    For example:
      {object, {string, string}} -- the string dictionary
      {object, [{string, string}]} -- the record with a string attribute
      {object, [{array, string}, {set, integer}]} -- the object with
"array" and "set" attributes
      [{array, string}, {set, integer}] -- the list with an array of
strings and a set of integers
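
    As a rough sketch of why the tagged form stays unambiguous, each
of the four schemas above lands in its own function clause (the module
name, function name and returned atoms are invented for illustration;
this is not how `valijate` works):

```erlang
-module(schema_kind).
-export([schema_kind/1]).

%% Tell apart the schema shapes used in the examples above.
schema_kind({object, {_KeySchema, _ValueSchema}}) -> dictionary;
schema_kind({object, Fields}) when is_list(Fields) -> record;
schema_kind({array, _ElementSchema}) -> array;
schema_kind({set, _ElementSchema}) -> set;
schema_kind(Schemas) when is_list(Schemas) -> fixed_list;
schema_kind(Name) when is_atom(Name) -> primitive.
```

    With the EEP-0018 shape the `record` and `fixed_list` clauses
would both have to match a plain list of two-element tuples, so no
clause ordering could separate them.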


    OK...  Let's assume that the validation use-case is not of
interest, and we could live with a certain syntax for the JSON values
and another one for the schema.  Fine!

    Say however that we now want to make a small extension to our
favourite JSON encoding library to better suit the following scenario:
we need to implement a small web-service which gets a JSON from
somewhere as a binary (maybe from a database or file), i.e.
<<"{...}">>, of which we are certain it is correctly encoded, thus we
don't need to parse it, and we need to "wrap" it into another JSON
which holds some meta-data, like for example `{"ok" : true, "outcome"
: <the-JSON-we-got-from-somewhere>}`.  However as stated we would like
to reuse our favourite JSON library to encode such a wrapper JSON,
but without first parsing the JSON binary.

    We could therefore extend the JSON library to allow us to put
inside JSON terms, values which are to be "pasted" directly in the
result (this is equivalent to the `RawMessage` from Go's JSON
library).

    However the question is how to flag these raw values?  We
obviously can't use a binary as that is the representation of a
string.  We could choose something like `{raw, <<"...">>}`.

    OK, let's now try to see what a list composed of a single element,
namely the raw JSON, would look like:

      [{raw, <<"...">>}]

    Doesn't that resemble an object with an attribute named "raw" and
a string as value?  In EEP-0018 it surely does, but in "mochi" it
doesn't (that would be `{object, [{raw, <<"...">>}]}`).
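
    A minimal sketch of such an encoder over the tagged
representation, to show that `{raw, ...}` and `{object, ...}` never
collide (the module is hypothetical, it handles only integers,
booleans, null, binaries, lists and objects, and it does not escape
strings):

```erlang
-module(raw_json).
-export([encode/1]).

%% Raw values are pasted into the output unchanged.
encode({raw, Bin}) when is_binary(Bin) -> Bin;
%% Objects carry an explicit tag, so they cannot be mistaken for lists.
encode({object, Props}) when is_list(Props) ->
    Members = [[encode_key(K), <<":">>, encode(V)] || {K, V} <- Props],
    iolist_to_binary([<<"{">>, join(Members), <<"}">>]);
%% Every plain list is an array, whatever its elements look like.
encode(List) when is_list(List) ->
    iolist_to_binary([<<"[">>, join([encode(E) || E <- List]), <<"]">>]);
encode(true) -> <<"true">>;
encode(false) -> <<"false">>;
encode(null) -> <<"null">>;
encode(N) when is_integer(N) -> integer_to_binary(N);
%% Assumption: strings are binaries that need no escaping.
encode(Bin) when is_binary(Bin) -> <<"\"", Bin/binary, "\"">>.

%% Keys may be atoms or binaries in this sketch.
encode_key(K) when is_atom(K) -> encode(atom_to_binary(K, utf8));
encode_key(K) when is_binary(K) -> encode(K).

%% Put a comma between already encoded parts.
join([]) -> [];
join([H | T]) -> [H | [[<<",">>, E] || E <- T]].
```

    Here `[{raw, Bin}]` encodes as a one-element array containing the
pasted JSON, while `{object, [{raw, Bin}]}` encodes as an object with a
"raw" attribute.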

    (Or maybe we want to be able to use records directly as JSON
values, tagged like `{record, <actual-record>}` which would call an
external formatter, or even simpler directly using `<actual-record>`
provided that its tag isn't `object`.  Or perhaps `{dict,
<actual-dict>}`, `{gb_tree, <actual-gb-tree>}`, etc.)


    I hope that the two given examples support my case against
`[{}]`.  (I.e. it hampers extensibility of the proposed JSON term
syntax.)


    Moreover the other choice `{[{key, value}, ...]}` is only
marginally better than EEP-0018, because it suggests that people can
match an object by simply stating `is_tuple (JSON)`, which would make
implementing extensions like the raw message one harder.


    The reason that I wrote this email is because I have invested
quite some time in writing a "few" JSON utility functions (including
complex schema validation, destructuring, etc.) which heavily use and
extend the "mochi" variant.  Based on this experience and a small
analysis I've done today, I concluded that EEP-0018 would be quite
cumbersome for expressing any kind of extension without a lot of
pattern-matching to catch the extensions.  However by no means do I
expect developers to change their libraries to suit such a usage, I
only wanted to provide a counter-argument to EEP-0018.  Moreover, now
that Erlang has maps, hopefully these can be used to express
objects, and this problem would go away.

    Hopefully I haven't offended anyone, (I apologize in advance,)
    Ciprian.


    [1] https://github.com/mochi/mochiweb/blob/master/src/mochijson2.erl
    [2] https://github.com/talentdeficit/jsx
    [3] https://github.com/davisp/jiffy
    [4] https://github.com/benoitc/ejson
    [5] https://github.com/iskra/jsonx
    [6] https://github.com/etrepum/kvc
    [7] https://github.com/eriksoe/valijate