[erlang-questions] json_to_term EEP

Fri Aug 1 01:10:07 CEST 2008

On 31 Jul 2008, at 6:06 pm, Willem de Jong wrote:
> Of course, but if the Erlang team creates special, fast support in C it
> would be good if it could be used by as many people as possible.

Agreed.

But remember, the existence of an EEP is no guarantee whatever that
the proposal will ever be accepted.  Even the publication of an EEP
is simply acceptance by the moderator that the proposal meets the
formal criteria and isn't ragingly insane.  Rival EEPs addressing
the same area are allowed to exist, and are even a good idea.

To put it bluntly, there is no way I am ever going to put a SAX-like
interface in *my* EEP for JSON.  (I am not rejecting the idea that a
JSON converter might accept or deliver JSON forms incrementally; the
issues there are rather different.)

Anyone who thinks differently is not only free to write their own
EEP, they are *welcome* to do so.  It will be *good* for Erlang if
different ideas about how to do things are clearly written up and
available for discussion.
>

> I personally like working with a SAX parser.
> See the example below - I quite enjoyed writing it.

I'm sure you did, but the example does not in fact
work with a SAX parser.  It works with an (apparently non-existent)
parser that delivers a *data structure*, and it consumes the data
structure.  If you are going to work with a data structure, why
not the very same data structure that you are supposed to be getting?
It's like saying "well, I could have a pizza delivered to my door,
but instead I'll have all the ingredients delivered in separate
deliveries and then I'll make the pizza".
>
> The question is whether the things that an ESIS/SAX-like interface
> let you do are things that people particularly *want* to do with JSON.
> I have no idea.
>
> The point is, that the Erlang team would probably like to implement only
> 1 very fast JSON parser in C.

The snag is that you CAN have a "very fast" JSON->term parser,
but you CAN'T have a "very fast" JSON->event stream parser,
because you have the extra overhead of creating event terms
and either calling a handler function (which then has to go to
all the trouble of decoding what the parser _knew_) or sending
messages to another process (ditto).  The people who *want* a JSON
form as an Erlang term would be very ill served by a SAX-like
interface, and the intrinsic overheads are such that the people
who want a SAX-like interface would get little benefit from an
implementation in C.

> In my opinion, that should be a SAX-like parser, because
> it is easy to create DVM output based on SAX output, but pointless to do
> it the other way around.

JSON is so simple that producing a term from a sequence of events
is scarcely any easier than writing a parser in the first place.
Really, the only thing you are spared is handling UTF-8.

As for it being pointless to turn DOM (or DVM) into SAX,
opinions may vary.  I've had good reason to do it several times.
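
To make the DVM-to-events direction concrete, here is a minimal sketch. It assumes a term shape chosen only for this illustration: an object is {[{Key, Value}]}, an array is a list, a string is a binary (so that strings and arrays cannot be confused), and other scalars stand for themselves. The module and function names are hypothetical, and the event names are the ones used in the quoted example further down.

```erlang
-module(dvm_events).
-export([events/1]).

%% Assumed term shape (hypothetical, chosen for this sketch):
%%   object -> {[{Key, Value}]}, array -> list, string -> binary,
%%   numbers, true, false and null stand for themselves.

%% Turn a whole term into the flat list of events for one document.
events(Term) ->
    [startDocument | value(Term, [endDocument])].

%% value(Term, Tail) returns the events for Term followed by Tail.
%% Passing the tail along avoids repeated list appends.
value({Members}, Tail) when is_list(Members) ->
    [startObject | members(Members, [endObject | Tail])];
value(Items, Tail) when is_list(Items) ->
    [startArray | items(Items, [endArray | Tail])];
value(Scalar, Tail) ->
    [{value, Scalar} | Tail].

%% Each object member contributes a key event and then its value.
members([], Tail) ->
    Tail;
members([{Key, Value} | Rest], Tail) ->
    [{key, Key} | value(Value, members(Rest, Tail))].

%% Array elements contribute only their values, in order.
items([], Tail) ->
    Tail;
items([Item | Rest], Tail) ->
    value(Item, items(Rest, Tail)).
```

A version that calls a handler for each event instead of building a list would have the same shape.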

I find it telling that all the JSON parsers for Erlang that I've looked
at generate terms; not one of them offers a SAX-like interface.
Doubtless there are many more that I haven't looked at, so I cannot
claim that there are no JSON/SAX parsers for Erlang, or that nobody
has a need for one.  I certainly can claim that if anyone did want a
JSON/SAX parser, it would be quite easy to take one of the existing
freely available JSON parsers and modify it to send events instead of
building a result.

If people were routinely pumping Brobdingnagian JSON messages around
the Web, it would be important to use an event stream interface to
keep process sizes reasonable.  It does not appear that they are.
The Agile slogan YAGNI! applies, I think.

> A sax parser may create the following events (that is: call its callback
> function with the following arguments, while parsing):
>
> E = [startDocument,startObject, {key,"menu"}, startObject, {key,"id"},
>  {value,"file"}, {key,"popup"}, startObject, {key,"menuitem"},
>  startArray,startObject, {key,"value"}, {value,"New"}, {key,"onclick"},
>  {value,"CreateNewDoc()"}, endObject,startObject, {key,"value"},
>  {value,"Close"}, {key,"onclick"}, {value,"CloseDoc()"}, endObject,
>  endArray,endObject,endObject,endObject, endDocument].
>
As a data structure, this is far bigger than the simple term would be.
It *has* to be more expensive to create this.
It becomes clear later in your message that this is not what you
really mean:  you mean something like
	json_event_stream_parser(IO_Data, Handler, Initial_State)
where
	Handler :: fun((JSON_Event, State) -> State)
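
The handler contract can be sketched as a fold. In the sketch below the parser itself is replaced by a stand-in that is handed a ready-made list of events rather than IO data, so it shows only how the state is threaded from one call to the next. The module name, run/3 and the depth/2 handler are all hypothetical. The handler records the maximum nesting depth, which is the kind of thing one can compute without ever building the term.

```erlang
-module(json_fold).
-export([run/3, depth/2]).

%% Stand-in for an event stream parser: a real one would produce the
%% events from IO data; this one is given them as a list.  The handler
%% is called as Handler(Event, State) and returns the next state.
run(Events, Handler, State0) ->
    lists:foldl(Handler, State0, Events).

%% Example handler.  The state is {CurrentDepth, MaxDepthSeenSoFar}.
depth(startObject, {D, Max}) -> {D + 1, erlang:max(Max, D + 1)};
depth(startArray,  {D, Max}) -> {D + 1, erlang:max(Max, D + 1)};
depth(endObject,   {D, Max}) -> {D - 1, Max};
depth(endArray,    {D, Max}) -> {D - 1, Max};
depth(_Other,      State)    -> State.
```

It would be called as json_fold:run(Events, fun json_fold:depth/2, {0, 0}).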

> Below an example of a callback function to process these events - this
> function would be called by the SAX parser when it has processed another
> relevant part of the JSON document. The parser passes the value returned
> by the function to the next invocation (second argument of the function,
> the first argument is the SAX event).
>
> dvm(startDocument, _) ->
>   start;
> dvm(startObject, Stack) ->
>   [[]| Stack];
> dvm(startArray, Stack) ->
>   [[]| Stack];
> dvm({key, _} = Event, Stack) ->
>   [Event|Stack];
> dvm({value, Value}, start) ->
>   {value, Value};
>
Technically, the JSON RFC (RFC 4627) does not allow this: a JSON text
must be an object or an array, not a bare value.
It does seem sensible to handle it though.
>
> dvm({value, Value}, [{key, Key}, List | T]) ->
>   [[{Key, Value} | List] | T];
> dvm({value, Value}, [List | T]) ->
>   [[Value | List] | T];
> dvm(endObject, [List | T]) ->
>   dvm({value, {lists:reverse(List)}}, T);
> dvm(endArray, [List | T]) ->
>   dvm({value, lists:reverse(List)}, T);
> dvm(endDocument, {value, R}) ->
>   R.
>
In short, you are proposing that an interface that most Erlang
JSON users do not appear to have a need for should be
privileged so that an interface that there IS a demonstrated
need for can be implemented on top of it much more expensively.

I do not find this convincing.

That does not matter.
Write an EEP of your own.  Spell out the details.
Put it on the supermarket shelf and see if anyone
makes chop suey with it.