[erlang-questions] json_to_term EEP

Hynek Vychodil vychodil.hynek@REDACTED
Wed Jul 30 12:07:49 CEST 2008


On Wed, Jul 30, 2008 at 3:34 AM, Richard A. O'Keefe <ok@REDACTED> wrote:

> It would be nice if people would read the EEP.
>
> On 30 Jul 2008, at 2:55 am, Hynek Vychodil wrote:
>
>> I would prefer to always have strings in *one* format and not special-case
>> keys as atoms sometimes. Otherwise, to be certain, you would have to match
>> both an atom and a binary to find a key. Unless you *always* use atoms for keys,
>> which could easily explode.
>>
>
> In the EEP, json_to_term(IO_Data, Options) has an option
>        {label,binary}
> or      {label,atom}
> or      {label,existing_atom}
> There is no corresponding option for strings, which are
> always binaries.  (The idea is that strings are
> unpredictable data, whereas labels are predictable structure.)
> {label,binary} says to leave all labels as binaries.
>    This would have been intolerable before <<"...">> syntax
>    was introduced; now the main thing is that it wastes space.
> {label,atom} says to convert to an atom any label that CAN
>    be converted to an atom, the main limitation being that
>    Erlang atoms are not yet Unicode-ready.  (Someone else has
>    an EEP about that, I believe.)  This is perfect for
>    communicating with a TRUSTED source, just like receiving
>    Erlang term_to_binary() values and decoding them.
> {label,existing_atom} means that a module that mentions
>    certain atoms in pattern matches against formerly-JSON
>    labels can be confident of finding those atoms, while
>    other labels may remain binaries.
>
> Options are a way of coping with different people's different
> situations and needs; the trick is to have just enough of them.
>
>> I argue unification,
>>
>
> Unification of what with what?
>
>> so transforming all to atoms is insecure, and the result is: don't use it this way
>> at all.
>>
>
> WITHIN a trust boundary, all is well.  Not all communication
> crosses trust boundaries, otherwise term_to_binary() would be
> of little or no use.
>
>
>> Aside from the non-uniformity of the list_to_existing_atom way, there is a performance
>> drawback too. For each key you must call
>> list_to_existing_atom(binary_to_list(X)), and binary_to_list causes GC
>> pressure in this usage. I would not use this variant either.
>>
>
> What performance drawback?  What call to binary_to_list()?  Whoever said
> the binary EXISTED in the first place?  The EEP is a proposal for putting
> these conversion functions in the Erlang core, eventually to be
> implemented in C.  So implemented, the alleged performance drawback simply
> does not exist.


All JSON data coming from outside Erlang is a binary in the first place; there
are no Erlang lists outside Erlang.
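
To make the conversion I mean concrete, here is a minimal sketch of what a
parser written in plain Erlang would do for each key. The module and function
names (label_conv, to_label/1) are hypothetical and only for illustration; a C
implementation inside the core, as proposed in the EEP, could avoid the
intermediate list.

```erlang
-module(label_conv).
-export([to_label/1]).

%% Hypothetical helper: turn a JSON key (a binary) into an atom only if
%% that atom already exists in this VM; otherwise keep the binary.
%% binary_to_list/1 builds a temporary list for every key, which is the
%% extra garbage discussed above.
to_label(Key) when is_binary(Key) ->
    try
        list_to_existing_atom(binary_to_list(Key))
    catch
        error:badarg -> Key   % no such atom yet: leave the label as a binary
    end.
```

The result for a given key therefore depends on which atoms the VM happens to
know at the moment of the call.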

>
>
>>
>> P.S.: Why non-uniformity is a problem.
>>
>
> It is a problem for people who EXPECT a uniform translation,
> and not for people who don't.
>
>> One can argue that it looks nicer. OK. One can argue that the binary->atom
>> transformation is done only for existing atoms, and all atoms used in
>> comparisons exist. BAD: imagine, for example, storing an Erlang term for a long
>> time or sending it to other nodes.
>>
>
> Again, you are overlooking the fact that different people have
> different needs, and that the translation of labels can be (and
> IS, in the EEP) an OPTION.  You are also overlooking the fact
> that *considered as JSON*, the forms are entirely equivalent,
> and that since JSON explicitly says that the order of key:value
> pairs does not matter, there is uncertainty about precisely
> what Erlang term you get anyway.
>
> In fact, for binary storage, conversion to existing atoms is
> *better* than conversion to binaries, because the Erlang
> term-to-binary format uses a compression scheme for atoms
> that it does not use for binaries.
>
> Admittedly, the answer to that is to extend the compression
> scheme to binaries as well.
>
You are overlooking the fact that there are other scenarios. For
example:

1/ Read and parse JSON {"a":1, "b":2, "c":3} on one Erlang node with one set
of existing atoms (a,b).

2/ Store the resulting Erlang term [{a,1}, {b,2}, {<<"c">>, 3}] to a file.

3/ On another Erlang node whose existing atoms are {a,c} (for example, because
some module wants to detect the c key in data taken from JSON), you load and
parse the same JSON {"a":1, "b":2, "c":3}, and the parser gives you
[{a,1}, {<<"b">>,2}, {c, 3}].

4/ Then you load the stored Erlang term from the file and two things happen:
you get [{a,1}, {b,2}, {<<"c">>, 3}], and the existing atoms are now {a,b,c}.

5/ Read and parse JSON {"a":1, "b":2, "c":3} again, and you get [{a,1},
{b,2}, {c, 3}].

6/ Great, you now have the terms [{a,1}, {b,2}, {c, 3}], [{a,1}, {b,2}, {<<"c">>,
3}] and [{a,1}, {<<"b">>,2}, {c, 3}] as Erlang terms representing the same JSON
input {"a":1, "b":2, "c":3}. What the hell? Something is totally wrong,
isn't it?
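
The scenario can be reproduced in a single VM with a small simulation. This is
only a sketch under stated assumptions: the module label_demo and its functions
are hypothetical, the JSON object is written directly as a list of
{binary, value} pairs instead of being parsed, and the "existing atoms" of each
node are modelled as an explicit list of binaries, because the real atom table
is global to a VM.

```erlang
-module(label_demo).
-export([decode/2, demo/0]).

%% Hypothetical stand-in for decoding with {label,existing_atom}.
%% Known models the atom table of one node as a list of binaries.
decode(Pairs, Known) ->
    [{label(K, Known), V} || {K, V} <- Pairs].

%% A label becomes an atom only if the simulated node already "knows" it.
label(K, Known) ->
    case lists:member(K, Known) of
        true  -> binary_to_atom(K, utf8);
        false -> K
    end.

demo() ->
    %% {"a":1, "b":2, "c":3} with all labels still binaries.
    Json = [{<<"a">>, 1}, {<<"b">>, 2}, {<<"c">>, 3}],
    T1 = decode(Json, [<<"a">>, <<"b">>]),            % step 1: node knows a,b
    T2 = decode(Json, [<<"a">>, <<"c">>]),            % step 3: node knows a,c
    T3 = decode(Json, [<<"a">>, <<"b">>, <<"c">>]),   % step 5: node knows a,b,c
    AllDifferent = T1 =/= T2 andalso T1 =/= T3 andalso T2 =/= T3,
    {T1, T2, T3, AllDifferent}.
```

Calling label_demo:demo() returns the three terms from step 6 together with
true, i.e. the same input gives three terms that do not compare equal.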

Erlang is a way to make things safe and reliable. Converting keys to atoms
is not safe and reliable, so don't do it. It hurts you!

-- 
--Hynek (Pichi) Vychodil