<div dir="ltr"><br><br><div class="gmail_quote">On Wed, Jul 30, 2008 at 3:34 AM, Richard A. O'Keefe <span dir="ltr"><<a href="mailto:ok@cs.otago.ac.nz">ok@cs.otago.ac.nz</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
It would be nice if people would read the EEP.<div class="Ih2E3d"><br>
<br>
On 30 Jul 2008, at 2:55 am, Hynek Vychodil wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I would prefer to always have strings in *one* format and not special-case keys as atoms sometimes. Otherwise, to be certain, you would have to match both the atom and the binary to find a key. Unless you *always* use atoms for keys, which could easily explode.<br>
</blockquote>
<br></div>
In the EEP, json_to_term(IO_Data, Options) has an option<br>
{label,binary}<br>
or {label,atom}<br>
or {label,existing_atom}<br>
There is no corresponding option for strings, which are<br>
always binaries. (The idea is that strings are<br>
unpredictable data, whereas labels are predictable structure.)<br>
{label,binary} says to leave all labels as binaries.<br>
This would have been intolerable before <<"...">> syntax<br>
was introduced; now the main thing is that it wastes space.<br>
{label,atom} says to convert to an atom any label that CAN<br>
be converted to an atom, the main limitation being that<br>
Erlang atoms are not yet Unicode-ready. (Someone else has<br>
an EEP about that, I believe.) This is perfect for<br>
communicating with a TRUSTED source, just like receiving<br>
Erlang term_to_binary() values and decoding them.<br>
{label,existing_atom} means that a module that mentions<br>
certain atoms in pattern matches against formerly-JSON<br>
labels can be confident of finding those atoms, while<br>
other labels may remain binaries.<br>
<br>
Options are a way of coping with different people's different<br>
situations and needs; the trick is to have just enough of them.<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I argue unification,<br>
</blockquote>
<br>
Unification of what with what?<div class="Ih2E3d"><br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
so transforming everything to atoms is insecure, and the result is: don't use this approach at all.<br>
</blockquote>
<br></div>
WITHIN a trust boundary, all is well. Not all communication<br>
crosses trust boundaries, otherwise term_to_binary() would be<br>
of little or no use.<div class="Ih2E3d"><br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Aside from the non-uniformity of the list_to_existing_atom approach, there is a performance drawback too. For each key you must call list_to_existing_atom(binary_to_list(X)), and binary_to_list causes GC pressure in this usage. I would not use this variant either.<br>
</blockquote>
<br></div>
What performance drawback? What call to binary_to_list()? Whoever said<br>
the binary EXISTED in the first place? The EEP is a proposal for putting<br>
these conversion functions in the Erlang core, eventually to be<br>
implemented in C. So implemented, the alleged performance drawback simply<br>
does not exist.</blockquote><div><br>All JSON data arriving from outside Erlang starts out as a binary; there are no Erlang lists outside Erlang.<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d"><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
</blockquote>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
P.S.: Why non-uniformity is a problem.<br>
</blockquote>
<br></div>
It is a problem for people who EXPECT a uniform translation,<br>
and not for people who don't.<div class="Ih2E3d"><br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
One can argue that it looks nicer. OK. One can argue that the binary->atom transformation is done only for existing atoms, and that all atoms used in comparisons exist. BAD: imagine, for example, storing the Erlang term for a long time or sending it to other nodes.<br>
</blockquote>
<br></div>
Again, you are overlooking the fact that different people have<br>
different needs, and that the translation of labels can be (and<br>
IS, in the EEP) an OPTION. You are also overlooking the fact<br>
that *considered as JSON*, the forms are entirely equivalent,<br>
and that since JSON explicitly says that the order of key:value<br>
pairs does not matter, there is uncertainty about precisely<br>
what Erlang term you get anyway.<br>
<br>
In fact, for binary storage, conversion to existing atoms is<br>
*better* than conversion to binaries, because the Erlang<br>
term-to-binary format uses a compression scheme for atoms<br>
that it does not use for binaries.<br>
<br>
Admittedly, the answer to that is to extend the compression<br>
scheme to binaries as well.<br>
<br>
</blockquote></div>You are overlooking the fact that there are other scenarios. For example:<br><br>1/ Read and parse the JSON {"a":1, "b":2, "c":3} on one Erlang node whose set of existing atoms is (a,b).<br>
<br>2/ Store the resulting Erlang term [{a,1}, {b,2}, {<<"c">>, 3}] to a file.<br><br>3/ On another Erlang node whose set of existing atoms is {a,c} (for example, because some module wants to detect the c key in data taken from JSON), you load and parse the same JSON {"a":1, "b":2, "c":3}, and from the parser you get [{a,1}, {<<"b">>,2}, {c, 3}].<br>
<br>4/ Then you load the stored Erlang term from the file, and two things happen: you get [{a,1}, {b,2}, {<<"c">>, 3}], and the existing atoms are now {a,b,c}.<br><br>5/ Read and parse the JSON {"a":1, "b":2, "c":3} again, and you get [{a,1}, {b,2}, {c, 3}].<br>
<br>6/ Great, you now have the terms [{a,1}, {b,2}, {c, 3}], [{a,1}, {b,2}, {<<"c">>, 3}] and [{a,1}, {<<"b">>,2}, {c, 3}], all of them Erlang terms representing the same JSON input {"a":1, "b":2, "c":3}. What the hell, there is something totally wrong here, isn't there?<br>
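<br>The effect in the steps above can be sketched in a few lines of Erlang. This is only an illustration, not the EEP's implementation: the module name label_demo and the functions label/2, convert/2 and demo/0 are made up for this example, the decoded JSON object is assumed to arrive as a list of {BinaryLabel, Value} pairs, and the sketch relies on the binary_to_existing_atom/2 BIF found in current OTP releases. Because I cannot know which atoms already exist on your node, a deliberately unusual label stands in for "c".<br>
<br>
```erlang
-module(label_demo).
-export([known_labels/0, label/2, convert/2, demo/0]).

%% Mentioning a and b here guarantees that both atoms exist
%% as soon as this module is loaded.
known_labels() -> [a, b].

%% Convert one label (a binary) according to the chosen mode.
label(Bin, binary) -> Bin;
label(Bin, atom) -> binary_to_atom(Bin, utf8);
label(Bin, existing_atom) ->
    try binary_to_existing_atom(Bin, utf8)
    catch error:badarg -> Bin          % no such atom yet: keep the binary
    end.

%% Convert every label of a decoded object [{BinaryLabel, Value}].
convert(Pairs, Mode) ->
    [{label(K, Mode), V} || {K, V} <- Pairs].

%% Same input converted twice on the same node. In between, the
%% atom table grows, as it would after loading a stored term.
%% Assumption: the label below is not an atom before the first call.
demo() ->
    Fresh = <<"zq_label_not_yet_an_atom">>,
    Pairs = [{<<"a">>, 1}, {<<"b">>, 2}, {Fresh, 3}],
    Before = convert(Pairs, existing_atom),
    _ = binary_to_atom(Fresh, utf8),   % the atom exists from now on
    After = convert(Pairs, existing_atom),
    {Before, After, Before =:= After}.
```

<br>On a fresh node, the first call to label_demo:demo() should give two different terms for the same input: the third label is still a binary in the first term and an atom in the second. A second call in the same node returns two equal terms, because the atom table never shrinks.<br>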
<br>Erlang is a way to make things safe and reliable. Converting keys to atoms is not safe and reliable, so don't do it. It hurts you!<br><br>-- <br>--Hynek (Pichi) Vychodil<br>
</div>