<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">I am also pretty pleased with cson :-)<div>Looks very nice.</div><div><br></div><div>A small question.</div><div>How are strings encoded?</div><div><br></div><div>a) <utf8-octet-string-length>"<utf8-chars> </div><div>b) <number-of-unicode-chars>"<integers>*</div><div>c) <number-of-unicode-chars>"<utf8-char>*</div><div>d) Other?</div><div><br></div><div>Thanks</div><div><br></div><div>/Tony</div><div><br></div><div><div><div>On 29 jul 2013, at 03:10, Richard A. O'Keefe <<a href="mailto:ok@cs.otago.ac.nz">ok@cs.otago.ac.nz</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><br>On 26/07/2013, at 10:55 PM, Motiejus Jakštys wrote:<br><blockquote type="cite">Found it.<br><br><a href="http://tnetstrings.org/">http://tnetstrings.org/</a><br></blockquote><br>May I suggest *not* using that approach?<br><br>Let's start with size. I have a small JSON test collection.<br>Here's the result of reading each element of that collection<br>and<br> json - writing as JSON with a newline after every comma and<br> a space after every colon, but no indentation<br> xml - using <n/>, <t/>, <f/>, <v>number</v>, <s>string</s>,<br> <a>e1 ... en</a> or <d>e1 ... en</d>, with the key<br> strings appearing as key='..' 
attributes on the children<br> of <d> elements<br> cson - described below, using base 10 for integers<br> c64 - same as cson, but using base 64 for integers<br> tns - using the encoding described at <a href="http://tnetstrings.org">tnetstrings.org</a><br><br><br>json<span class="Apple-tab-span" style="white-space:pre"> </span>xml<span class="Apple-tab-span" style="white-space:pre"> </span>cson<span class="Apple-tab-span" style="white-space:pre"> </span>c64<span class="Apple-tab-span" style="white-space:pre"> </span>tns<br>9<span class="Apple-tab-span" style="white-space:pre"> </span>31<span class="Apple-tab-span" style="white-space:pre"> </span>8<span class="Apple-tab-span" style="white-space:pre"> </span>8<span class="Apple-tab-span" style="white-space:pre"> </span>16<br>385<span class="Apple-tab-span" style="white-space:pre"> </span>470<span class="Apple-tab-span" style="white-space:pre"> </span>341<span class="Apple-tab-span" style="white-space:pre"> </span>336<span class="Apple-tab-span" style="white-space:pre"> </span>387<br>201<span class="Apple-tab-span" style="white-space:pre"> </span>273<span class="Apple-tab-span" style="white-space:pre"> </span>167<span class="Apple-tab-span" style="white-space:pre"> </span>165<span class="Apple-tab-span" style="white-space:pre"> </span>204<br>428<span class="Apple-tab-span" style="white-space:pre"> </span>545<span class="Apple-tab-span" style="white-space:pre"> </span>362<span class="Apple-tab-span" style="white-space:pre"> </span>350<span class="Apple-tab-span" style="white-space:pre"> </span>422<br>2864<span class="Apple-tab-span" style="white-space:pre"> </span>3262<span class="Apple-tab-span" style="white-space:pre"> </span>2676<span class="Apple-tab-span" style="white-space:pre"> </span>2550<span class="Apple-tab-span" style="white-space:pre"> </span>2903<br>680<span class="Apple-tab-span" style="white-space:pre"> </span>905<span class="Apple-tab-span" style="white-space:pre"> </span>543<span class="Apple-tab-span" 
style="white-space:pre"> </span>534<span class="Apple-tab-span" style="white-space:pre"> </span>659<br>25<span class="Apple-tab-span" style="white-space:pre"> </span>57<span class="Apple-tab-span" style="white-space:pre"> </span>22<span class="Apple-tab-span" style="white-space:pre"> </span>17<span class="Apple-tab-span" style="white-space:pre"> </span>34<br>258<span class="Apple-tab-span" style="white-space:pre"> </span>356<span class="Apple-tab-span" style="white-space:pre"> </span>204<span class="Apple-tab-span" style="white-space:pre"> </span>204<span class="Apple-tab-span" style="white-space:pre"> </span>249<br>33<span class="Apple-tab-span" style="white-space:pre"> </span>48<span class="Apple-tab-span" style="white-space:pre"> </span>27<span class="Apple-tab-span" style="white-space:pre"> </span>27<span class="Apple-tab-span" style="white-space:pre"> </span>33<br>529<span class="Apple-tab-span" style="white-space:pre"> </span>715<span class="Apple-tab-span" style="white-space:pre"> </span>421<span class="Apple-tab-span" style="white-space:pre"> </span>410<span class="Apple-tab-span" style="white-space:pre"> </span>515<br>1192<span class="Apple-tab-span" style="white-space:pre"> </span>1621<span class="Apple-tab-span" style="white-space:pre"> </span>939<span class="Apple-tab-span" style="white-space:pre"> </span>911<span class="Apple-tab-span" style="white-space:pre"> </span>1161<br>192<span class="Apple-tab-span" style="white-space:pre"> </span>255<span class="Apple-tab-span" style="white-space:pre"> </span>160<span class="Apple-tab-span" style="white-space:pre"> </span>154<span class="Apple-tab-span" style="white-space:pre"> </span>191<br>2425<span class="Apple-tab-span" style="white-space:pre"> </span>3289<span class="Apple-tab-span" style="white-space:pre"> </span>1853<span class="Apple-tab-span" style="white-space:pre"> </span>1807<span class="Apple-tab-span" style="white-space:pre"> </span>2378<br>23<span class="Apple-tab-span" style="white-space:pre"> 
</span>30<span class="Apple-tab-span" style="white-space:pre"> </span>20<span class="Apple-tab-span" style="white-space:pre"> </span>12<span class="Apple-tab-span" style="white-space:pre"> </span>26<br>23<span class="Apple-tab-span" style="white-space:pre"> </span>30<span class="Apple-tab-span" style="white-space:pre"> </span>20<span class="Apple-tab-span" style="white-space:pre"> </span>12<span class="Apple-tab-span" style="white-space:pre"> </span>26<br>32<span class="Apple-tab-span" style="white-space:pre"> </span>52<span class="Apple-tab-span" style="white-space:pre"> </span>13<span class="Apple-tab-span" style="white-space:pre"> </span>12<span class="Apple-tab-span" style="white-space:pre"> </span>39<br>50<span class="Apple-tab-span" style="white-space:pre"> </span>82<span class="Apple-tab-span" style="white-space:pre"> </span>37<span class="Apple-tab-span" style="white-space:pre"> </span>37<span class="Apple-tab-span" style="white-space:pre"> </span>50<br>42<span class="Apple-tab-span" style="white-space:pre"> </span>109<span class="Apple-tab-span" style="white-space:pre"> </span>35<span class="Apple-tab-span" style="white-space:pre"> </span>26<span class="Apple-tab-span" style="white-space:pre"> </span>61<br>7<span class="Apple-tab-span" style="white-space:pre"> </span>10<span class="Apple-tab-span" style="white-space:pre"> </span>5<span class="Apple-tab-span" style="white-space:pre"> </span>5<span class="Apple-tab-span" style="white-space:pre"> </span>6<br><br><br>I'm rather pleased that an encoding (cson) that I threw together<br>in a couple of minutes handily beats the rest, but not surprised.<br><br>There is a simple blunder in the TNetStrings design that causes<br>serious inefficiency if you try to transport nontrivial<br>data that way: the "type" code is at the wrong end.<br><br>You have to read an entire object before you can start<br>decoding it, which is just plain silly. 
_And_ it is hard<br>to transmit floating-point numbers accurately.<br><br>Not only that, you cannot stream the output. There is a<br>JSONGenerator class recently added to Java so that you can<br>stream large amounts of data out without actually having to<br>hold much in memory; this is a need people genuinely have.<br><br>Let's just look at the output code for three techniques, taken<br>from my Smalltalk library. To keep it simple, let's just look<br>at arrays.<br><br> printJsonOn: aStream<br> aStream nextPut: $[.<br> self do: [:each | each printJsonOn: aStream]<br> separatedBy: [aStream nextPut: $,; cr].<br> aStream nextPut: $].<br><br><br><span class="Apple-tab-span" style="white-space:pre"> </span>Output goes directly to the output stream with NO<br><span class="Apple-tab-span" style="white-space:pre"> </span>intermediate objects created. You can stream this<br><span class="Apple-tab-span" style="white-space:pre"> </span>without knowing the size of the virtual array until<br><span class="Apple-tab-span" style="white-space:pre"> </span>the end.<br><br> printCsonOn: aStream<br> self size printOn: aStream.<br> aStream nextPut: $[.<br> self do: [:each | each printCsonOn: aStream].<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>Output goes directly to the output stream with NO<br><span class="Apple-tab-span" style="white-space:pre"> </span>intermediate objects created. You can stream this<br><span class="Apple-tab-span" style="white-space:pre"> </span>as long as you know the size of the virtual array<br><span class="Apple-tab-span" style="white-space:pre"> </span>at the beginning.<br><br> printTNetStringOn: aStream<br> |s|<br> s := StringBuffer new: self size * 6.<br> self do: [:each | each printTNetStringOn: s].<br> s size printOn: aStream.<br> aStream nextPut: $:; nextPutAll: s; nextPut: $].<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>OUCH! 
You have to convert every element to a string,<br><span class="Apple-tab-span" style="white-space:pre"> </span>concatenate them, then write the size of the *string*<br><span class="Apple-tab-span" style="white-space:pre"> </span>(not the *array*), and then the string. The whole<br><span class="Apple-tab-span" style="white-space:pre"> </span>thing has to be held in memory as a string. You cannot<br><span class="Apple-tab-span" style="white-space:pre"> </span>stream this.<br><br>And having paid the heavy cost of building the output, you get<br>no special benefit from the input. Yes, you can preallocate<br>*strings*, but since you are never told the size of *arrays* or<br>*objects*, you cannot preallocate them.<br><br>What then is this "cson"?<br><br>It is the TNetString approach with three simple fixes:<br>(1) Type information goes at the BEGINNING, not the end.<br>(2) Sizes for arrays and objects are the element counts of the<br> arrays and objects, NOT the character counts (still less<br> the byte counts) of the strings that represent them,<br> so you can preallocate and stream input. 
For example,<br> given a path to an item, you could decode just that item<br> without allocating *any* space for unwanted stuff -- though<br> you would still have to decode it.<br>(3) Floats are represented as integers times a power of 2<br> so that they can easily be transported without rounding.<br>The output is a byte sequence; where characters appear they<br>are to be encoded using UTF-8.<br><br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"+"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>a positive integer<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"-"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>a negative integer<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"*"<number>"+"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>a positive float with positive exponent<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"*"<number>"-" <span class="Apple-tab-span" style="white-space:pre"> </span>a negative float with positive exponent<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"/"<number>"+" <span class="Apple-tab-span" style="white-space:pre"> </span>a positive float with negative exponent<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"/"<number>"-" <span class="Apple-tab-span" style="white-space:pre"> </span>a negative float with negative exponent<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>">"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" 
style="white-space:pre"> </span>positive infinity or NaN<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"<"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>negative infinity or NaN<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"#"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>false<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"="<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>true<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"!"<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>null<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"<chars><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>a Unicode string<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"["(<item>)*<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>a sequence<br><span class="Apple-tab-span" style="white-space:pre"> </span><number>"{"(<key><item>)*<span class="Apple-tab-span" style="white-space:pre"> </span>a dictionary<br><br>where <number> is a possibly empty sequence of decimal digits<br>with no leading zeros. 
(In particular, 0 -> "+", not "0+".)<br>This means that null, false, true come out as "!", "#", "=".<br>For extra compactness, numbers could be encoded in base 64,<br>using the digits 0-9A-Za-z$@,<br>but experimentally, that only saves a couple of percent.<br>The numbers reported above use decimal encoding.<br><br>There is no restriction here that the keys of an "object" can be only<br>strings; that's for a higher level protocol to decide. If sender and<br>receiver can both handle more general dictionaries, why not?<br><br><n>*<m> stands for <n>*(2**<m>) and <n>/<m> stands for <n>/(2**<m>).<br>The representation is unique: either <n> is empty and <m> is<br>empty (using "*+" for +0.0 and "*-" for -0.0) or <n> is an odd<br>integer. ">" and "<" are +infinity and -infinity respectively;<br>other leading numbers indicate NaNs.<br><br><br>TNetStrings claims the following advantages:<br><br>1. Trivial to parse in every language without making errors.<br><br> FALSE. To parse an array, you have to first read in the<br> whole string, and then recursively decode it. You can't<br> decode as you go because you don't find out what it _is_<br> until you read the end.<br><br>2. Resistant to buffer overflows and other problems.<br><br> MISLEADING. Whether there can be buffer overflows depends<br> on the implementation. My JSON parser is not subject to<br> buffer overflows, and I can't think why any competent programmer's<br> JSON parser would be.<br><br>3. Fast and low resource intensive.<br><br> FALSE.<br><br>4. Makes no assumptions about string contents and can store binary data<br> without escaping or encoding them.<br><br> UNCLEAR. Dan Bernstein's netstrings proposal was specific to 8-bit<br> characters. It was unsuitable for transmitting text between 8-bit<br> systems using different code pages. 
TNetStrings says that all<br> counts are *byte* counts and all data are *byte* sequences, which<br> leaves the handling of Unicode text -- essential if this is to serve<br> where JSON serves -- totally unclear. If Unicode text is transmitted<br> in UTF-8, then it *cannot* store binary data without escaping or<br> encoding. This may even be FALSE, because floating point numbers<br> count as binary data in my book, and certainly have to be encoded in<br> this format. The idea of representing 1.23e-20 as<br> 0.0000000000000000000123 strikes me as gratuitously odd; you don't<br> want to see 1.2345e300 !<br><br>5. Backward compatible with original netstrings.<br><br> TRUE, but so what?<br><br>6. Transport agnostic, so it works with streams, messages, files,<br> anything that's 8-bit clean.<br><br> TRUEish. There is an unstated assumption that we are dealing<br> with *byte* streams, not *text* streams, which is what makes<br> claim 4 almost certainly false. In any case, JSON also has<br> this property, and so does CSON.<br><br>CSON claims the following advantages:<br><br>1. Easy to generate, including streaming output as long as you<br> know array/object element counts when you start to write,<br> creating *no* intermediate data structures.<br><br>2. Easy to read, creating *no* intermediate data structures.<br><br>3. Handles Unicode.<br><br>4. Handles floating point numbers precisely as long as the<br> receiver is able to hold the numbers.<br><br>5. Encoded data are byte streams, just like JSON.<br><br>and the following disadvantage:<br><br>6. You can skip an unwanted item without *allocating* it but<br> not without *decoding* it.<br><br><br><br><br><br>_______________________________________________<br>erlang-questions mailing list<br><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>http://erlang.org/mailman/listinfo/erlang-questions<br></blockquote></div><br><div>
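<div>For reference, the cson grammar in the quoted message can be sketched in Python (not the Smalltalk of the original post). This is only a minimal sketch under stated assumptions, not the author's implementation: the names <code>num</code>, <code>cson</code> and <code>cson_float</code> are made up here; the string count is taken to be the number of Unicode characters, which is exactly the open question asked at the top of this message; a float whose exponent is zero is written with "*"; infinities and NaNs are left out.</div>
<pre>
```python
import math


def num(n):
    """Decimal digits with no leading zeros; zero is the empty string."""
    return str(n) if n else ""


def cson_float(x):
    """Encode a finite float as an odd integer times or over a power of 2."""
    if not math.isfinite(x):
        raise ValueError("infinities and NaNs are left out of this sketch")
    sign = "+" if math.copysign(1.0, x) > 0 else "-"
    # For a float, the ratio is in lowest terms and d is a power of two.
    n, d = abs(x).as_integer_ratio()
    if n == 0:
        return "*" + sign                      # "*+" is +0.0, "*-" is -0.0
    if d > 1:
        return num(n) + "/" + num(d.bit_length() - 1) + sign
    m = 0
    while n % 2 == 0:                          # make the mantissa odd
        n //= 2
        m += 1
    # Assumption: an exponent of zero is written with "*" and an empty number.
    return num(n) + "*" + num(m) + sign


def cson(value):
    """Encode a JSON-like Python value as cson text (hypothetical helper)."""
    if value is None:
        return "!"
    if value is True:
        return "="
    if value is False:
        return "#"
    if isinstance(value, int):
        return num(abs(value)) + ("-" if 0 > value else "+")
    if isinstance(value, float):
        return cson_float(value)
    if isinstance(value, str):
        # Assumption: the count is the number of Unicode characters.
        return num(len(value)) + '"' + value
    if isinstance(value, list):
        return num(len(value)) + "[" + "".join(cson(v) for v in value)
    if isinstance(value, dict):
        body = "".join(cson(k) + cson(v) for k, v in value.items())
        return num(len(value)) + "{" + body
    raise TypeError("unsupported type: " + type(value).__name__)
```
</pre>
<div>Under these assumptions, <code>cson([1, "ab", None])</code> gives <code>3[1+2"ab!</code>. Calling <code>.encode("utf-8")</code> on the result yields the byte sequence the quoted message describes.</div>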
</div>
<br></div></body></html>