[erlang-questions] json to map

Roelof Wobben r.wobben@REDACTED
Thu Aug 27 13:04:22 CEST 2015


Could this be a way to solve the challenge:


On 27-8-2015 at 4:41, Richard A. O'Keefe wrote:
> On 26/08/2015, at 5:56 pm, Roelof Wobben <r.wobben@REDACTED> wrote:
>> the exact text of the challenge is here :
>> Configuration files can be conveniently represented as JSON terms.
> Yuck.  This has "representation" backwards.
> Here's what we have, using [thing] for "things"
> and (action) for "processes".
> In our heads                   In our processes        In our file system
> [abstract                      [stored
>   configuration -(implement) ->  configuration
>   value]                         data]
>     |                              |
> (conversion)                   (programmed conversion)
>     |                              |
>     v                              |
> [abstract                         v
>   JSON value]   -(implement) -> [stored JSON value]
>     |                              |
>     v                              |
> (unparse)                      (programmed unparsing)
>     |                              |
> [abstract token                   v
>   sequence]     -(implement) -> [stored token sequence]
>     |                              |
> (layout + unlex)               (programmed layout + unlex)
>     |                              |
>     v                              |
> [abstract character               v
>   sequence]     -(implement) -> [stored character sequence]
>     |                              |
> (Unicode encoding)             (programmed encoding)
>     |                              |
>     v                              |
> [abstract byte                    v
>   sequence]     -(implement) -> [stored byte sequence]
>     |                              |
> (compress,                     (programmed compression,
>   encrypt,                       encryption, signing,
>   sign, &c)                      and so on)
>     |                              |
>     v                              v
> [another abstract             [another stored
>   byte sequence]                byte sequence]  ---(store)---> [FILE]
> There is an ABSTRACT space of JSON terms.
> Each of the arrows (down and right) is a "representation" arrow.
> The thing at the tip of the arrow represents the thing at the
> base of the arrow.  The file we end up with is the thing that
> does the representing (GIVEN this framework), and the
> configuration data is what is represented.
> Don't take "stored" too literally.  A "stored" datum in the
> middle column could be a data structure or a communication
> pattern.  I just mean that it's "inside the computer" in
> the sense that it is directly accessible to code.
> This diagram must commute, that is, whatever path you take
> through the arrows, you must end up with *equivalent* things.
> Not equal.
> Converting configuration values to JSON values need not be
> unique.  For example, a set of n elements might be converted
> to a JSON array without duplicates in n! ways.  But we can
> arrange to treat permuted arrays in certain contexts as
> equivalent.
> Converting JSON values to token sequences is not unique.
> For example, a JSON object doesn't *have* any order to it,
> but for unparsing, you have to pick an order.  Given an
> object with n pairs, there are n! ways to order them.
> We can arrange to treat those as equivalent.
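
The two kinds of equivalence described above (permuted arrays standing for a set, and object members written in a different order) can be illustrated with a small Erlang sketch. The module and function names (equiv_demo, same_set/2, same_object/2) are made up for this illustration and do not come from the challenge or from any library.

```erlang
-module(equiv_demo).
-export([same_set/2, same_object/2]).

%% Two lists that stand for the same set are treated as equivalent:
%% lists:usort/1 sorts and drops duplicates, so order no longer matters.
same_set(ListA, ListB) ->
    lists:usort(ListA) =:= lists:usort(ListB).

%% Erlang maps carry no key order, so maps built from the same pairs
%% in a different order already compare equal.
same_object(PairsA, PairsB) ->
    maps:from_list(PairsA) =:= maps:from_list(PairsB).
```

For example, equiv_demo:same_set([3,1,2], [2,3,1]) is true, and so is equiv_demo:same_object/2 on the same pairs given in a different order. Treating arrays as sets is a choice for particular contexts only; in general JSON arrays are ordered.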
> Unlexing, converting tokens to character sequences, is not
> unique. 1, 1e0, 10e-1, 0.1e1, &c are the same, so even
> without allowing leading zeros there are hundreds of
> ways (but not infinitely many ways) to represent a number
> token.  Most Unicode characters can be represented in two
> ways (/ can be represented in three), so a string of n
> characters can be unlexed in at least 2**n ways.  (It's
> worse than that because \u002f and \u002F are equivalent,
> so / has four alternatives.)
> Layout can insert arbitrary amounts of white space between tokens,
> and there are infinitely many ways to do that.
> There are multiple definitions of JSON.  ECMA 404 stops at
> the level of Unicode character sequences, and has nothing
> to say about encoding.  There are LOTS of encodings.
> There are also many compression, encryption, and digital
> signature algorithms, which can be freely composed.
> JSON qua JSON has nothing to say about how files are encoded
> or whether they are compressed, encrypted, or signed.  But
> to put text into a file, you have to encode it somehow, and
> you have to make some decision about other matters.  (And
> don't get me onto file systems with fixed length records,
> where you have to figure out how to fit a 1 million character
> string into 128 byte records...)
>> Write
>> some functions to read configuration files containing JSON terms and
>> turn them into Erlang maps.
> What if a configuration file represents this JSON term:
>      [["target","some program"],
>       ["source","some other program"],
>       ["date",[2015,8,27,14,5]],
>       ["gibberish",[3,1,4,1,5,9,2,7]]]
> How are you supposed to convert *that* to an Erlang map?
> In any way that makes sense?
> Oh, I know:
>      {"": <<"[[\"target...7]]]">>}
> or whatever the syntax for maps is.
> It technically satisfies the requirements!
> The first thing to do with these exercises is CRITICISE them.
> I do not mean to sneer at them and throw them away, but to
> start from a presupposition that the language is muddled,
> the contents confused, and the requirements either incomplete
> or inconsistent.  (Like practically *every* requirement we
> start with including some published standards.  I'm looking
> at you, ECMA 404!)
> I am not kidding.  You have to start out by trying to
> understand the requirements, EXPECTING to find problems,
> RESOLVING them, and writing down REVISED requirements
> that spell out everything you actually need to know.
> For example, you might include the following:
>   - Only the UTF-8 encoding is to be supported.
>   - No compression, encryption, or signing are to be supported.
>   - You may assume that the file system treats a file as
>     an arbitrary sequence of bytes with no record boundaries.
>   - You are to convert null, false, true to the Erlang atoms
>     'null', 'false', 'true'.
>   - You are to convert JSON numbers to Erlang floats.
>   - You are to convert JSON strings to Erlang binaries.
>   - You are to convert JSON arrays  to Erlang lists;
>     nothing else is to be converted to a list.
>   - You are to convert JSON objects to Erlang maps;
>     nothing else is to be converted to a map.
>   - You are not to worry about inverting the conversion
>     from configuration data to JSON terms; there is no
>     configuration data, that was just put in to make it
>     interesting.
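
The revised requirements listed above could be sketched roughly as follows. This is a minimal sketch, not a complete JSON parser: the module name json_sketch and all of its function names are invented for this example, \uXXXX escapes are deliberately left out (they crash), and the number handling is slightly more lenient than JSON (for instance it accepts a leading "+"). It works on a list of characters, and any malformed input crashes instead of returning an error value.

```erlang
-module(json_sketch).
-export([decode/1]).

%% decode(Utf8Binary) -> Erlang term. Any malformed input crashes.
decode(Bin) when is_binary(Bin) ->
    Chars = unicode:characters_to_list(Bin, utf8),
    true = is_list(Chars),                  % crash on invalid UTF-8
    {Value, Rest} = value(skip(Chars)),
    [] = skip(Rest),                        % crash on trailing text
    Value.

%% Skip JSON white space.
skip([C | T]) when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r -> skip(T);
skip(Cs) -> Cs.

value("null" ++ T)  -> {null, T};
value("true" ++ T)  -> {true, T};
value("false" ++ T) -> {false, T};
value([$" | T])     -> string(T, []);
value([$[ | T])     -> array(skip(T));
value([${ | T])     -> object(skip(T));
value(Cs)           -> number(Cs).

%% Arrays become lists.
array([$] | T]) -> {[], T};
array(Cs)       -> elements(Cs, []).

elements(Cs, Acc) ->
    {V, Rest} = value(Cs),
    case skip(Rest) of
        [$, | T] -> elements(skip(T), [V | Acc]);
        [$] | T] -> {lists:reverse([V | Acc]), T}
    end.

%% Objects become maps with binary keys.
object([$} | T]) -> {#{}, T};
object(Cs)       -> members(Cs, #{}).

members([$" | T], Map) ->
    {Key, Rest} = string(T, []),
    [$: | Rest1] = skip(Rest),
    {V, Rest2} = value(skip(Rest1)),
    case skip(Rest2) of
        [$, | T2] -> members(skip(T2), Map#{Key => V});
        [$} | T2] -> {Map#{Key => V}, T2}
    end.

%% Strings become UTF-8 binaries. \uXXXX escapes are NOT handled here.
string([$" | T], Acc) ->
    {unicode:characters_to_binary(lists:reverse(Acc)), T};
string([$\\, C | T], Acc) ->
    string(T, [unescape(C) | Acc]);
string([C | T], Acc) when C >= 16#20 ->
    string(T, [C | Acc]).

unescape($")  -> $";
unescape($\\) -> $\\;
unescape($/)  -> $/;
unescape($b)  -> $\b;
unescape($f)  -> $\f;
unescape($n)  -> $\n;
unescape($r)  -> $\r;
unescape($t)  -> $\t.

%% Numbers become floats. list_to_float/1 needs a ".", so add ".0" if absent.
number(Cs) ->
    IsNumChar = fun(C) -> lists:member(C, "+-0123456789.eE") end,
    {Text, Rest} = lists:splitwith(IsNumChar, Cs),
    NotExp = fun(C) -> C =/= $e andalso C =/= $E end,
    {Mantissa, Exp} = lists:splitwith(NotExp, Text),
    Fixed = case lists:member($., Mantissa) of
                true  -> Mantissa;
                false -> Mantissa ++ ".0"
            end,
    {list_to_float(Fixed ++ Exp), Rest}.
```

With this sketch, json_sketch:decode(<<"{\"a\": [1, true, null, \"x\"]}">>) gives a map whose key <<"a">> holds the list [1.0, true, null, <<"x">>].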
>> Write some code to perform sanity checks
>> on the data in the configuration files.
> Here is another piece of confusion/incompleteness, or
> possibly even questionable advice.
> This presupposes some procedure where you FIRST convert
> a JSON text stored in a file to some Erlang term and
> THEN you check the sanity.  Or at least, it seems to.
> Another approach is to check as you go so that there is
> never any insane Erlang data at all.
> This is highly topical, because we've recently seen a
> bunch of serious Android security bugs caused by
> overly trusting object deserialisation which allowed
> objects to be constructed violating their invariants.
> In fact this has triggered a burst of work on my
> Smalltalk system, because I had a great big OOPS:
> oh dear, I have the same problem.  So I'm now slogging
> through nearly a thousand files turning comments
> about invariants into executable code and writing
> invariants for the *shameful* number of classes that
> had none, so that the deserialisation code can call
> each newly reconstructed object's #invariant method
> before trusting it.
> So I strongly recommend validating data as you parse
> it, and if a sanity check is failed, crash immediately.
> This leaves nothing for subsequent sanity checks to do.
> UNLESS you have configuration data that's converted to
> JSON terms in such a way that not all terms represent
> valid configuration data.  But from what you quote,
> you haven't been given anything for sanity checks like
> that to DO.
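
As an illustration of checking while you build, rather than building first and checking afterwards, here is a minimal sketch. The rule it enforces (object keys must be binaries and must not repeat) is only an assumption chosen for the example; the challenge text does not state it. The names strict_pairs and to_map/1 are hypothetical too.

```erlang
-module(strict_pairs).
-export([to_map/1]).

%% Build a map from {Key, Value} pairs, checking each pair as it is added,
%% so a map that breaks the rule never exists.
to_map(Pairs) ->
    lists:foldl(fun add/2, #{}, Pairs).

%% A non-binary key fails the guard and crashes with function_clause.
add({Key, Value}, Map) when is_binary(Key) ->
    false = maps:is_key(Key, Map),      % crash at once on a duplicate key
    Map#{Key => Value}.
```

A parser could call something like add/2 for every member it reads, so the crash happens at the offending member and nothing is left for a later checking pass to find.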
> All things considered, the exercise appears to be a
> cryptic way of saying "WRITE A JSON PARSER".
> For what it's worth, my JSON parser in Smalltalk is
> 117 lines for a tokeniser + 45 lines for a parser.
> Being stricter about the input would let me shave
> about 20 lines off the total.
> Much of the trickiness is in handling strings,
> where JSON requires that a character outside the
> Basic Multilingual Plane, if written as a \u escape,
> must be encoded as a surrogate pair.
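
The surrogate pair arithmetic mentioned above can be sketched in a few lines. The formula is the standard UTF-16 one; the module name surrogate and the function combine/2 are hypothetical names for this example.

```erlang
-module(surrogate).
-export([combine/2]).

%% Combine a high surrogate (D800..DBFF) and a low surrogate (DC00..DFFF)
%% into one code point above the Basic Multilingual Plane.
%% Anything outside those ranges fails the guard and crashes.
combine(Hi, Lo) when Hi >= 16#D800, Hi =< 16#DBFF,
                     Lo >= 16#DC00, Lo =< 16#DFFF ->
    16#10000 + ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00).
```

For the escaped text \uD83D\uDE00, surrogate:combine(16#D83D, 16#DE00) gives 16#1F600.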
> Processing a sequence of characters as an Erlang
> string will probably make your life simpler; and
> processing a sequence of tokens as an Erlang list
> will also be likely to make your life simpler.
