[erlang-questions] Erlang term to ASCII

Sun Feb 9 00:01:58 CET 2014

The way I did it recently was (a bit convoluted):
    pretty_flat(NastyTerm) ->
        LookingPrettierTerm = lists:flatten(io_lib:format("~p", [NastyTerm])),
        %% Caution: this also strips spaces inside strings/atoms ("a b" -> "ab").
        LookingEvenPrettierTerm = re:replace(LookingPrettierTerm, "\\s+", "", [global, {return, list}]),
        io:fwrite("Term looking pretty ~s~n", [LookingEvenPrettierTerm]).

Date: Fri, 7 Feb 2014 14:49:29 -0800
From: bob@REDACTED
To: a.brandon.clark@REDACTED
CC: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Erlang term to ASCII

Hashing of a string representation is probably not a great way to ensure that both systems agree on that structure. Even different versions of Erlang disagree on what exactly the representation of ~p should be for a given term, and I'm not talking about whitespace. It's simply not well suited for your purpose, even though it might appear to work for most of the cases you have now.

You could write your own Erlang implementation of whatever the CMS does, but you're unlikely to implement it correctly by taking "~p" and then trying to normalize the output.

You really should come up with a scheme to either hash the data structure itself (like Erlang's phash2 but portable to both sides of your system), or you should hash the output of a representation that has absolutely no flexibility in how data is encoded. Erlang's External Term Format is a lot closer, but there are still multiple valid ways to encode the same thing (particularly the atom cache table, using bigger encodings than necessary, etc.).
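As a minimal sketch of the second option, the hypothetical module below defines its own rigid encoding for a restricted subset of terms (integers, atoms, binaries, proper lists and tuples) and hashes it with erlang:md5/1. The tag letters and 32-bit length prefixes are assumptions invented for illustration; what matters is that each supported term has exactly one encoding, so the non-Erlang side could reimplement it byte for byte. Floats are rejected on purpose, and strings are simply lists of integers, so the string-vs-list ambiguity never arises.

```erlang
-module(canonical_term).
-export([encode/1, digest/1]).

%% Hypothetical canonical encoding: one tag byte per term, then either a
%% ";"-terminated decimal integer or a 32-bit big-endian size followed by
%% the payload. Only integers, atoms, binaries, proper lists and tuples
%% are supported.
encode(I) when is_integer(I) ->
    %% Decimal digits terminated by ";" so integers of any size work.
    <<"i", (integer_to_binary(I))/binary, ";">>;
encode(A) when is_atom(A) ->
    Bin = atom_to_binary(A, utf8),
    <<"a", (byte_size(Bin)):32, Bin/binary>>;
encode(B) when is_binary(B) ->
    <<"b", (byte_size(B)):32, B/binary>>;
encode(L) when is_list(L) ->
    %% length/1 raises badarg for improper lists, which are not supported.
    N = length(L),
    iolist_to_binary([<<"l", N:32>> | [encode(E) || E <- L]]);
encode(T) when is_tuple(T) ->
    Elems = [encode(E) || E <- tuple_to_list(T)],
    iolist_to_binary([<<"t", (tuple_size(T)):32>> | Elems]);
encode(Other) ->
    %% Floats, maps, pids, funs, etc. have no agreed encoding here.
    erlang:error({unsupported_term, Other}).

%% 16-byte MD5 digest of the canonical encoding.
digest(Term) ->
    erlang:md5(encode(Term)).
```

Both sides would then exchange and compare digests; a mismatch means either the data or one of the two encoders differs.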

On Fri, Feb 7, 2014 at 2:16 PM, Brandon Clark <a.brandon.clark@REDACTED> wrote:

The non-Erlang app is an in-house configuration management system.  It stores configuration settings in a SQL database and produces the app.config file used by my Erlang app on demand via REST interface; the configuration is delivered as a string in the body of an HTTP response.
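If the response body is the literal text of app.config (one term followed by a period), one option on the Erlang side is to parse it back into a term and compare terms directly with =:=, which sidesteps formatting differences altogether. A minimal sketch using erl_scan and erl_parse; the module name and the assumption that the body holds exactly one dot-terminated term belong to this sketch, not to the CMS.

```erlang
-module(cms_config).
-export([parse/1, same_config/2]).

%% Parse a string holding one Erlang term terminated by "." (as in
%% app.config) into the term itself. Crashes with badmatch on syntax
%% errors, which is acceptable for a monitoring sketch.
parse(Body) when is_list(Body) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Body),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.

%% Exact structural comparison: "hi" and [104,105] are the same term,
%% so differences in how the text was rendered no longer matter.
same_config(Body, LocalTerm) ->
    parse(Body) =:= LocalTerm.
```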

My ultimate goal is to rework the Erlang app so that it fetches and applies most configuration updates automatically without restarting.  For the near term, that means I have to cope with strings.  In the long term, though, I have some sway over what the CMS does, provided I'm willing to get my hands dirty.  ("Don't bring problems -- bring solutions.")  I'm in favor of Erlang external terms and I think I can sell that solution with BERT and BERT-RPC.
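Even with external terms on the wire, the encoded bytes are not a safe thing to hash, because the External Term Format allows more than one valid encoding of a term. A small sketch (module name hypothetical): the same float is encoded with minor_version 0, which stores floats as text (FLOAT_EXT), and minor_version 1, which stores them as 8-byte IEEE doubles (NEW_FLOAT_EXT). Both decode to the same term, but the bytes, and therefore any hash of them, differ.

```erlang
-module(etf_demo).
-export([float_encodings/1]).

%% Encode the same float with two ETF minor versions and decode both.
%% Returns {SameTermAfterDecoding, SameBytes}.
float_encodings(F) when is_float(F) ->
    Text = term_to_binary(F, [{minor_version, 0}]),  % FLOAT_EXT
    Ieee = term_to_binary(F, [{minor_version, 1}]),  % NEW_FLOAT_EXT
    {binary_to_term(Text) =:= binary_to_term(Ieee), Text =:= Ieee}.
```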

Regardless of how we transmit the data, we still need a way to confirm that both systems agree on what the data structure contains.  The current favored solution is to ask both systems to compute a hash of their data.  We can normalize the data any way we want in preparation for hashing as long as both sides do it in exactly the same way.  Since the CMS is already pretty good at building Erlang terms as strings without newlines or indentation, I set out to find a way to make Erlang do the same thing.

~BC

On Fri, Feb 7, 2014 at 11:16 AM, Bob Ippolito <bob@REDACTED> wrote:

How is this other system maintaining the deeply-nested Erlang structure? How or why is it in text? How does it get rendered without newlines and indentation? Why not use a more predictable (less flexible) serialization format (erlang term format, JSON, protocol buffers, …)?

I do not think that this is the best approach. There are many possible representations of various tokens as Erlang source code and I would never trust two implementations to render it exactly the same way unless I wrote both of them. Some tokens that are going to be particularly problematic are lists of integers (which may or may not be rendered like strings, with various ways to escape), binaries (which may be rendered like strings or not), and floating point numbers.
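The ambiguity shows up even within Erlang itself: ~p and ~w are both valid renderings of the same term, yet they disagree on lists of printable integers and on binaries. A small sketch (module name hypothetical):

```erlang
-module(render_demo).
-export([renderings/1]).

%% Return the ~p and ~w renderings of Term as flat strings, to show that
%% a single term has more than one valid textual form.
renderings(Term) ->
    P = lists:flatten(io_lib:format("~p", [Term])),
    W = lists:flatten(io_lib:format("~w", [Term])),
    {P, W}.
```

For example, [104,105] comes out as "hi" under ~p and as [104,105] under ~w, and <<"hi">> comes out as <<"hi">> and <<104,105>> respectively.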

On Fri, Feb 7, 2014 at 10:17 AM, Brandon Clark <a.brandon.clark@REDACTED> wrote:

I have 2 production systems, one Erlang and one not, both maintaining copies of a large, deeply-nested Erlang data structure.  I need to set up a monitoring script to confirm that both systems are holding identical copies of the data structure.

The non-Erlang system is holding an ASCII rendering of the data structure.  Computing an MD5 sum of this string is easy.  If I can get the Erlang system to convert its data structure to a string, I can have it compute an MD5 sum as well and the monitoring is simply a matter of comparing hashes.
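For the hashing step, erlang:md5/1 accepts any iodata (including the deep character list that io_lib:format/2 returns, as long as every character is below 256), and the 16-byte digest can be printed as the 32-character lowercase hex string that most MD5 tools produce. A minimal sketch; the module and function names are hypothetical, and it assumes the other system also reports lowercase hex.

```erlang
-module(md5_compare).
-export([md5_hex/1, matches/2]).

%% Lowercase hex MD5 of iodata, e.g. a string or a deep character list.
md5_hex(IoData) ->
    Digest = erlang:md5(IoData),
    lists:flatten([io_lib:format("~2.16.0b", [B]) || <<B>> <= Digest]).

%% Compare against the lowercase hex digest reported by the other system.
matches(OtherHex, IoData) ->
    OtherHex =:= md5_hex(IoData).
```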

I'm stuck on the process of converting the Erlang term to a string.

    Str = io_lib:format("~p", [Data])

gives me what I want, except that it includes newlines and indentation that I can't expect the non-Erlang system to have.

    Str = io_lib:format("~w", [Data])

eliminates the newlines and indentation but renders the textual components of Data as lists of integers, guaranteeing the result won't match the non-Erlang system.

So the question is, how do I get a "~p"-style rendering of an arbitrarily-large Erlang term without newlines and indentation?
Thank you!

~Brandon Clark
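One way to get that directly: the field width of ~p is its maximum line length (80 by default), so passing a width larger than the rendered term lets the pretty printer keep everything on one line while still showing printable lists as "..." strings. A sketch under that assumption; the module name and the width constant are arbitrary, and, as noted earlier in the thread, the exact ~p output is not guaranteed to be identical across OTP releases.

```erlang
-module(flat_term).
-export([to_flat_string/1]).

%% Line length passed as the ~p field width; large enough that the term
%% never has to be broken. Arbitrary value chosen for this sketch.
-define(MAX_LINE, 1000000000).

%% "~p"-style rendering on a single line: printable lists come out as
%% "..." strings, and no newlines or indentation are inserted as long as
%% the rendered term is shorter than ?MAX_LINE characters.
to_flat_string(Term) ->
    lists:flatten(io_lib:format("~*p", [?MAX_LINE, Term])).
```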

_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions
