[erlang-questions] Erlang term to ASCII

Fri Feb 7 23:49:29 CET 2014

Hashing of a string representation is probably not a great way to ensure
that both systems agree on that structure. Even different versions of
Erlang disagree on what exactly the representation of ~p should be for a
given term, and I'm not talking about whitespace. It's simply not well
suited for your purpose, even though it might appear to work for most of
the cases you have now.
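
As a small illustration of the ambiguity (a sketch only; the module name render_demo is made up for this example), the same term already has more than one valid textual form within a single Erlang node, depending on which control sequence is used:

```erlang
-module(render_demo).
-export([renderings/1]).

%% Return the ~p and ~w renderings of one term as flat strings.
%% Both are valid Erlang source for the same term.
renderings(Term) ->
    {lists:flatten(io_lib:format("~p", [Term])),
     lists:flatten(io_lib:format("~w", [Term]))}.
```

With default settings, render_demo:renderings([104,105]) returns {"\"hi\"", "[104,105]"}, and render_demo:renderings(<<"hi">>) returns {"<<\"hi\">>", "<<104,105>>"}. This only shows two renderings on one node; it does not by itself demonstrate the differences between Erlang versions mentioned above.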

You could write your own Erlang implementation of whatever the CMS does,
but you're unlikely to implement it correctly by taking "~p" and then
trying to normalize the output.

You really should come up with a scheme to either hash the data structure
itself (like Erlang's phash2 but portable to both sides of your system), or
you should hash the output of a representation that has absolutely no
flexibility in how data is encoded. Erlang's External Term Format is a lot
closer, but there are still multiple valid ways to encode the same thing
(particularly the atom cache table, using bigger encodings than necessary,
etc.).
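
A minimal sketch of the second option, assuming the configuration contains only atoms, integers, binaries, tuples and proper lists. The module name canon_hash and the tag/length layout are invented for this example and are not an existing format; the other system would have to implement exactly the same rules. Each value is written as one tag byte, a 32-bit big-endian length or element count, and then the payload, so there is only one possible encoding per term:

```erlang
-module(canon_hash).
-export([digest/1, encode/1]).

%% MD5 of the canonical encoding, returned as a 16-byte binary.
digest(Term) ->
    crypto:hash(md5, encode(Term)).

%% Atoms: tag $a, byte length, UTF-8 name.
encode(A) when is_atom(A) ->
    B = atom_to_binary(A, utf8),
    <<$a, (byte_size(B)):32, B/binary>>;
%% Integers: tag $i, byte length, decimal digits (any size of integer).
encode(I) when is_integer(I) ->
    B = integer_to_binary(I),
    <<$i, (byte_size(B)):32, B/binary>>;
%% Binaries: tag $b, byte length, raw bytes.
encode(B) when is_binary(B) ->
    <<$b, (byte_size(B)):32, B/binary>>;
%% Tuples and proper lists: tag, element count, encoded elements in order.
encode(T) when is_tuple(T) ->
    encode_seq($t, tuple_to_list(T));
encode(L) when is_list(L) ->
    encode_seq($l, L).
%% Floats, improper lists and other types are deliberately unsupported
%% and make encode/1 fail rather than guess at a representation.

encode_seq(Tag, Items) ->
    Body = << <<(encode(I))/binary>> || I <- Items >>,
    <<Tag, (length(Items)):32, Body/binary>>.
```

For example, canon_hash:encode(ok) returns <<97,0,0,0,2,111,107>>. Strings are encoded as plain lists of integers here, so the question of whether a list "looks like" a string never arises. Element order is significant, so both sides would need to agree on the order of entries in any proplist before hashing.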

On Fri, Feb 7, 2014 at 2:16 PM, Brandon Clark <a.brandon.clark@REDACTED> wrote:

> The non-Erlang app is an in-house configuration management system.  It
> stores configuration settings in a SQL database and produces the app.config
> file used by my Erlang app on demand via REST interface; the configuration
> is delivered as a string in the body of an HTTP response.
>
> My ultimate goal is to rework the Erlang app so that it fetches and
> applies most configuration updates automatically without restarting.  For
> the near term, that means I have to cope with strings.  In the long term,
> though, I have some sway over what the CMS does, provided I'm willing to
> get my hands dirty.  ("Don't bring problems -- bring solutions.")  I'm in
> favor of Erlang external terms and I think I can sell that solution with
> BERT and BERT-RPC.
>
> Regardless of how we transmit the data, we still need a way to confirm
> that both systems agree on what the data structure contains.  The current
> favored solution is to ask both systems to compute a hash of their data.
> We can normalize the data any way we want in preparation for hashing as
> long as both sides do it in exactly the same way.  Since the CMS is already
> pretty good at building Erlang terms as strings without newlines or
> indentation, I set out to find a way to make Erlang do the same thing.
>
> ~BC
>
> On Fri, Feb 7, 2014 at 11:16 AM, Bob Ippolito <bob@REDACTED> wrote:
>
>> How is this other system maintaining the deeply-nested Erlang structure?
>> How or why is it in text? How does it get rendered without newlines and
>> indentation? Why not use a more predictable (less flexible) serialization
>> format (erlang term format, JSON, protocol buffers, …)?
>>
>> I do not think that this is the best approach. There are many possible
>> representations of various tokens as Erlang source code and I would never
>> trust two implementations to render it exactly the same way unless I wrote
>> both of them. Some tokens that are going to be particularly problematic are
>> lists of integers (which may or may not be rendered like strings, with
>> various ways to escape), binaries (which may be rendered like strings or
>> not), and floating point numbers.
>>
>> On Fri, Feb 7, 2014 at 10:17 AM, Brandon Clark <a.brandon.clark@REDACTED> wrote:
>>
>>> I have 2 production systems, one Erlang and one not, both maintaining
>>> copies of a large, deeply-nested Erlang data structure.  I need to set up a
>>> monitoring script to confirm that both systems are holding identical copies
>>> of the data structure.
>>>
>>> The non-Erlang system is holding an ASCII rendering of the data
>>> structure.  Computing an MD5 sum of this string is easy.  If I can get the
>>> Erlang system to convert its data structure to a string, I can have it
>>> compute an MD5 sum as well and the monitoring is simply a matter of
>>> comparing hashes.
>>>
>>> I'm stuck on the process of converting the Erlang term to a string.
>>>
>>> Str = io_lib:format("~p", [Data])
>>>
>>> gives me what I want, except that it includes newlines and indentation
>>> that I can't expect the non-Erlang system to have.
>>>
>>> Str = io_lib:format("~w", [Data])
>>>
>>> eliminates the newlines and indentation but renders the textual
>>> components of Data as lists of integers, guaranteeing the result won't
>>> match the non-Erlang system.
>>>
>>> So the question is, how do I get a "~p"-style rendering of an
>>> arbitrarily-large Erlang term *without* newlines and indentation?
>>>
>>> Thank you!
>>>
>>> ~Brandon Clark