[erlang-questions] Maps

Tue May 14 19:37:15 CEST 2013

On Tue, May 14, 2013 at 12:25 PM, Jeremy Ong <jeremy@REDACTED> wrote:
>> Code upgrades involve only invoking new functions from old functions
>> -- these are syntactically different functions and hence may have
>> different static types.
>
> What about running processes? How does a gen_server upgrade work? The state
> isn't explicitly mutated but it does need to transition.

The old loop() function invokes the new upgrade() function, which
invokes the new loop() function.  The two loop() functions are
separate functions and hence may have different types.  There's no
magic or hand-waving involved.
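
To make that concrete, here is a minimal sketch of the hand-off.  The
module name "counter", the message names and the two state shapes are
all made up for illustration: assume the previous version kept a bare
integer as its state and this version keeps a {count, N} tuple.

```erlang
-module(counter).
-export([start/0, loop/1, upgrade/1]).

%% Hypothetical module: the names and state shapes are assumptions
%% made for this sketch, not anything from OTP.

start() ->
    spawn(?MODULE, loop, [{count, 0}]).

%% Entered from the *old* loop/1 via a fully qualified call, so this
%% clause runs in the newly loaded code and converts the old state
%% (a bare integer) into the new state ({count, N}).
upgrade(OldCount) when is_integer(OldCount) ->
    ?MODULE:loop({count, OldCount});
%% State is already in the current shape: nothing to convert.
upgrade({count, _} = State) ->
    ?MODULE:loop(State).

loop({count, N} = State) ->
    receive
        increment ->
            loop({count, N + 1});
        {get, From} ->
            From ! {count, N},
            loop(State);
        code_change ->
            %% Fully qualified call: control moves to the most
            %% recently loaded version of this module.
            ?MODULE:upgrade(State)
    end.
```

The old loop/1 and the new loop/1 never share a call site; the only
bridge between them is upgrade/1, which is the one place where both
state shapes need to be understood.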

>> Yes, and OCaml provides Obj.magic (C-style type cast).  However, it,
>> like list_to_atom, is *explicit* -- I know immediately by its
>> presence that something very weird is going on which warrants closer
>> inspection, and its uses are almost *always* at I/O boundaries.
>>
> C++ is an obvious extreme of a language featuring many things that could
> involve "code smell" but each of which may be situationally useful. Again,
> this is an extreme example but you get the idea.

Extreme indeed ;)  However, if feature X is useful only in certain
situations but is a code smell in most others, then I claim that
feature X is poorly designed.  C++ templates come to mind -- yes, they
are useful for massive abstraction, but they provide enough means of
abuse that one should consider them a poorly designed feature.
(OCaml's module system is a superb example of a feature intended for
massive abstraction that makes doing "bad" things nigh impossible.)

>> JSON is not dynamic.  I must know exactly what fields exist in any
>> JSON object I'm working with, or I'm doing something wrong.  (This
>> design pattern is evinced by the fact that JSON keys may not be
>> arbitrary values.)
>
> Perhaps I misunderstand what you mean by the term dynamic. I should be able
> to access the field "foo" or 5 inside a json object regardless of whatever
> other crap is there.

Static structure does not preclude subtyping.  OCaml's functional
objects capture this notion perfectly -- an object has a set of
methods which are known at compile time (hence static).  However any
given function need only be aware of the presence of the subset of
methods in which it is interested.  Note that this does not preclude a
function from creating a copy of said object with only those methods
modified!

By "dynamic" I mean not determinable at compile time: e.g. the length
of a list is generally not knowable at compile time, as the point of a
list is expressly to sequence an arbitrary number of items.

Not knowing which members are present in a structure renders the
structure nigh useless -- if a structure is missing some data my code
expects it to contain, then it is for all intents and purposes broken.
(Note that using the absence of a member to indicate a "null" or
default value is folly -- how am I to distinguish between code wishing
to use a default value, and code which incorrectly did not specify a
value due to a typo?)

Naturally we encounter a problem at the I/O boundaries of our program
-- if we are reading JSON data from the network, we can in no way
guarantee it will contain (a superset of) the fields we expect.  But
this is exactly where input sanitization must take place: if we blindly
accept foreign data, we open ourselves up to security risks (viz.
list_to_atom).  To ensure correct and secure operation, we must
explicitly sanitize the homogeneous dynamic data from the network (via
parsing) to obtain heterogeneous static data (via construction).
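
As a rough sketch of that boundary step, assume the JSON decoder hands
us an object as a list of {BinaryKey, Value} pairs (a common shape,
but an assumption here), and that the fields "name" and "age" and the
module name are invented for the example.  A record stands in for the
static structure:

```erlang
-module(user_parse).
-export([parse/1]).

%% Hypothetical target structure: its fields are fixed at compile time.
-record(user, {name :: binary(), age :: non_neg_integer()}).

%% Turn untrusted, homogeneous key/value data into a #user{} record,
%% or reject it.  Extra keys are ignored; missing or ill-typed fields
%% are an error rather than a silent default.
parse(Props) when is_list(Props) ->
    case {lists:keyfind(<<"name">>, 1, Props),
          lists:keyfind(<<"age">>, 1, Props)} of
        {{_, Name}, {_, Age}}
          when is_binary(Name), is_integer(Age), Age >= 0 ->
            {ok, #user{name = Name, age = Age}};
        _ ->
            {error, invalid_user}
    end;
parse(_) ->
    {error, invalid_user}.
```

Everything past parse/1 can then rely on the record's shape, and the
only code that has to cope with arbitrary input is this one function.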

Of course we may permit the function "binary_to_frame" as a shortcut,
but, like list_to_atom and the infamous eval, we lose the ability to
typecheck the result (unless we decorate with assertions, which
fortunately Erlang provides for), and we open up security holes.

> Yea I'm wrong here, for some reason, I thought JSON keys could be integers
> as well.

IIRC JavaScript objects can be indexed with integers (this is how
arrays work), though the keys are coerced to strings; perhaps this is
what you were recalling?

>> The problem you describe is one of data interface.  Assume for a
>> moment that the network does not exist, and all JSON data lives purely
>> within Erlang.  JSON data may then happily be represented as a
>> heterogeneous static structure (e.g. a frame) rather than a
>> homogeneous dynamic one (e.g. a keylist).
>
> A better example would be the interface between Erlang and Ruby. Ruby hashes
> are heterogeneous and dynamic and allow for integer, atom, or string keys.
> If I want full interoperability between them, I will need to use a
> heterogeneous structure.

In more strict statically typed languages (e.g. the ML family), this
issue is resolved by tagged unions -- the type of a Ruby key could be
"{int, integer()} | {atom, atom()} | {str, string()}".  This preserves
type-ability at the cost of (admittedly annoying) extra keystrokes.
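
Spelled out as Erlang type specs, the idea might look like the sketch
below.  The module and function names are invented for illustration;
only the key() type is taken from the text above.

```erlang
-module(ruby_key).
-export([tag/1, describe/1]).
-export_type([key/0]).

%% The tagged union from the text: every key carries an explicit tag.
-type key() :: {int, integer()} | {atom, atom()} | {str, string()}.

%% Wrap a raw key in its tag (hypothetical helper).
-spec tag(integer() | atom() | string()) -> key().
tag(K) when is_integer(K) -> {int, K};
tag(K) when is_atom(K)    -> {atom, K};
tag(K) when is_list(K)    -> {str, K}.

%% Consumers match on the tag, so each case is handled explicitly.
-spec describe(key()) -> string().
describe({int, N})  -> "int " ++ integer_to_list(N);
describe({atom, A}) -> "atom " ++ atom_to_list(A);
describe({str, S})  -> "str " ++ S.
```

The extra keystrokes are the tags themselves; in exchange Dialyzer has
a concrete type to check each clause of describe/1 against.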

Erlang, being dynamically typed, tags every value with its type at
run time, and the union of all those types is denoted "any()"; hence
the above use case may be supported by claiming that the Ruby hash,
represented as a dict(), maps type any() to type any().

Obviously this only works for maps, not frames -- the reason I argue
*for* frames (not against maps! I don't care if there's both!) is
precisely because I *don't* want to be able to do this: I want my
frames to have static structure, and I want the compiler (or Dialyzer)
to kick me in the pants if they don't!