[erlang-questions] Best Practice in Map keys: Atoms or Binaries (or Strings)?

Sun Oct 2 12:13:05 CEST 2016

On Sat, Oct 1, 2016 at 10:36 PM, Lloyd R. Prentice <lloyd@REDACTED>
wrote:

> Can you please explain this.

In languages such as OCaml, you can define algebraic datatypes which encode
certain invariants of your program. By doing so, you can sometimes build
your construction such that illegal states cannot be represented. This
"make the illegal states impossible" approach has been advocated by Yaron
Minsky, among others. Concretely, a good way of defining the state of a
file is:

type file_state = Closed | Open of (char Stream.t)

In Erlang, we would write:

-type file_state() :: closed | {open, port()}.

but the gist is the same. When we match on the file_state, something
interesting happens:

let barney fs =
    match fs with
    | Closed -> ...
    | Open stream ->
        ...

Note how we only have access to the 'stream' component when the file is
open, but not when it is closed. If we manage this invariant in the code,
there is no way to accidentally sit with a closed file descriptor[0]. In
Erlang, we can get much the same flow:

barney(.., closed) -> ...;
barney(.., {open, Port}) ->
    ...

Note again how the only way to get to the port field is when we have an
open port. The--admittedly contrived--naive approach is to write:

-record(state, {
    status = closed :: closed | open,
    port = undefined :: undefined | port()
}).

But note that we can thus write:

State1 = #state{ status = closed, port = undefined },
State2 = #state{ status = open, port = Port },
State3 = #state{ status = closed, port = Port },
State4 = #state{ status = open, port = undefined }

Here states 1 and 2 are valid states, but 3 and 4 are not. Yet our record
allows for their representation! It is a common mistake in programming to
write down such illegal states[1], and by modeling your state on the
algebraic datatype, you can avoid them. The key idea is to encode your state as a
term which has no extra information at any point, but is precise as to what
data/information you have at a given point in time.
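
As a minimal, self-contained sketch of that idea (the module name, the
function names and the handle() type are made up for illustration;
handle() stands in for port() so the code runs without a real port):

```erlang
-module(file_state_sketch).
-export([new/0, open/2, close/1, read_handle/1]).
-export_type([file_state/0]).

%% Hypothetical stand-in for port(); any term will do for the sketch.
-type handle() :: term().

%% The handle exists only inside the 'open' variant.
-type file_state() :: closed | {open, handle()}.

-spec new() -> file_state().
new() -> closed.

%% Only a closed state can be opened; anything else crashes with
%% function_clause, which is the usual let-it-crash behaviour.
-spec open(file_state(), handle()) -> file_state().
open(closed, Handle) -> {open, Handle}.

-spec close(file_state()) -> file_state().
close({open, _Handle}) -> closed;
close(closed) -> closed.

%% The only way to reach the handle is to match on the open variant.
-spec read_handle(file_state()) -> {ok, handle()} | {error, closed}.
read_handle({open, Handle}) -> {ok, Handle};
read_handle(closed) -> {error, closed}.
```

There is no way to build a "closed with a handle" or an "open without a
handle" value from these constructors, unlike the record version.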

For a real-world example, see https://github.com/shopgun/turtle , in which
this technique is used in a couple of places. Turtle is a wrapper for
RabbitMQ making the official driver a bit more OTP-like. In RabbitMQ
(AMQP), you first open a connection and draw channels inside the
connection. Communication happens on a certain channel, not on the
connection. In order to handle connections the way Fred Hebert describes in
his "It's about the guarantees" post[2], we want to start up processes in a
known state, and then switch their internals once we have a valid
connection.

In particular, if you want to publish to RabbitMQ, you want to add a
publisher process to your own supervision tree. This process will have to
wait until a connection is established to RabbitMQ and then it will need to
draw the channel and connect. The publisher is a gen_server and its
Module:init/1 callback is:

https://github.com/shopgun/turtle/blob/401aea5dc13256f1ed5fbf70830e86153a4db740/src/turtle_publisher.erl#L151-L155

init([{takeover, Name}, ConnName, Options]) ->
    process_flag(trap_exit, true),
    Ref = gproc:nb_wait({n,l,{turtle,connection, ConnName}}),
    ok = exometer:ensure([ConnName, Name, casts], spiral, []),
    {ok, {initializing_takeover, Name, Ref, ConnName, Options}};

The process_flag/2 call is there because the official driver cannot
close down appropriately. By writing the publisher with trap_exit, we can
protect the rest of the Erlang system against its misbehavior. We set up
gproc to tell us when there is a connection ready. Then we tell exometer to
create a spiral so we can track the behavior of the publisher in our
metrics solution. Finally, we get into an "initializing" state. Note how
we don't use the "real" state here. Then later on in the file, we handle
the message from gproc:

https://github.com/shopgun/turtle/blob/401aea5dc13256f1ed5fbf70830e86153a4db740/src/turtle_publisher.erl#L210-L230

handle_info({gproc, Ref, registered, {_, Pid, _}},
            {initializing, N, Ref, CName, Options}) ->
    {ok, Channel} = turtle:open_channel(CName),
    #{ declarations := Decls, passive := Passive, confirms := Confirms} =
        Options,
    ok = turtle:declare(Channel, Decls, #{ passive => Passive }),
    ok = turtle:qos(Channel, Options),
    ok = handle_confirms(Channel, Options),
    {ok, ReplyQueue, Tag} = handle_rpc(Channel, Options),
    ConnMRef = monitor(process, Pid),
    ChanMRef = monitor(process, Channel),
    reg(N),
    {noreply,
      #state {
        channel = Channel,
        channel_ref = ChanMRef,
        conn_ref = ConnMRef,
        conn_name = CName,
        confirms = Confirms,
        corr_id = 0,
        reply_queue = ReplyQueue,
        consumer_tag = Tag,
        name = N}};

When a valid connection exists, gproc tells us, and at that point we are
still in the initializing state. We then set up the fabric in RabbitMQ, set
appropriate monitors, register ourselves, and build up the "real" state
record for the process.

The idea here is that we have a special state for when we are initializing
which is different from the normal operating state in the system. This
avoids having to populate our #state{} record with lots of "undefined"
values. In turn, we can define the type scheme of the #state{} record more
precisely because when we initialize it, every value is a valid value. Now
the dialyzer becomes far more powerful because it has a simpler record to
work with type-wise.
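
Stripped of the RabbitMQ details, the two-phase pattern can be sketched
as follows. This is not Turtle's code: the module name, the ready/2
helper and the {resource_ready, _} message are hypothetical stand-ins
for the gproc notification.

```erlang
-module(two_phase_sketch).
-behaviour(gen_server).

-export([start_link/1, ready/2, get_resource/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% The "real" state: it is only ever built with valid data in every
%% field, so no field type needs an 'undefined' alternative.
-record(state, {
    name     :: atom(),
    resource :: term()
}).

start_link(Name) ->
    gen_server:start_link(?MODULE, [Name], []).

%% Hypothetical stand-in for the notification gproc would send.
ready(Pid, Resource) ->
    Pid ! {resource_ready, Resource},
    ok.

get_resource(Pid) ->
    gen_server:call(Pid, get_resource).

%% Phase 1: a plain tuple, not the #state{} record.
init([Name]) ->
    {ok, {initializing, Name}}.

handle_call(get_resource, _From, {initializing, _Name} = S) ->
    {reply, {error, not_ready}, S};
handle_call(get_resource, _From, #state{resource = R} = S) ->
    {reply, {ok, R}, S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

%% Phase 2: switch to the real state once the resource is available.
handle_info({resource_ready, Resource}, {initializing, Name}) ->
    {noreply, #state{name = Name, resource = Resource}};
handle_info(_Info, S) ->
    {noreply, S}.
```

A call made before the resource arrives gets {error, not_ready}; after
ready/2 has been called, the same call returns {ok, Resource}.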

The original question was about maps. Since maps are dynamic in nature, you
can sometimes use them the same way: leave a key out until valid data is
available for it. This gives you the same structure as above, albeit
simpler. You could encode the closed state as

State1 = #{}

and the open state as

State2 = #{ port => Port }

Now, any match which needs the port must match on it:

case State of
    #{ port := Port } -> ...
end,

so you cannot by accident have an uninitialized port. As of Erlang/OTP
Release 19.x, the dialyzer is even able to work with maps, so we can type
this as well and get the dialyzer to figure out where there are problems in
the code base.
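
A minimal sketch of how the map version could be typed, assuming OTP 19
or later for the := (mandatory key) syntax in type specs. The module and
function names are hypothetical, and term() stands in for port() so the
code runs without a real port:

```erlang
-module(map_state_sketch).
-export([new/0, open/2, close/1, port_of/1]).

%% Closed: the empty map. Open: a map that must carry the port key.
-type state() :: #{} | #{port := term()}.

-spec new() -> state().
new() -> #{}.

-spec open(state(), term()) -> state().
open(State, Port) -> State#{port => Port}.

-spec close(state()) -> state().
close(State) -> maps:remove(port, State).

%% The first clause matches only when the key is present; the second
%% pattern, #{}, matches any map and so catches the closed state.
-spec port_of(state()) -> {ok, term()} | {error, closed}.
port_of(#{port := Port}) -> {ok, Port};
port_of(#{}) -> {error, closed}.
```

How much the dialyzer reports for a given misuse depends on the code
around it, so treat the specs as documentation first and a check second.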

The method is not nearly as powerful as it is in OCaml. Static type systems
can tell you, at compile time, where your constructions are wrong. With a
bit more type-level work, you can encode even more invariants. For
instance, you can discriminate a public key from a private key in a
public-key cryptosystem, without ever having a runtime overhead of doing so.

[0] Of course, this is slightly false. Network sockets may close for other
reasons, for instance.

[1] The Go programming language is notorious for doing this, either by
returning a pair (result, error) where the error is nil if the result is
valid and vice versa, or by using a struct in which certain fields encode
whether other fields are valid.

[2] http://ferd.ca/it-s-about-the-guarantees.html

-- 
J.