[erlang-questions] Best practices for distributing an application over erlang?

Joe Armstrong <>
Tue May 3 10:04:34 CEST 2011

On Wed, Feb 16, 2011 at 5:58 AM, Gregory Haskins

> Hi All,
> I have been struggling trying to wrap my head around what might be the
> best-practice for distributing a clustered erlang-application to a
> plurality of erlang nodes, and keeping the cluster up to date from a
> central location.
> Ideally, the subordinate erlang nodes that have just-enough logic to
> connect to one another (e.g. cookies, ssl-keys, gossip-controller
> addresses, etc are pre-distributed) but otherwise no specific
> personality is known apriori.  I can then implement some kind of central
> service to remotely load/run a specific application (and any of its
> dependencies).  IOW: This is really about bootstrap for the time being
> (e.g. Hot update is nice, but not a short term requirement at the
> moment).  The right conceptual vehicle for the payload would be a
> standard erlang-release.
> So I have given this a lot of thought and read a ton of the books/doc on
> the various options here.  For instance, I have seen Joe Armstrong's
> examples of sending a lambda down as payload in a message, using
> dist_ac, boot-server, release_handler, etc.  Every solution seems to
> present its own set of problems/limitations, so I am unclear as to what
> might be my best option here.

Oh dear - confusion reigns.

There is a fundamental problem - no two distributed systems have the same
requirements as regards fault-tolerance, behavior under load, latency,
scalability etc. There are just too many variables.

So what do we do in practice? One common solution is the BFI (Brute Force
and Ignorance) solution. Assume a system of K physically separated nodes,
K being say 5..25 (this is all very hand-waving; for K < 5 a different
approach is probably needed). For K < 1000 (ish), full replication of all
the code on a local file system is perfectly feasible. Forget everything
about hot code upgrades etc. Simply take an entire node out of service,
change all the code on the node, and put it back into service. This implies
that you implement some kind of data-exchange protocol to get all the
active data you need from a node before closing it down, and to restore the
data before you restart it.
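A minimal sketch of that data-exchange step might look like this. Note
that `state_server` and its `get_state`/`put_state` calls are hypothetical
names for whatever process holds your active data - they are not part of
OTP:

```erlang
%% Hypothetical drain/restore protocol: before taking Node out of
%% service, pull its active state back to the master; after the code
%% on disk has been replaced and the node restarted, push it back.
%% state_server:get_state/0 and put_state/1 are made-up hooks.
-module(rollout).
-export([drain/1, restore/2]).

drain(Node) ->
    {ok, State} = rpc:call(Node, state_server, get_state, []),
    State.

restore(Node, State) ->
    ok = rpc:call(Node, state_server, put_state, [State]).
```

The point is only that the protocol is explicit: state out before
shutdown, state back in after restart.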

This is for planned outages - you can actually do this pretty quickly, so
it's not so bad. If it takes more than a few seconds then look at your
algorithms; something is wrong.

For non-planned outages (crashes) your design should put the system into a
stable state and
try to recover.

You can, of course, make things far more fine-grained than this, using
code:load_binary/3 etc. In my experience this is used in development to get
quick turn-round times - it's very convenient to just throw new code at
the system to see what will happen - but this is very different from a
planned roll-out of new code in a production system.

I think you need to think about "what code replacement do I need in
development?" (my suggestion would be to just slurp new code over without
worrying; if it crashes, who cares - restart everything) and "how do I
upgrade the code in a production system?" (with great care, node-at-a-time,
in round-robin fashion).
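The node-at-a-time round-robin could be sketched like this. The three
helper functions here are placeholder stubs - wire them to whatever your
system actually does to drain traffic, copy code, and restart:

```erlang
%% Node-at-a-time rolling upgrade driven from a master node.
%% The three helpers are stubs standing in for your own machinery.
-module(rolling).
-export([upgrade_all/1]).

upgrade_all(Nodes) ->
    lists:foreach(fun upgrade_node/1, Nodes),
    ok.

upgrade_node(Node) ->
    ok = take_out_of_service(Node),   %% stop sending traffic to Node
    ok = replace_code(Node),          %% swap the code on its local disk
    ok = put_in_service(Node).        %% restart it and resume traffic

%% Placeholder stubs - replace with real implementations.
take_out_of_service(_Node) -> ok.
replace_code(_Node) -> ok.
put_in_service(_Node) -> ok.
```

Only one node is ever out of service at a time, which is what keeps the
cluster available throughout the roll-out.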

It occurs to me that there is a good opportunity here for a complete
framework to support common generic cases - i.e. think "gen_cluster"
rather than gen_server/supervisor. I have seen such beasts, but they have
a heck of a lot more callbacks than gen_server :-)

> For instance, sending functions as payload only seems to work as long as
> that function only invokes modules that already loaded/available on the
> remote node.  I can, of course, ensure that all required modules are
> loaded on the remote node using things like code:load_binary(), but this
> would mean I would need to a) figure out what modules are needed as part
> of the release I want to distribute, and b) manually load each and every
> module to the remote node, right?
Well, "manually" isn't so bad - a program does it for you. It's just a
matter of sending a few Kbytes .. MBytes down a wire, hardly worth
optimising ...
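For example, pushing the local version of a module to a remote node can be
done with the standard code-server calls, assuming normal Erlang
distribution is already up between the nodes:

```erlang
%% Push the locally loaded version of Mod to Node and load it there.
%% code:get_object_code/1 and code:load_binary/3 are standard OTP calls.
-module(push_code).
-export([push/2]).

push(Node, Mod) ->
    {Mod, Bin, File} = code:get_object_code(Mod),
    {module, Mod} = rpc:call(Node, code, load_binary, [Mod, File, Bin]),
    ok.
```

Loop this over the list of modules in your release and point (a) above is
solved; point (b) is just the loop.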

> As far as dist_ac, this seems to be more geared towards creating
> active/standby replicas of an app, rather than a process-peer model, so
> that's out.
> boot-server is getting warmer, though I really want my grid to use
> SSL-mutualauth + cookies, and I am not sure if the erl_prim_loader can
> work over SSL inets.
> release_handler is getting warmer, though I still need to address the
> bootstrap problem (either pre-load the initial release or use the
> erl_prim_loader, as above).  The other problem here is its focus is
> really on hot-update, and at this juncture I really only care about
> remote-update (e.g. cold restart is probably ok for now).

> One solution I thought of is a play on the boot-server model, where I
> can have one erlang application that connects in whatever ssl/cookie
> protocol I want, contacts the service, downloads the release.tar.gz, and
> then fires up a new beam instance with the release...hot updates, if
> desired, could just use standard release-handler calls direct to the
> subordinate vm.  The problem with this solution is then I get into "well
> how do I keep my loader-app up to date?" and my head explodes in a
> circular reference ;).

Why not just have one simple roll-your-own boot-loader that you *never*
change. If you make it very simple you'll never have to change it.

There are bits of the system you can change later and bits you cannot
change - make up your mind which is which. Remember that what you can and
cannot change at run-time is a design decision. Just make up your mind and
don't change it. Your brain will go into a loop if you start thinking
about how to change the boot loader which you had previously decided
"could not be changed". This is a common mistake in system design - first
you say "X is fixed" and you design the system with this in mind. Then
some person says "what happens if we change X?" and your brain goes into a
loop - correct answer: "we can't, X is fixed".

Having fixed points in a design makes life really, really easy.

Decide that "I will never change the boot loader", and write your own boot
loader so that you completely understand it - and then never change it.
I've given an example below

> Long story short(er):  I suspect I am just making all of this harder
> than it needs to be, and there must be a ton of you who are just doing
> similar operations all the time.  Any help/pointers/guidance/suggestions
> appreciated.
It's actually not that difficult. You need to be able to remotely stop a
node, send a file to the stopped system, and restart the stopped system,
all in a controlled manner.

I did this on PlanetLab a few years ago. I made a simple socket server
with a layer of encryption. The bootstrap opened a single port and
listened for term_to_binary-encoded {M,F,A} tuples. Something like this
(this is the boot loader - never change it - it is so simple you will
never have to change it):

boot(Socket) ->
    receive
        {tcp, Socket, Bin} ->
            {M,F,A} = binary_to_term(Bin),
            apply(M, F, A),
            boot(Socket)
    end.

Then I send tuples like {file,write_file,[<<Bin>>]}, etc. to the remote
nodes from a single master node.
You can just ping the remote nodes or keep a socket open to detect failure.
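The master side of this is equally small. A sketch (the host name, port
number, and `{packet, 4}` framing here are illustrative assumptions - use
whatever framing your boot loader expects):

```erlang
%% Master side: connect to the bootstrap port on a remote node and
%% send it one {M,F,A} command, encoded with term_to_binary/1.
%% Host, Port and the example payload are illustrative.
-module(master).
-export([send_cmd/3]).

send_cmd(Host, Port, MFA) ->
    {ok, Sock} = gen_tcp:connect(Host, Port, [binary, {packet, 4}]),
    ok = gen_tcp:send(Sock, term_to_binary(MFA)),
    gen_tcp:close(Sock).

%% Example (hypothetical file name and contents):
%%   master:send_cmd("node1.example.com", 9999,
%%                   {file, write_file, ["/tmp/m.beam", Bin]}).
```

With encryption layered on top of the socket you get the
SSL-mutual-auth-style trust you asked about without touching the normal
distribution mechanism.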

You don't have to use *any* of the release stuff nor the kernel distribution
stuff if you don't want to.


> Kind Regards,
> -Greg
