[erlang-questions] Leader elections and quorum stuff

Mon Aug 10 19:29:34 CEST 2015

The current state of the art for deciding who decides whether a node is
down depends on what you mean by 'decides', 'is', and 'down'.

It turns out to be quite difficult to determine that a node is down; for
example, an issue with networking could cause a node to appear down to some
other nodes, but not others.  Is the node down?  Who can say.

One way of dealing with the problem is to centralize the work distribution,
and have slave workers check out jobs and check in results.  Any worker
that doesn't check in within a particular timeout period is assumed to have
died, and the work is returned to the queue.  Any worker that 'recovers'
and tries to check in its overdue work is ignored.

This is obviously a single point of failure and doesn't scale further than
your single master machine can handle work.  That said, it probably works
for 95% of job distribution problems in the real world.

Beyond that, you get into the world of needing leader election and
quorums.  Basho has done a great deal of work in this area -- see, e.g.
riak_ensemble.  Another approach might be to use, eg., zookeeper or etcd as
work queues.  It's a very complicated area, but if you need it, you need
it.

Pragmatically I'd recommend going with the single master unless you can't
have unavailability until operator intervention when the master crashes.
It's conceptually and operationally easier to deal with, unless you truly
are near that scaling wall.

On Mon, Aug 10, 2015 at 9:36 AM, Roger Lipscombe <roger@REDACTED>
wrote:

> I've got a situation where I have a cluster of nodes.
>
> What's the current state of the art for deciding who decides whether a
> node is down? To rephrase: are there any good algorithms (or Erlang
> libraries) that decide which subset of nodes should monitor another
> (all other?) nodes? I don't want every node monitoring every node (or
> do I?)
>
> Also, once they've detected a failure, how to distribute the dead node's
> work?
>
> By work, each node is running a *large* number of different long-lived
> jobs. If one of the nodes dies, I need to distribute those jobs fairly
> across the other nodes in the cluster. A single job should not run in
> more than one place.
>
> Assume that every node knows about every other node's assigned work,
> either through some kind of gossip protocol, or through a shared
> store.
>
> I'm kinda assuming that the monitoring nodes will hold a quick
> election, so that there's only a single arbiter, but anything that
> shows how to do that without a single leader would be good too.
>
> Thanks,
> Roger.
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150810/cf6380de/attachment.htm>