[erlang-questions] Leader elections and quorum stuff

Roger Lipscombe roger@REDACTED
Mon Aug 10 18:36:56 CEST 2015


I have a cluster of nodes.

What's the current state of the art for deciding who decides whether a
node is down? To rephrase: are there any good algorithms (or Erlang
libraries) that decide which subset of nodes should monitor the other
(all other?) nodes? I don't want every node monitoring every other
node (or do I?)
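
To make the "subset" idea concrete, here is a minimal sketch of one
possibility, not an established recommendation: sort the node names
into a ring and have each node watch only its next K successors. The
module name ring_watch, the function targets/3 and the parameter K are
all hypothetical.

```erlang
-module(ring_watch).
-export([targets/3]).

%% Hypothetical helper: the (at most) K nodes that follow Self in the
%% sorted membership list, wrapping around at the end of the list.
%% Self is never included in its own watch list.
targets(Self, Nodes, K) when is_integer(K), K >= 0 ->
    Sorted = lists:usort(Nodes),
    %% Split the ring at Self: everything up to and including Self,
    %% and everything after it.
    {Before, After} = lists:splitwith(fun(N) -> N =< Self end, Sorted),
    Ring = After ++ lists:delete(Self, Before),
    lists:sublist(Ring, K).
```

Each node could then call erlang:monitor_node(Node, true) for every
entry in ring_watch:targets(node(), [node() | nodes()], K) and wait
for {nodedown, Node} messages. With K watchers per node, a failure
still gets noticed as long as not all K watchers die at the same time.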

Also, once they've detected a failure, how should they distribute the
dead node's work?

By work, I mean that each node is running a *large* number of
different long-lived jobs. If one of the nodes dies, I need to
distribute those jobs fairly across the other nodes in the cluster. A
single job should not run in more than one place.
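
As a rough sketch of what "fairly" and "not in more than one place"
could look like, assuming rendezvous (highest-random-weight) hashing is
acceptable: every node scores each {Job, Node} pair with
erlang:phash2/1 and the highest score wins. The module name job_owner
and both function names are hypothetical, and the approach only gives
a single owner if all nodes agree on the list of live nodes.

```erlang
-module(job_owner).
-export([owner/2, reassign/2]).

%% Rendezvous hashing: a job belongs to the live node with the highest
%% hash of {Job, Node}. Any node that evaluates this with the same
%% LiveNodes list gets the same answer, so no coordination is needed
%% beyond agreeing on membership.
owner(Job, [_ | _] = LiveNodes) ->
    Scored = [{erlang:phash2({Job, Node}), Node} || Node <- LiveNodes],
    {_Score, Node} = lists:max(Scored),
    Node.

%% New owners for a dead node's jobs, computed over the survivors only.
%% Returns a list of {Job, NewOwner} pairs.
reassign(DeadJobs, Survivors) ->
    [{Job, owner(Job, Survivors)} || Job <- DeadJobs].
```

A useful property of this scheme is that removing a node only moves
the jobs that node owned; every other job keeps its current owner,
because its highest-scoring node is unchanged. The spread across
survivors is even only on average, not exactly.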

Assume that every node knows about every other node's assigned work,
either through some kind of gossip protocol, or through a shared
store.

I'm kinda assuming that the monitoring nodes will hold a quick
election, so that there's only a single arbiter, but anything that
shows how to do that without a single leader would be good too.
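
For illustration, the simplest "election" that fits this description
is no election at all: the lowest-named visible node acts as arbiter,
and only while a strict majority of the configured cluster is visible.
This is a sketch under those assumptions, not a substitute for a real
consensus protocol; the module name arbiter and its functions are
hypothetical.

```erlang
-module(arbiter).
-export([has_quorum/2, arbiter/2]).

%% Visible nodes that are actually part of the configured cluster,
%% sorted and de-duplicated.
members(Visible, Cluster) ->
    [N || N <- lists:usort(Visible), lists:member(N, Cluster)].

%% True when the visible members form a strict majority of the cluster.
has_quorum(Visible, Cluster) ->
    2 * length(members(Visible, Cluster)) > length(lists:usort(Cluster)).

%% The smallest visible member acts as arbiter, but only with quorum.
%% A minority partition gets {error, no_quorum} and should do nothing.
arbiter(Visible, Cluster) ->
    case has_quorum(Visible, Cluster) of
        true  -> {ok, hd(members(Visible, Cluster))};
        false -> {error, no_quorum}
    end.
```

A node would call arbiter:arbiter([node() | nodes()], Cluster) and act
only if the result is {ok, node()}. Two nodes can briefly hold
different views of who is visible, so this narrows the window for
duplicate arbiters rather than closing it.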

Thanks,
Roger.


