[erlang-questions] Leader elections and quorum stuff
Roger Lipscombe
roger@REDACTED
Mon Aug 10 18:36:56 CEST 2015
I have a cluster of nodes.
What's the current state of the art for deciding who decides whether a
node is down? To rephrase: are there any good algorithms (or Erlang
libraries) that decide which subset of nodes should monitor which
other nodes? I don't want every node monitoring every other node (or
do I?)
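
To make the "subset of nodes" idea concrete, here is a minimal sketch of one possible scheme (an illustration, not an established library): sort the node names into a ring and have each node monitor only its next k successors. It is written in Python for brevity, but the idea is language-neutral. The function name `monitor_targets` and the parameter `k` are hypothetical.

```python
def monitor_targets(nodes, me, k=2):
    """Return the nodes that `me` should monitor: its next k successors
    on a ring formed by sorting the node names.

    Assumption: every node sees the same membership list, so all nodes
    compute the same ring.
    """
    ring = sorted(set(nodes))
    if me not in ring:
        raise ValueError("unknown node: %r" % (me,))
    i = ring.index(me)
    # A node never monitors itself, so cap k at the number of other nodes.
    count = min(k, len(ring) - 1)
    return [ring[(i + step) % len(ring)] for step in range(1, count + 1)]


if __name__ == "__main__":
    cluster = ["a@host", "b@host", "c@host", "d@host"]
    for node in cluster:
        print(node, "monitors", monitor_targets(cluster, node))
```

With this layout each node is watched by exactly k others (when the cluster has more than k nodes), so the monitoring cost per node stays constant as the cluster grows. The monitors of a node still have to agree that it is down, which is the question above.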
Also, once they've detected a failure, how should they redistribute the dead node's work?
By work, I mean that each node is running a *large* number of different long-lived
jobs. If one of the nodes dies, I need to distribute those jobs fairly
across the other nodes in the cluster. A single job should not run in
more than one place.
Assume that every node knows about every other node's assigned work,
either through some kind of gossip protocol, or through a shared
store.
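
As an illustration of what redistribution could look like under that assumption, here is a minimal sketch using rendezvous (highest-random-weight) hashing. This is one candidate technique, not necessarily the right answer to the question. The names `owner` and `assignment` are hypothetical, and the sketch is in Python for brevity.

```python
import hashlib


def _score(node, job):
    """Deterministic pseudo-random weight for a (node, job) pair."""
    digest = hashlib.sha256(("%s|%s" % (node, job)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(job, live_nodes):
    """Pick the live node with the highest weight for this job.

    Ties (practically impossible with a 64-bit score) are broken by
    node name so the result is always deterministic.
    """
    if not live_nodes:
        raise ValueError("no live nodes")
    return max(live_nodes, key=lambda node: (_score(node, job), node))


def assignment(jobs, live_nodes):
    """Map every job to exactly one owner among the live nodes."""
    return {job: owner(job, live_nodes) for job in jobs}


if __name__ == "__main__":
    nodes = ["n1@host", "n2@host", "n3@host", "n4@host"]
    jobs = ["job-%d" % i for i in range(20)]
    before = assignment(jobs, nodes)
    # Simulate n2 dying: only the jobs that n2 owned change owner.
    after = assignment(jobs, [n for n in nodes if n != "n2@host"])
    moved = [j for j in jobs if before[j] != after[j]]
    print("jobs moved:", moved)
```

When a node is removed from the live set, only the jobs it owned are reassigned, and they are spread pseudo-randomly over the survivors. Every node that sees the same live set computes the same owner without talking to anyone. The caveat is that "a single job should not run in more than one place" only holds while the nodes agree on which nodes are alive; two nodes with different views can both claim the same job.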
I'm kinda assuming that the monitoring nodes will hold a quick
election, so that there's only a single arbiter, but anything that
shows how to do that without a single leader would be good too.
Thanks,
Roger.