[erlang-questions] cluster configuration, or dynamic (re)configuration of distributed application nodes

Fri Jan 25 11:30:47 CET 2013

Hi,

I've faced similar problems in the past, although at a much finer-grained
scale (tens of thousands of such processes, each of them less important).
It was also important to never have overlapping processes. In your case it
sounds like you could well live with having two applications running for
some time.

We ended up using a distributed consistent locking system. The way it works
is that any number of processes try to grab a named lock and only one of
them acquires it. The lock gives you a lease for a period of time and you
have to extend the lease before it expires. If the process or node owning
the lock dies and does not clean up after itself, the lease runs out and
the lock is released shortly afterwards. This works well for very
fine-grained locks. Our implementation is released under the MIT license
and is on https://github.com/wooga/locker. Some more highlights: nodes can
be added to and removed from the cluster dynamically, and you can have
replicas for scaling reads. I will be presenting it at the Erlang Factory
in San Francisco in March. Disclaimer: it makes some really tough
trade-offs, for example cluster reconfiguration assumes all nodes and the
network are behaving well.
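
To make the lease idea concrete, here is a minimal single-node sketch in
Erlang. It is not locker's API or implementation: the module name
lease_sketch, the function names and the explicit Now argument are all made
up for illustration, and it leaves out the replication and quorum handling
that make the real thing distributed. It only shows the rule "a lock is
held until its lease expires, and only the holder may extend it".

```erlang
-module(lease_sketch).
-export([new/0, acquire/5, extend/5, owner/3]).

%% Hypothetical sketch, not the locker API.
%% Locks maps Name => {Owner, ExpiresAt}. Now and LeaseMs are milliseconds.
new() -> #{}.

%% Grant the lock if it is free or the previous lease has expired.
acquire(Name, Owner, LeaseMs, Now, Locks) ->
    case maps:find(Name, Locks) of
        {ok, {_Holder, ExpiresAt}} when ExpiresAt > Now ->
            {error, already_locked};
        _ ->
            {ok, maps:put(Name, {Owner, Now + LeaseMs}, Locks)}
    end.

%% Only the current holder of an unexpired lease may extend it.
extend(Name, Owner, LeaseMs, Now, Locks) ->
    case maps:find(Name, Locks) of
        {ok, {Owner, ExpiresAt}} when ExpiresAt > Now ->
            {ok, maps:put(Name, {Owner, Now + LeaseMs}, Locks)};
        _ ->
            {error, not_owner}
    end.

%% Who holds the lock at time Now, if anyone.
owner(Name, Now, Locks) ->
    case maps:find(Name, Locks) of
        {ok, {Owner, ExpiresAt}} when ExpiresAt > Now -> {ok, Owner};
        _ -> none
    end.
```

Passing the clock in as an argument keeps the sketch deterministic. A holder
that crashes simply stops calling extend/5, so the next acquire/5 after the
expiry time succeeds without any explicit cleanup.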

As you only have one big thing you want to move around a cluster, it sounds
more like you would want to elect a leader among your nodes. When the
leader dies (application is shut down or node goes away or it stops
checking in with the other nodes), you can elect a new leader and start the
application there. gen_leader could help you with this, although I've never
used it myself. ZooKeeper is a good option, but maybe overkill for just a
small piece of your overall app.
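
As a rough illustration of "elect a new leader when the old one stops
checking in", here is a naive sketch. It is not how gen_leader or ZooKeeper
work: the module leader_sketch and its functions are hypothetical, and it
assumes every node sees the same check-in timestamps, so it does nothing to
protect you from a network partition where each side picks its own leader.

```erlang
-module(leader_sketch).
-export([alive/3, leader/3]).

%% Hypothetical sketch, not gen_leader's algorithm.
%% LastSeen maps Node => timestamp (ms) of that node's latest check-in.

%% Nodes that checked in within TimeoutMs of Now, in sorted order.
alive(LastSeen, Now, TimeoutMs) ->
    lists:sort([Node || {Node, T} <- maps:to_list(LastSeen),
                        Now - T =< TimeoutMs]).

%% The leader is the smallest node name among the nodes considered alive.
leader(LastSeen, Now, TimeoutMs) ->
    case alive(LastSeen, Now, TimeoutMs) of
        [] -> none;
        [First | _] -> {ok, First}
    end.
```

Whichever node finds itself to be the leader starts the application, and the
others make sure it is stopped locally.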

If you can live with multiple instances of your app running, it makes
things much more relaxed in terms of consistency. Maybe you could have a
look at how Riak gossips its ring around the cluster. It's relatively
simple, decentralized and robust. It could serve as an inspiration.
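
A very small sketch of the gossip idea applied to health data: each node
keeps a versioned entry per node and, when two nodes exchange state, the
higher version wins. This is not Riak's actual ring gossip code. The module
gossip_sketch, the {Version, Status} layout and the function names are
assumptions made for the example.

```erlang
-module(gossip_sketch).
-export([update/3, merge/2]).

%% Hypothetical sketch, not Riak's implementation.
%% State maps Node => {Version, Status}.

%% A node records a new status for itself and bumps its own version.
update(Node, Status, State) ->
    Version = case maps:find(Node, State) of
                  {ok, {V, _}} -> V + 1;
                  error -> 1
              end,
    maps:put(Node, {Version, Status}, State).

%% Merge state received from a peer: keep the entry with the higher version.
merge(Local, Remote) ->
    maps:fold(fun(Node, {RemoteV, _} = Entry, Acc) ->
                      case maps:find(Node, Acc) of
                          {ok, {LocalV, _}} when LocalV >= RemoteV -> Acc;
                          _ -> maps:put(Node, Entry, Acc)
                      end
              end, Local, Remote).
```

Because merge/2 gives the same result no matter which side starts the
exchange, nodes can swap state with random peers and still converge on the
same view.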

Knut

On Thu, Jan 24, 2013 at 7:13 PM, Motiejus Jakštys <desired.mta@REDACTED> wrote:

> Hi,
>
> we are about to investigate dynamic node reconfiguration of
> distributed applications. Has anybody done that already? How does
> the kernel application react to application:set_env(kernel, ...)? Or is
> there a more elegant approach? We have a quite dynamic cluster of
> nodes, and we need one application running all the time. We have our
> own solution which is quite simple, but it *seems* like it would be
> better done the OTP way.
>
> Reason: we have a "cluster manager" application, which collects the
> cluster health status: started applications, requests per second,
> various tests, etc. This information is updated every second via many
> rpc calls to all the nodes.
>
> It would be possible to have the same application running on all the
> nodes, but then they all have the same information. Which is data
> duplication. Not too many messages, but why repeat ourselves?
>
> How do/would you approach this "cluster health" problem?
>
> --
> Motiejus Jakštys