[erlang-questions] Registries: How to best work with late-joining nodes?

Oliver Korpilla Oliver.Korpilla@REDACTED
Wed Jul 19 16:29:56 CEST 2017


Hello.

I have no control over when the nodes in my application start up. I have a simple star structure where all worker nodes try to connect to a central node as soon as they come up. They may have to retry until that central node is available.

There is no guarantee that worker nodes start within a certain time frame or even after the central node.

My central node does:
* Start gproc (with gproc_dist set to all)
* Start mnesia
* Start further supervisors and workers 

My worker nodes do:
* Connect to central node
* Start gproc
* Start mnesia
* Start further supervisors and workers
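
For illustration, the worker startup order listed above could be sketched roughly as follows. This is a minimal sketch, not my actual code: the module name worker_boot, the function names, and the one-second retry delay are hypothetical, and it assumes the local node was started as a distributed node (otherwise net_kernel:connect_node/1 returns ignored and the loop never succeeds).

```erlang
-module(worker_boot).
-export([start/1, connect_with_retry/3]).

%% Hypothetical entry point for a worker node.
%% Central is the central node's name, e.g. 'central@host'.
start(Central) ->
    %% Retry forever, because the central node may come up after us.
    ok = connect_with_retry(Central, 1000, infinity),
    {ok, _} = application:ensure_all_started(gproc),
    {ok, _} = application:ensure_all_started(mnesia),
    %% Further supervisors and workers would be started here.
    ok.

%% Try to connect to Central, sleeping DelayMs between attempts.
%% Retries is a positive integer or the atom infinity.
connect_with_retry(_Central, _DelayMs, 0) ->
    {error, unreachable};
connect_with_retry(Central, DelayMs, Retries) ->
    case net_kernel:connect_node(Central) of
        true ->
            ok;
        _ ->
            %% false: node not reachable; ignored: local node not alive.
            timer:sleep(DelayMs),
            connect_with_retry(Central, DelayMs, decrement(Retries))
    end.

decrement(infinity) -> infinity;
decrement(N) when is_integer(N), N > 0 -> N - 1.
```

Note that nothing in this sequence waits for the other worker nodes to become visible before gproc is started.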

From reading the gproc source code, I see that gproc uses the list of currently connected nodes for picking the leader. I seem to have occasional races where each of the joining nodes sees only the central node and not the other workers, and then proceeds to declare itself leader and not talk to the others at all.
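
One workaround that could be tried is to ask the central node which nodes it already knows and to connect to them explicitly before starting gproc. The following is only a sketch under that assumption: the module peer_sync and its functions are hypothetical, it has not been checked against gproc's leader election, and it narrows the race window rather than closing it (two workers joining at the same instant can still miss each other).

```erlang
-module(peer_sync).
-export([connect_to_peers/1, peers_to_contact/2]).

%% Pure helper: the nodes known to the central node that this node
%% is not yet connected to (excluding this node itself).
peers_to_contact(CentralPeers, Known) ->
    [N || N <- CentralPeers, N =/= node(), not lists:member(N, Known)].

%% Ask Central for its connected nodes and ping each missing one.
%% Returns {ok, [{Node, pong | pang}]} or {error, Reason}.
connect_to_peers(Central) ->
    case rpc:call(Central, erlang, nodes, []) of
        {badrpc, Reason} ->
            {error, Reason};
        CentralPeers when is_list(CentralPeers) ->
            Missing = peers_to_contact(CentralPeers, nodes()),
            %% net_adm:ping/1 returns pong on success, pang on failure.
            {ok, [{N, net_adm:ping(N)} || N <- Missing]}
    end.
```

A worker would call peer_sync:connect_to_peers(Central) after connecting to the central node and before starting gproc.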

But this seems to be only the tip of the iceberg, pointing to a deeper problem: in general, nodes joining a cluster late seem to create a lot of problems, especially when I cannot guarantee that the center of my star cluster is there first.


*** My question is this: Do other registries like syn or gproc/locks_leader work better with late-joining nodes? What process registries do you use for such scenarios? ***


I love the feature-rich API of gproc, and I already have a lot of code centered around it, so anyone who can point me to how to do this properly or better (syncing up late nodes) would render me a great service. Building in another registry would require some effort, and I would want to make sure this is even required.

Thank you and kind regards,
Oliver


