[erlang-questions] Distributed Process Registry

Sun Feb 8 21:51:52 CET 2015

On 02/08/2015 11:56 AM, Roberto Ostinelli wrote:
> Dear list,
> I have 3 interconnected nodes to which various devices connect.
> Once a device connects to one of those nodes, the related TCP socket events are handled by a device_loop process on the node that it originally connected to.
>
> Every device is identified via its id (a binary). I need to enable communication from one device to the other based on these ids, even across different nodes. I have around 150k device processes per node (so up to 500k in total).
>
> So, I basically need a global process registry. Not new, but haven't used one in a while now.
>
> As far as I can tell, my main options to send messages from one device process to the other based on their id are the erlang global module, ulf's gproc, or implement a custom solution based on, for instance, mnesia in ram only.
>
>
> I was first thinking of leaning towards using the erlang global module, since register_name/2,3 now also allows general terms to be used as Name. The advantages I see:
>
>   * It is a simple built-in mechanism.
>   * If a node goes down, the global names registered on that node are unregistered automatically.
>   * If a new node is added, the global names registered are propagated automatically.
>
> The cons:
>
>   * I always feel that process registration should be used to identify long-running services.
>   * I don't know if 500k is an acceptable number (i.e. if the global module is made to support my use case).
>
>
> I also looked into gproc. The advantages I see:
>
>   * Actively maintained, it seems to have been built for my use case.
>
> The cons:
>
>   * For the distributed part it relies on gen_leader. I've heard too many horror stories about gen_leader. Maybe that's not a thing anymore.
>   * Not sure what happens if a node goes down / a new node is added.
>
>
> I've considered a custom solution based on mnesia distributed ram-only tables that would store the pids of the device loops based on their binary id. The advantages I see:
>
>   * Mnesia will take care of distributing, handling down events, etc.
>
> The cons:
>
>   * I need to reinvent the wheel and ensure that when a node goes down, all the device entries in the distributed mnesia tables related to that node are removed.
>
>
>
> Has someone recently implemented a distributed process registry and can shed some light for me?
>
> Thank you in advance for your advice ^^_
> r.
>
You are missing a few options:

http://www.erlang.org/doc/man/pg2.html
* Any term can be used for a name
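
A minimal sketch of how pg2 could serve as the device registry from the question, using one group per device id. The module name, the {device, Id} group naming, and the not_connected error atom are assumptions made for illustration. It also assumes an OTP release that still ships pg2 (the module was removed in OTP 24).

```erlang
-module(device_registry_pg2).
-export([register_device/1, send_to_device/2]).

%% Called from the device_loop process once the device id is known.
%% pg2:create/1 returns ok even if the group already exists, and pg2
%% removes a member from its groups when that process exits.
register_device(DeviceId) when is_binary(DeviceId) ->
    Group = {device, DeviceId},          % any term is a valid group name
    ok = pg2:create(Group),
    ok = pg2:join(Group, self()).

%% Look up the device on any connected node and forward the message.
send_to_device(DeviceId, Message) ->
    case pg2:get_members({device, DeviceId}) of
        [Pid | _] ->
            Pid ! Message,
            ok;
        [] ->
            {error, not_connected};
        {error, {no_such_group, _}} ->
            {error, not_connected}
    end.
```

Note that pg2 keeps an empty group around after its last member exits, so with hundreds of thousands of device ids you would probably also want to call pg2:delete/1 when a device disconnects.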

https://github.com/okeuday/cpg/
* By default uses string (list of integer) names, but can be changed with group_storage application env setting (e.g., to dict)
* Supports any number of scopes, which are atoms that are used as locally registered cpg process identifiers (pg2 only supports the single global scope stored in ETS)
* Supports the via syntax, like gproc does, with variations that allow pools to be created (https://github.com/okeuday/cpg/blob/master/test/cpg_test.erl#L83-L104)

Both pg2 and cpg allow you to avoid centralized global state (the state used in gproc, locks_leader, mnesia, global) so that netsplits do not require an arbitrary process to resolve state conflicts.  That is very important for reliability.
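
For contrast, here is a rough sketch of the same registry on top of the global module, where the conflict resolution mentioned above has to be chosen explicitly. The module name, the {device, Id} naming, and the error atoms are assumptions made for illustration; global:register_name/3, global:whereis_name/1 and global:random_exit_name/3 are the stock OTP functions.

```erlang
-module(device_registry_global).
-export([register_device/1, send_to_device/2]).

%% Register the calling device_loop process under its device id.
%% The third argument is the resolve fun that global calls when two
%% partitions rejoin with the same name registered on both sides;
%% global:random_exit_name/3 picks one of the two pids at random and
%% kills the other.
register_device(DeviceId) when is_binary(DeviceId) ->
    Name = {device, DeviceId},
    case global:register_name(Name, self(), fun global:random_exit_name/3) of
        yes -> ok;
        no  -> {error, already_registered}
    end.

%% Look up the device in the global name table and forward the message.
send_to_device(DeviceId, Message) ->
    case global:whereis_name({device, DeviceId}) of
        undefined ->
            {error, not_connected};
        Pid when is_pid(Pid) ->
            Pid ! Message,
            ok
    end.
```

After a netsplit heals, a device that reconnected to the other side of the partition leaves two live registrations for the same id, and the resolve fun decides which process survives. With pg2 or cpg both pids would simply show up as members of the group.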
