[erlang-questions] Distributed Process Registry

Sun Feb 8 22:10:14 CET 2015

On 02/08/2015 12:59 PM, Roberto Ostinelli wrote:
> Hello Michael,
> I know of those, but I left them out because I do not need group mechanisms: I'm not interested in broadcasting messages to multiple devices, just 1-to-1 messaging.
That usage of pg2 and cpg just means that you would only have a single pid in a group.  It might seem a bit different, but I don't think there is a better alternative.
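
As a rough sketch of what I mean (not tested against your setup): each device_loop process creates and joins a group named after its own id, and a sender looks up the one member. The module name, the function names and the {device, Id} group name are made up for illustration; only the pg2 calls are real. Note that pg2 is the module shipped with OTP at the time of writing; it was removed in OTP 24 in favour of pg, so this will not compile against newer releases.

```erlang
-module(device_pg2_sketch).
-export([register_device/1, send_to_device/2]).

%% Hypothetical helper: called by a device_loop process once the device
%% has identified itself. The group holds exactly one pid.
register_device(DeviceId) when is_binary(DeviceId) ->
    Group = {device, DeviceId},
    %% pg2:create/1 returns ok even if the group already exists.
    ok = pg2:create(Group),
    ok = pg2:join(Group, self()).

%% Hypothetical helper: 1-to-1 send based on the device id.
send_to_device(DeviceId, Msg) when is_binary(DeviceId) ->
    case pg2:get_members({device, DeviceId}) of
        [Pid | _] ->
            Pid ! Msg,
            ok;
        [] ->
            %% Group exists but the device process has exited.
            {error, not_connected};
        {error, {no_such_group, _}} ->
            {error, not_connected}
    end.
```

A process that exits is removed from its group automatically, but the (now empty) group name stays around until pg2:delete/1 is called, so with this many short-lived devices you would probably want to clean those up.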
>
> Is there any reason why using these process groups would be beneficial in my use case?
The main reason is that you avoid the need to resolve state conflicts when global state gets merged after a netsplit.  With pg2 and cpg, all the state relevant to the local node is stored locally and remote state gets merged as nodes are added.  When a node dies, its pids are removed, as expected, but there is no need for centralized global state.
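
To illustrate why no conflict resolution is needed, here is a toy model of that idea. It is not how pg2 or cpg are actually implemented; the module, the function names and the map layout are assumptions for the sake of the example. Each node is the only writer of its own entries, so merging remote state is just storing it under that node's key, and a node going down is just dropping that key.

```erlang
-module(membership_merge_sketch).
-export([node_up/3, node_down/2, lookup/2]).

%% View :: #{node() => #{DeviceId :: binary() => pid()}}
%% Only Node itself ever produces Entries for Node, so two views can
%% never disagree about who owns a given entry.

%% A node joined (or rejoined after a netsplit): take its entries as-is.
node_up(Node, Entries, View) ->
    View#{Node => Entries}.

%% A node died or became unreachable: forget everything it owned.
node_down(Node, View) ->
    maps:remove(Node, View).

%% Find the pid registered for DeviceId on any known node.
lookup(DeviceId, View) ->
    Found = [Pid || Entries <- maps:values(View),
                    {ok, Pid} <- [maps:find(DeviceId, Entries)]],
    case Found of
        [Pid | _] -> {ok, Pid};
        [] -> error
    end.
```

With a single global table, by contrast, the two sides of a netsplit can both have accepted a registration for the same name, and something has to pick a winner when they reconnect.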
>
> Thank you for your input,
> r.
>
>
>
> On 08/feb/2015, at 21:51, Michael Truog <mjtruog@REDACTED> wrote:
>
>> On 02/08/2015 11:56 AM, Roberto Ostinelli wrote:
>>> Dear list,
>>> I have 3 interconnected nodes to which various devices connect.
>>> Once a device connects to one of those nodes, the related TCP socket events are handled by a device_loop process on the node that it originally connected to.
>>>
>>> Every device is identified via its id (a binary). I need to enable communication from one device to another based on these ids, even across different nodes. I have around 150k device processes per node (so up to 500k in total).
>>>
>>> So, I basically need a global process registry. Not new, but haven't used one in a while now.
>>>
>>> As far as I can tell, my main options to send messages from one device process to another based on their id are the Erlang global module, Ulf's gproc, or a custom solution based on, for instance, mnesia in RAM only.
>>>
>>>
>>> I was first thinking of leaning towards using the erlang global module, since register_name/2,3 now also allows general terms to be used as Name. The advantages I see:
>>>
>>>   * It is a simple built-in mechanism.
>>>   * If a node goes down, the global names registered on that node are unregistered automatically.
>>>   * If a new node is added, the global names registered are propagated automatically.
>>>
>>> The cons:
>>>
>>>   * I always feel that process registration should be used to identify long-running services.
>>>   * I don't know if 500k is an acceptable number (i.e. if the global module is made to support my use case).
>>>
>>>
>>> I also looked into gproc. The advantages I see:
>>>
>>>   * Actively maintained, it seems to have been built for my use case.
>>>
>>> The cons:
>>>
>>>   * For the distributed part it relies on gen_leader. I've heard too many horror stories on gen_leader. Maybe that's not a thing anymore.
>>>   * Not sure what happens if a node goes down / a new node is added.
>>>
>>>
>>> I've considered a custom solution based on mnesia distributed ram-only tables that would store the pids of the device loops based on their binary id. The advantages I see:
>>>
>>>   * Mnesia will take care of distributing, handling down events, etc.
>>>
>>> The cons:
>>>
>>>   * I need to reinvent the wheel and ensure that when a node goes down, all the device entries in the distributed mnesia tables related to that node are removed.
>>>
>>>
>>>
>>> Has someone recently implemented a distributed process registry and can shed some light for me?
>>>
>>> Thank you in advance for your advice ^^_
>>> r.
>>>
>> You are missing a few options:
>>
>> http://www.erlang.org/doc/man/pg2.html
>> * Any term can be used for a name
>>
>> https://github.com/okeuday/cpg/
>> * By default uses string (list of integers) names, but this can be changed with the group_storage application env setting (e.g., to dict)
>> * Supports any number of scopes, which are atoms that are used as locally registered cpg process identifiers (pg2 only supports the single global scope stored in ETS)
>> * Supports the via syntax, like gproc does, with variations that allow pools to be created (https://github.com/okeuday/cpg/blob/master/test/cpg_test.erl#L83-L104)
>>
>> Both pg2 and cpg allow you to avoid centralized global state (the state used in gproc, locks_leader, mnesia, global) so that netsplits do not require an arbitrary process to resolve state conflicts.  That is very important for reliability.
>>
>>
