pg2 is broken in R13B04

Evans, Matthew mevans@REDACTED
Tue May 4 16:16:18 CEST 2010


I should also add that we have had pg2 crash a VM on us: some groups ended up with in excess of 1,000,000 members, even though only 140 processes had ever called pg2:join (the join is done in each process's init function, so we know it is only sent once).

Of course, what happens is:

1) A huge message is built and sent.

2) This message is created and processed in a list comprehension.

We have implemented a workaround by creating a pg3 module that permits only a single join per process.

We did this in pg2:join_group by modifying the ets:update_counter UpdateOp from {2,+1} to {2,+1,1,1} (with similar logic in pg2:leave_group).
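
For reference, the capping behaviour can be demonstrated in isolation. The module below is a minimal, self-contained sketch (the table name and key layout are ours, not pg2's internals); the point is only that the four-tuple UpdateOp {Pos,Incr,Threshold,SetValue} clamps the counter at 1, so repeated joins by the same process become idempotent:

%% Standalone demo of the counter cap; not pg2 source.
-module(join_once_demo).
-export([run/0]).

run() ->
    T = ets:new(demo, [set]),
    Key = {member, self()},
    %% Unbounded counter: every duplicate join inflates the count.
    ets:insert(T, {Key, 0}),
    1 = ets:update_counter(T, Key, {2, +1}),
    2 = ets:update_counter(T, Key, {2, +1}),
    %% Capped counter: once the result would exceed the threshold (1),
    %% it is reset to the set-value (1), so it never grows past 1.
    ets:insert(T, {Key, 0}),
    1 = ets:update_counter(T, Key, {2, +1, 1, 1}),
    1 = ets:update_counter(T, Key, {2, +1, 1, 1}),
    ok.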

Matt

________________________________
From: Evans, Matthew
Sent: Tuesday, May 04, 2010 10:12 AM
To: 'erlang-bugs@REDACTED'
Subject: pg2 is broken in R13B04

Hi,

So after more tests I have seen that pg2 is definitely not working as intended.

It appears that the root problem is how a new instantiation of pg2 within a cluster of Erlang nodes gets its data.

The following sequence of events occurs (a code sketch follows the list).

1) All nodes call net_kernel:monitor_nodes(true) in the init function of pg2.

2) The new instance of pg2 will send {new_pg2, node()} to all other nodes in the pool.

3) The new instance of pg2 will send {nodeup, Node} to itself for each Node in nodes().
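
For concreteness, here is a hedged paraphrase of that startup sequence written as a gen_server init/1. The message shapes ({new_pg2, Node} and {nodeup, Node}) come from the description above; the surrounding plumbing is our sketch, not pg2's verbatim source:

init([]) ->
    Ns = nodes(),
    %% 1) Subscribe to node up/down notifications.
    ok = net_kernel:monitor_nodes(true),
    lists:foreach(
      fun(N) ->
              %% 2) Announce this new pg2 instance to each peer.
              {?MODULE, N} ! {new_pg2, node()},
              %% 3) Queue a nodeup for each known node so we pull
              %% its data during synchronization.
              self() ! {nodeup, N}
      end, Ns),
    {ok, []}.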

It appears that when only two nodes are in the pool, things are generally OK. However, the synchronization process gets muddied when there are many members.

Upon receipt of {new_pg2, Node} or {nodeup, Node}, each node literally walks the table of pids in the pg2_table ETS table and builds a list similar to:

[proxy_micro_cache, [<6325.319.0>,<6324.324.0>]]

This is dispatched to the new pg2 instance. The problem is that every node does this, so the new pg2 instance ends up with a table like:

[proxy_micro_cache,
  [<6437.319.0>,<6437.319.0>,<6437.319.0>,<6437.319.0>,
   <6437.319.0>,<6437.319.0>,<6436.324.0>,<6436.324.0>,
   <6436.324.0>,<6436.324.0>,<6436.324.0>,<6436.324.0>]]

There are many instances of each pid because every node has sent its own copy of the data, causing each process to be joined many times (via the call to ets:update_counter in pg2:join_group)!
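
The multiplication is easy to reproduce in miniature. In the toy module below (all names are ours, for illustration only), each of K peers sends the same full membership list, and the receiver treats every entry as a fresh join, so each pid ends up counted K times instead of once:

-module(pg2_dup_demo).
-export([run/1]).

%% run(K): simulate receiving the same member list from K peers.
run(K) ->
    T = ets:new(demo, [set]),
    Members = [pid_a, pid_b],   % stand-ins for real pids
    Exchanges = lists:duplicate(K, {proxy_micro_cache, Members}),
    lists:foreach(
      fun({_Group, Pids}) ->
              lists:foreach(fun(P) -> join(T, P) end, Pids)
      end, Exchanges),
    %% Each "pid" has now been joined K times instead of once.
    [{P, ets:lookup_element(T, P, 2)} || P <- Members].

%% join_group-style bookkeeping: unconditionally bump a counter.
join(T, Pid) ->
    case ets:member(T, Pid) of
        true  -> ets:update_counter(T, Pid, {2, +1});
        false -> true = ets:insert(T, {Pid, 1})
    end.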

This problem is compounded further on nodes that join later, or when a VM stops and is restarted.

An additional problem arises when another process in the new group calls pg2:join on its own. In that case there is a timing window whereby the new instance could receive that new entry more than once.

My recommendations are:

1) Add a new join function, pg2:join_once, under which a process is never permitted more than one join.

2) When a new node joins, either select a single node to fetch the data from, or have all nodes send the result of ets:tab2list(pg2_table) to the new node and insert that data directly into its local ETS table, rather than going through join_group (possibly with the additional step of erlang:monitor/2). That way a process that has been registered more than once is inserted into the local table in a single operation instead of many times (a sketch follows this list).

3) Possibly defer new requests to pg2:join on the new instance until synchronization is complete.
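
A minimal sketch of option 2, under the assumption that pulling from a single peer is acceptable (the function name is hypothetical, and the erlang:monitor/2 step is omitted for brevity):

%% Hypothetical helper: copy the whole pg2_table from one peer and
%% load it locally in a single pass, instead of replaying join_group
%% once per entry.
sync_from_one_node(Tab) ->
    case nodes() of
        [] ->
            ok;
        [Peer | _] ->
            Entries = rpc:call(Peer, ets, tab2list, [Tab]),
            %% Direct insert: duplicate keys collapse because the
            %% table is a set, so each process lands exactly once.
            true = ets:insert(Tab, Entries),
            ok
    end.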

I understand that gproc is on the way, but I suspect that pg2 does need fixing.

Regards

Matt

