[erlang-bugs] [erlang-questions] Weird pg2 behavior

Hans Bolinder hans.bolinder@REDACTED
Wed Apr 16 10:46:22 CEST 2008


[Matthew Dempsky:]
> I think I came up with a race condition scenario for pg2:
> 
> Node1 and Node2 are running and communicating.
> Pid is on Node1, and a member of pg2 group Group.
> Node3 starts up and connects to Node2 first.
> Node3 and Node2's pg2 processes exchange their group tables,
> and Node3 ends up with Pid in the Group table.
> Node1 goes down before Node3 connects to it.
> A nodedown message is sent to Node2, who clears Pid from Group.
> Node1 restarts and exchanges group tables with Node2 and Node3 again,
> and Pid remains in the Group table.
> 
> I'm thinking a possible solution would be when a pg2 process receives
> a {nodeup, Node} message, it first removes all pids from Node in its
> ets tables and relies on the exchange to reinstate them.

This solution probably works most of the time, but it is likely to
fail if Global doesn't maintain a fully connected network, or if
Global encounters problems while connecting nodes.
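
To make the quoted scenario concrete, here is a toy replay of it in
Python. It is only a sketch under loud assumptions: it is not pg2's
code, and Node, connect, drop_pids_from and run_scenario are all
hypothetical names. It assumes a table exchange is a plain set union
and that the proposed purge runs on both sides before the exchange.

```python
class Node:
    """Hypothetical stand-in for one node's pg2 group table."""

    def __init__(self, name, purge_on_nodeup=False):
        self.name = name
        self.purge_on_nodeup = purge_on_nodeup
        self.members = set()  # pids modelled as (node_name, id) tuples

    def drop_pids_from(self, node_name):
        # What a nodedown handler does: forget pids living on that node.
        self.members = {p for p in self.members if p[0] != node_name}


def connect(a, b):
    """Model nodeup on both sides, then a table exchange (set union)."""
    for node, peer in ((a, b), (b, a)):
        if node.purge_on_nodeup:
            # Proposed fix: drop the peer's pids first and rely on the
            # exchange to bring the live ones back.
            node.drop_pids_from(peer.name)
    merged = a.members | b.members
    a.members = set(merged)
    b.members = set(merged)


def run_scenario(purge_on_nodeup):
    n1, n2, n3 = (Node(n, purge_on_nodeup)
                  for n in ("node1", "node2", "node3"))
    pid = ("node1", 1)
    n1.members.add(pid)
    connect(n1, n2)
    connect(n3, n2)              # Node3 learns Pid via Node2
    n2.drop_pids_from("node1")   # nodedown reaches Node2 only
    n1 = Node("node1", purge_on_nodeup)  # restart: empty table, Pid gone
    connect(n1, n2)
    connect(n1, n3)
    # True means the node still lists the dead Pid as a group member.
    return {n.name: pid in n.members for n in (n1, n2, n3)}
```

In this model run_scenario(False) leaves the dead Pid listed on node1
and node3, while run_scenario(True) removes it everywhere. The model
always connects every pair of nodes, so it does not cover the
not-fully-connected case mentioned above.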

I think pg2 should monitor all pids rather than link to local pids
only. Carefully setting up monitors, like Global does since R11B-4,
should work.
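
As a rough illustration of the monitoring idea, here is a toy Python
model. It is a sketch only: Runtime and GroupTable are hypothetical
names, and distribution details (connection setup, netsplits) are
ignored. The one Erlang behaviour it borrows is that monitoring a
process that is already dead delivers a 'DOWN' message right away.

```python
class Runtime:
    """Hypothetical stand-in for the runtime's process monitoring."""

    def __init__(self):
        self.alive = set()
        self.monitors = {}  # pid -> callbacks to run when it dies

    def monitor(self, pid, on_down):
        if pid not in self.alive:
            # Mirrors Erlang: a monitor on a dead process fires at once.
            on_down(pid)
        else:
            self.monitors.setdefault(pid, []).append(on_down)

    def kill(self, pid):
        self.alive.discard(pid)
        for on_down in self.monitors.pop(pid, []):
            on_down(pid)


class GroupTable:
    """Hypothetical per-node group table that monitors every member."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.members = set()

    def add(self, pid):
        # Add first, then monitor: if the pid is already dead, the
        # callback removes it again immediately.
        self.members.add(pid)
        self.runtime.monitor(pid, self.members.discard)


# Replay of the race: Node3 learns Pid indirectly, then Pid dies.
rt = Runtime()
pid = ("node1", 1)
rt.alive.add(pid)
table2, table3 = GroupTable(rt), GroupTable(rt)
table2.add(pid)
table3.add(pid)   # learned via Node2, never via a link to Node1
rt.kill(pid)      # Node1 goes down; no nodedown is needed on Node3
table3.add(pid)   # a stale pid arriving in a later exchange
```

Here both tables end up without Pid, because removal is driven by the
process itself rather than by which nodedown messages a node happened
to receive.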

Best regards,

Hans Bolinder, Erlang/OTP team


