[erlang-patches] Fix race in mnesia_monitor - lost/ignored node-up event
Thu Aug 29 11:43:18 CEST 2013
On 08/29/2013 11:00 AM, Jonas Falkevik wrote:
> If the network goes down for a short period (approx. 40 s), a node_down event is generated, followed by a node_up event, which is not handled properly.
> The node_down and node_up events can be received before the remote process linked to mnesia_monitor generates its EXIT message.
> Since mnesia_monitor handles the node_up event only after the EXIT message, and after some logic that marks the mnesia node as down, we have a race.
> Hence, a network partition is not detected in all cases.
> To reproduce the problem, I used two virtual machines and unplugged the network cable for approx. 40 s while running net_adm:ping/1 between the nodes.
> I haven't been able to write an automated test case... yet.
> Please have a look at the following patch, which fixes the problem. There is most certainly a better way of fixing the race if you dig deeper into, or already know, the mnesia internals.
> I can rebase the patch on the maint branch if needed; currently it is based on the master branch.
> git fetch git://github.com/falkevik/otp.git mnesia_monitor_nodedown_race_fix
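The manual reproduction described above (pinging between two nodes while the cable is unplugged) can be made a little easier to read with a small probe that timestamps node events. The following is a minimal sketch, assuming both VMs run as distributed nodes with a shared cookie; the module name partition_probe and its functions are hypothetical and not part of the patch. It only uses net_kernel:monitor_nodes/1 and net_adm:ping/1, and it does not observe mnesia_monitor itself, only the node_up/node_down timing that feeds the race.

```erlang
%% partition_probe.erl: hypothetical helper, NOT part of the patch.
-module(partition_probe).
-export([start/1]).

%% Spawn a process that subscribes to nodeup/nodedown messages and
%% pings Peer whenever one second passes without a node event.
start(Peer) when is_atom(Peer) ->
    spawn(fun() ->
                  %% Ask the runtime to deliver {nodeup, Node} and
                  %% {nodedown, Node} messages to this process.
                  ok = net_kernel:monitor_nodes(true),
                  loop(Peer)
          end).

loop(Peer) ->
    receive
        {nodeup, Node} ->
            log("nodeup ~p", [Node]);
        {nodedown, Node} ->
            log("nodedown ~p", [Node])
    after 1000 ->
            %% net_adm:ping/1 returns pong on success and pang on failure;
            %% it may block for a while when the peer is unreachable.
            log("ping ~p -> ~p", [Peer, net_adm:ping(Peer)])
    end,
    loop(Peer).

%% Prefix every line with the local wall-clock time.
log(Fmt, Args) ->
    io:format("~p " ++ Fmt ++ "~n", [calendar:local_time() | Args]).
```

For example, on one node call partition_probe:start('b@host2') (node name hypothetical), unplug the cable for about 40 s, and compare when nodedown and nodeup are printed relative to the failing and succeeding pings.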
I've fetched your patch and assigned it to be reviewed by
BR Fredrik Gustafsson
Erlang OTP Team