Redundancy, gen_leader and network splits

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Fri Jun 3 16:22:37 CEST 2005


Tim Bates wrote:
>
> Does the version of gen_leader in jungerl fix the
> bug that was discussed on this mailing list a while
> ago in the election process?

I'm not sure. The guys in Göteborg are working on it,
but I think that a new leader-election algorithm 
basically has to be invented.  ;-)

> Finally, I want to be absolutely sure that there 
> won't be two nodes both thinking they're the leader
> at the same time, which I don't think gen_leader ensures.

It doesn't. This is of course a pathological case,
in that there is no generic solution to the problem.
What one would like to have is a method to detect
it.

One thing that should work is to turn off auto-connect
on your nodes (this is done with the configuration
parameter -kernel dist_auto_connect always | never | once).
If you set it to e.g. 'once', two nodes will not
reconnect unless at least one of the nodes restarts
(in which case it is able to connect automatically
to the other). This way, if you suffer intermittent
loss of Erlang communication, the network will stay
separated until you've figured out what to do.

You can detect the situation e.g. by letting the
gen_leader processes ping each other through a
"back door" (say, using UDP). If one leader gets a
back-door ping from a node that's not in the nodes()
list, you have a problem, and need to decide who gets
to yield.
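
As a rough illustration of the back-door check described above, here
is a minimal sketch in Erlang. The module name, the function names,
the message format (the sender's node name as a binary) and the
callback are all assumptions made for the example; none of this is
part of gen_leader. It assumes the nodes run with
-kernel dist_auto_connect once, so that a separated network stays
separated.

```erlang
-module(backdoor).
-export([listen/2, ping/2, is_split/3]).

%% True if a back-door ping came from a node we are not connected to.
%% PeerName is the sender's node name as a binary; comparing binaries
%% avoids creating atoms from untrusted network input.
is_split(PeerName, Self, Connected) ->
    Known = [atom_to_binary(N, utf8) || N <- [Self | Connected]],
    not lists:member(PeerName, Known).

%% Send our node name to a peer's back-door port (fire and forget).
ping(Host, Port) ->
    {ok, Socket} = gen_udp:open(0, [binary]),
    Result = gen_udp:send(Socket, Host, Port,
                          atom_to_binary(node(), utf8)),
    ok = gen_udp:close(Socket),
    Result.

%% Listen on Port and call OnSplit(PeerName) for each ping whose
%% sender is missing from our nodes() list.
listen(Port, OnSplit) ->
    spawn_link(fun() ->
        {ok, Socket} = gen_udp:open(Port, [binary, {active, true}]),
        loop(Socket, OnSplit)
    end).

loop(Socket, OnSplit) ->
    receive
        {udp, Socket, _Ip, _InPort, PeerName} ->
            case is_split(PeerName, node(), nodes()) of
                true  -> OnSplit(PeerName);
                false -> ok
            end,
            loop(Socket, OnSplit)
    end.
```

Each node would call backdoor:listen(Port, Fun) once and then
backdoor:ping(Host, Port) towards its peers at some interval; the
port number and the interval are left to the application. UDP is
unreliable, so a missing ping proves nothing; only a ping that does
arrive from an unconnected node is a signal.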

I don't think gen_leader has to be modified in order
to do this. You can do it on the side and reboot
the "minority leader", once identified.
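
How the "minority leader" gets identified is left open above. One
possible rule, sketched here purely as an assumption, is that the two
leaders exchange the size of their partition over the back door and
the one with the smaller partition yields, with the node name as a
tie-breaker. The module and function names are hypothetical.

```erlang
-module(split_policy).
-export([local_view/0, should_yield/2]).

%% Our view of the partition: number of connected nodes (including
%% ourselves) and our own node name.
local_view() ->
    {length(nodes()) + 1, node()}.

%% True if the local leader should yield to the peer leader.
%% Tuples compare element by element in Erlang term order, so the
%% smaller partition yields, and on equal sizes the node whose name
%% sorts first yields. For two distinct nodes exactly one side
%% gets 'true'.
should_yield({MySize, MyNode}, {PeerSize, PeerNode}) ->
    {MySize, MyNode} < {PeerSize, PeerNode}.
```

The yielding side would then restart its node, for example with
init:reboot() when running under heart; which restart mechanism fits
is an assumption that depends on how the system is deployed.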

/Uffe


