Redundancy, gen_leader and network splits
Fri Jun 3 18:03:24 CEST 2005
The guys in Göteborg came up with a completely new algorithm, of which
only a beta version has been released so far, and only to Sean.
We are currently writing up a paper about it for the Erlang workshop.
I guess Hans Svensson (the brain behind all this) will post the new
version soon after we have finished the paper. We have tested that
version very extensively, using two new testing technologies. "Testing can
only reveal errors, not their absence" (Dijkstra), hence we cannot
be 100% sure. However, few products on the market have been
tested that well.
Ulf Wiger (AL/EAB) wrote:
>Tim Bates wrote:
>>Does the version of gen_leader in jungerl fix the
>>bug that was discussed on this mailing list a while
>>ago in the election process?
>I'm not sure. The guys in Göteborg are working on it,
>but I think that a new leader-election algorithm
>basically has to be invented. ;-)
>>Finally, I want to be absolutely sure that there
>>won't be two nodes both thinking they're the leader
>>at the same time, which I don't think gen_leader ensures.
>It doesn't. It's of course a pathological case in
>that there is no generic solution to the problem.
>What one would like to have is a method to detect
>One thing that should work is to turn off auto-connect
>on your nodes (this is done with the configuration
>parameter -kernel dist_auto_connect always | never | once).
>If you set it to e.g. 'once', two nodes will not
>reconnect unless at least one of the nodes restarts
>(in which case it is able to connect automatically
>to the other). This way, if you suffer intermittent
>loss of erlang communication, the network will stay
>separated until you've figured out what to do.
>You can detect the situation e.g. by letting the
>gen_leader processes ping each other through a
>"back door" (say, using UDP). If one leader gets a
>backdoor ping from a node that's not in the nodes()
>list, you have a problem, and need to decide who gets
>I don't think gen_leader has to be modified in order
>to do this. You can do it on the side and reboot
>the "minority leader", once identified.
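
To make the "back door" idea above concrete, here is a minimal sketch, not
part of gen_leader. It assumes the nodes were started with
"erl -kernel dist_auto_connect once" and that every host listens on the same
UDP port. The module name backdoor_ping, the functions start/2 and
split_suspects/2, and the 5 second interval are all hypothetical choices made
for illustration. Each node periodically sends its own node name to its peers
over UDP and warns when a ping arrives from a node that is missing from
nodes(). Deciding which side should be rebooted is deliberately left out.

```erlang
-module(backdoor_ping).
-export([start/2, split_suspects/2]).

%% Hypothetical helper module; nothing here is gen_leader API.
-define(INTERVAL, 5000).  % ping period in ms (arbitrary choice)

%% Start the back-door pinger. Port is the UDP port used on every host,
%% PeerHosts is a list of hostnames or IP tuples of the other nodes.
start(Port, PeerHosts) ->
    spawn(fun() ->
        {ok, Sock} = gen_udp:open(Port, [binary, {active, true}]),
        erlang:send_after(?INTERVAL, self(), tick),
        loop(Sock, Port, PeerHosts)
    end).

loop(Sock, Port, PeerHosts) ->
    receive
        tick ->
            %% Send our node name as a binary to every peer host.
            Me = atom_to_binary(node(), utf8),
            [gen_udp:send(Sock, Host, Port, Me) || Host <- PeerHosts],
            erlang:send_after(?INTERVAL, self(), tick),
            loop(Sock, Port, PeerHosts);
        {udp, Sock, _Ip, _InPort, Sender} ->
            %% Compare as binaries so no atoms are created from network input.
            Connected = [atom_to_binary(N, utf8) || N <- nodes()],
            case split_suspects([Sender], Connected) of
                [] ->
                    ok;
                Suspects ->
                    error_logger:warning_msg(
                      "possible network split, alive but not connected: ~p~n",
                      [Suspects])
            end,
            loop(Sock, Port, PeerHosts)
    end.

%% Pure check: senders of back-door pings that are not in the list of
%% connected nodes. Returns a sorted list without duplicates.
split_suspects(PingSenders, Connected) ->
    lists:usort([S || S <- PingSenders, not lists:member(S, Connected)]).
```

For example, backdoor_ping:start(4711, ["host2", "host3"]) on each node would
start the pinger; the port and host names are placeholders. The warning is only
a detection signal. Picking and rebooting the "minority leader" would still
have to be done on the side, as described above.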