gen_leader bug

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Mon Jan 10 11:52:58 CET 2005


Thomas,

Having looked into this briefly, I'm of the opinion that 
lexcompare/2 can only return 'equal' if the leaders have been 
misconfigured. I think it would be better for it to exit in 
that case rather than return 'equal'. As far as I can tell 
from the code, and based on my recollection, 'equal' is only 
possible if:

(1) two nodes have been accepted by the same number of peers 
    (which is possible), and
(2) they have the same position in the list of candidates

If (2) is true, the leaders are misconfigured. They should all 
have identical candidate lists. (*)
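
To illustrate the two conditions, here is a minimal sketch of such a 
comparison. It is not the actual gen_leader code: the module name, 
the {AcceptedBy, Position} tuples, the on_capture/2 helper and its 
return atoms are all made up for the example, and which position 
"wins" is an assumption.

```erlang
-module(lexcompare_sketch).
-export([lexcompare/2, on_capture/2]).

%% Hypothetical representation: each side is {AcceptedBy, Position},
%% where AcceptedBy is the number of peers that accepted the node and
%% Position is its index in the (supposedly shared) candidate list.
lexcompare({AccA, PosA}, {AccB, PosB}) ->
    if
        AccA > AccB -> greater;
        AccA < AccB -> less;
        %% Tie on acceptances: fall back to list position.
        %% Assumption: the earlier position wins.
        PosA < PosB -> greater;
        PosA > PosB -> less;
        %% Same count and same position: only possible if the two
        %% nodes do not share one candidate list.
        true -> equal
    end.

%% Hypothetical caller that handles every result and exits with a
%% descriptive reason on 'equal' instead of failing on a missing clause.
on_capture(Mine, Theirs) ->
    case lexcompare(Mine, Theirs) of
        greater -> keep_own_election;
        less    -> yield_to_sender;
        equal   -> exit({misconfigured_candidate_lists, Mine, Theirs})
    end.
```

With identical candidate lists, two distinct nodes always differ in 
Position, so the 'equal' branch should be unreachable in a correctly 
configured system.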

The documentation states that "the list of candidates must be 
known from the start".

This is perhaps a bit vague. It should also state that all leader 
candidates must be started with identical candidate lists.

If my assumption seems incorrect, please let me know.

Regards,
Uffe

(*) Obviously, this calls for some caution when the candidate 
list needs to be updated. One way to do this is during a code
change (which, using the OTP release handler support, is 
synchronized across nodes). A reasonable restriction during 
code change is that if one participating node dies while the 
upgrade is in progress, the code change is rolled back.
At the moment, gen_leader callbacks are able to update the 
candidate list during code change, but only by breaking the
record abstraction. One additional exported function, e.g.
set_candidates/2, from gen_leader.erl could fix this.
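
As a rough sketch of what such an accessor could look like: the 
module name, the #election{} record and its candidate_nodes field 
below are hypothetical stand-ins, not the real gen_leader.erl 
internals, and no such function is exported at the moment.

```erlang
-module(set_candidates_sketch).
-export([new/1, candidates/1, set_candidates/2]).

%% Hypothetical stand-in for the election state kept by gen_leader.
-record(election, {candidate_nodes = []}).

%% Build an election value from an initial candidate list.
new(Candidates) when is_list(Candidates) ->
    #election{candidate_nodes = Candidates}.

%% Read the candidate list without exposing the record layout.
candidates(#election{candidate_nodes = Cs}) ->
    Cs.

%% Replace the candidate list; callers never touch the record itself.
set_candidates(Candidates, #election{} = E) when is_list(Candidates) ->
    E#election{candidate_nodes = Candidates}.
```

A callback could then update the list during code change by calling 
set_candidates(NewCandidates, Election) instead of rebuilding the 
record by hand.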

-----Original Message-----
From: owner-erlang-questions@REDACTED [mailto:owner-erlang-questions@REDACTED]On Behalf Of Fee, Thomas
Sent: den 4 januari 2005 22:44
To: erlang-questions@REDACTED
Subject: gen_leader bug


Hello All, 
We have encountered a problem with gen_leader. All our servers died simultaneously. The bug is this: 
The function lexcompare returns one of: 'equal', 'less', or 'greater'. The function safe_loop calls lexcompare when it receives a 'capture' message from a server. Unfortunately, safe_loop only handles 'less' and 'greater' in the lexcompare returned result.
Our server ran into a scenario when 'equal' was returned from lexcompare. Hence, the servers crashed. 
What should be done to fix this missing case clause problem? 
Thanks - 
- Thomas Fee 

