gen_leader bug
Ulf Wiger (AL/EAB)
ulf.wiger@REDACTED
Mon Jan 10 11:52:58 CET 2005
Thomas,
Having looked into this briefly, I'm of the opinion that
lexcompare/2 can only return 'equal' if the leaders have been
misconfigured, and that in that case it would be better for it
to exit. As far as I can tell from the code, and based on my
recollection, 'equal' is only returned if:
(1) two nodes have been accepted by the same number of peers
(which is possible), and
(2) they have the same position in the list of candidates.
If (2) is true, the leaders are misconfigured. They should all
have identical candidate lists. (*)
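To make the two conditions concrete, here is a minimal sketch of a
lexicographic comparison over {AcceptedCount, Position} pairs. It is
not the gen_leader source: the module name lexcompare_sketch, the
helper position/2, the tuple shape, and the direction of the
tie-break (earlier position wins) are all assumptions made for
illustration.

```erlang
-module(lexcompare_sketch).
-export([compare/2, position/2]).

%% Hypothetical helper: 1-based index of Node in Candidates,
%% or 0 if Node is not in the list.
position(Node, Candidates) ->
    position(Node, Candidates, 1).

position(_Node, [], _N) -> 0;
position(Node, [Node | _], N) -> N;
position(Node, [_ | Rest], N) -> position(Node, Rest, N + 1).

%% Compare two {AcceptedCount, Position} pairs lexicographically.
%% Assumption: a higher accepted count wins, and on a tie the
%% candidate that appears earlier in the list wins.
compare({AccA, PosA}, {AccB, PosB}) ->
    if
        AccA > AccB -> greater;
        AccA < AccB -> less;
        PosA < PosB -> greater;
        PosA > PosB -> less;
        %% Same count and same position: only reachable when the
        %% two nodes disagree about the candidate list.
        true -> equal
    end.
```

With one shared list [a, b, c], nodes a and b get positions 1 and 2,
so a tie on the accepted count is still broken by position. If node a
was started with [a, b] and node b with [b, a], each sees itself at
position 1, and compare({1, 1}, {1, 1}) falls through to 'equal'.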
The documentation states that "the list of candidates must be
known from the start".
This is perhaps a bit vague. It should also state that all leader
candidates must be started with identical candidate lists.
If my assumption seems incorrect, please let me know.
Regards,
Uffe
(*) Obviously, this calls for some caution when the candidate
list needs to be updated. One way to do this is during a code
change (which, using the OTP release handler support, is
synchronized across nodes). A reasonable restriction during
code change is that if one participating node dies while the
upgrade is in progress, the code change is rolled back.
At the moment, gen_leader callbacks are able to update the
candidate list during code change, but only by breaking the
record abstraction. One additional exported function, e.g.
set_candidates/2, from gen_leader.erl could fix this.
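As a rough sketch of what such an accessor could look like, assuming
the election state is kept in a record with a 'candidates' field.
The module name election_sketch, the record definition, new/1,
candidates/1 and the argument order of set_candidates/2 are
hypothetical and not taken from gen_leader.erl:

```erlang
-module(election_sketch).
-export([new/1, candidates/1, set_candidates/2]).

%% Hypothetical stand-in for the opaque election state.
-record(election, {candidates = []}).

%% Build a state from an initial candidate list.
new(Candidates) when is_list(Candidates) ->
    #election{candidates = Candidates}.

%% Read the candidate list without touching the record directly.
candidates(#election{candidates = Cs}) ->
    Cs.

%% Replace the candidate list, e.g. from a code_change callback,
%% so that callback modules need not know the record layout.
set_candidates(E = #election{}, Candidates) when is_list(Candidates) ->
    E#election{candidates = Candidates}.
```

A callback would then call something like
election_sketch:set_candidates(E, NewCandidates) during code change
instead of rebuilding the record by hand.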
-----Original Message-----
From: owner-erlang-questions@REDACTED [mailto:owner-erlang-questions@REDACTED] On Behalf Of Fee, Thomas
Sent: 4 January 2005 22:44
To: erlang-questions@REDACTED
Subject: gen_leader bug
Hello All,
We have encountered a problem with gen_leader. All our servers died simultaneously. The bug is this:
The function lexcompare returns one of 'equal', 'less', or 'greater'. The function safe_loop calls lexcompare when it receives a 'capture' message from a server. Unfortunately, safe_loop only handles the 'less' and 'greater' results; there is no clause for 'equal'.
Our servers ran into a scenario where lexcompare returned 'equal'. Hence, the servers crashed.
What should be done to fix this missing case clause problem?
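One possible shape for an explicit third clause, following the
suggestion in the reply above that 'equal' should lead to an exit.
This is a sketch only: the module capture_sketch, the function
handle_capture/1, the placeholder return values and the exit reason
are hypothetical and do not reproduce safe_loop.

```erlang
-module(capture_sketch).
-export([handle_capture/1]).

%% Hypothetical stand-in for the branch that acts on the result
%% of lexcompare. The 'less' and 'greater' bodies are placeholders.
handle_capture(Result) ->
    case Result of
        less ->
            {ok, less_branch};
        greater ->
            {ok, greater_branch};
        equal ->
            %% Exit with a descriptive reason instead of an
            %% anonymous case_clause error.
            exit({candidate_lists_differ, equal})
    end.
```

The process still terminates on 'equal', but the exit reason now
points at the candidate-list configuration rather than at a missing
clause.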
Thanks -
- Thomas Fee