Mon Jan 10 17:39:50 CET 2005
Our server farm has nodes that join the cloud at different times, as you
might guess. First, I don't know how a new node announces its availability
to the other nodes in our setup. Second, I suspect that whatever technique
we use for that purpose has a race condition; there is probably no lock or
synchronization across the full set of nodes.
Martin Logan incorporated gen_leader into our system, so he can probably
make better sense of all this.
- Thomas Fee (eSignal / FutureSource)
[mailto:owner-erlang-questions@REDACTED] On Behalf Of Ulf Wiger (AL/EAB)
Sent: Monday, January 10, 2005 4:53 AM
Subject: RE: gen_leader bug
Having looked into this briefly, I'm of the opinion that
lexcompare/2 can only return 'equal' if the leaders have been
misconfigured. I think it would be better for it to exit in that
case, but as far as I can tell from the code, and based on my
recollection, 'equal' is only possible if both of these hold:
(1) two nodes have been accepted by the same number of peers
    (which is possible), and
(2) they have the same position in the list of candidates.
If (2) is true, the leaders are misconfigured. They should all
have identical candidate lists. (*)
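
To make conditions (1) and (2) concrete, here is a minimal sketch. It is
not gen_leader's actual code: the module name, claim/3, and the
{Accepted, Position} tuple are assumptions made for illustration, and
which side counts as 'less' or 'greater' in the real lexcompare/2 may
differ. It only shows that a tie requires two nodes to report the same
position, which cannot happen when every node is started with the same
candidate list.

```erlang
-module(lexcompare_sketch).
-export([claim/3, compare/2]).

%% Hypothetical: what a node reports about itself, namely how many peers
%% have accepted it and its position in the candidate list that *it* was
%% started with.
claim(Node, Accepted, Candidates) ->
    {Accepted, position(Node, Candidates, 1)}.

%% Lexicographic comparison of two claims (Erlang term order on tuples).
compare(A, B) when A < B -> less;
compare(A, B) when A > B -> greater;
compare(_, _) -> equal.

%% 1-based index of Node in the candidate list.
position(Node, [Node | _], I) -> I;
position(Node, [_ | Rest], I) -> position(Node, Rest, I + 1);
position(Node, [], _I) -> erlang:error({not_a_candidate, Node}).
```

With identical lists, compare(claim(a, 2, [a, b]), claim(b, 2, [a, b]))
cannot be 'equal', because the positions differ. If node a was started
with [a, b] and node b with [b, a], both claim position 1, and the same
call with those lists returns 'equal'.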
The documentation states that "the list of candidates must be
known from the start".
This is perhaps a bit vague. It should also state that all leader
candidates must be started with identical candidate lists.
If my assumption seems incorrect, please let me know.
(*) Obviously, this calls for some caution when the candidate
list needs to be updated. One way to do this is during a code
change (which, using the OTP release handler support, is
synchronized across nodes). A reasonable restriction during
code change is that if one participating node dies while the
upgrade is in progress, the code change is rolled back.
At the moment, gen_leader callbacks are able to update the
candidate list during code change, but only by breaking the
record abstraction. One additional exported function, e.g.
set_candidates/2, from gen_leader.erl could fix this.
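
A rough sketch of what such an accessor could look like follows. This is
an assumption, not existing gen_leader code: the record name, its single
field, the argument order, and the helper functions are all hypothetical
stand-ins for the real election state.

```erlang
-module(set_candidates_sketch).
-export([new/1, set_candidates/2, candidates/1]).

%% Hypothetical stand-in for the opaque election record in gen_leader.erl.
-record(election, {candidate_nodes = []}).

%% Build an election state with an initial candidate list.
new(Candidates) ->
    set_candidates(Candidates, #election{}).

%% Replace the candidate list without exposing the record to callbacks.
%% Crashes with badmatch if any element is not a node name (an atom).
set_candidates(Candidates, #election{} = E) when is_list(Candidates) ->
    true = lists:all(fun erlang:is_atom/1, Candidates),
    E#election{candidate_nodes = Candidates}.

%% Read the candidate list back.
candidates(#election{candidate_nodes = Cs}) ->
    Cs.
```

A code_change callback could then call set_candidates(NewList, Election)
on every node during the synchronized upgrade, so that all candidates
end up with the same list.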
[mailto:owner-erlang-questions@REDACTED]On Behalf Of Fee, Thomas
Sent: 4 January 2005 22:44
Subject: gen_leader bug
We have encountered a problem with gen_leader. All our servers died
simultaneously. The bug is this:
The function lexcompare returns one of 'equal', 'less', or 'greater'. The
function safe_loop calls lexcompare when it receives a 'capture' message
from a server. Unfortunately, safe_loop only handles the 'less' and
'greater' results.
Our server ran into a scenario in which lexcompare returned 'equal', so
the case expression had no matching clause and the servers crashed.
What should be done to fix this missing case clause problem?
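
One possible shape for a fix, offered only as a sketch: handle 'equal'
explicitly and exit with a descriptive reason, so that the failure points
at the configuration rather than at a case_clause. The module, the
function name, and the yield_to/ignore return values are placeholders;
they are not what safe_loop actually does for 'less' and 'greater'.

```erlang
-module(capture_sketch).
-export([handle_capture/2]).

%% Cmp is the result of the comparison; From identifies the sender of
%% the 'capture' message. Both branches for less/greater are placeholders.
handle_capture(Cmp, From) ->
    case Cmp of
        less ->
            {yield_to, From};
        greater ->
            {ignore, From};
        equal ->
            %% Assumed to indicate mismatched candidate lists: fail
            %% loudly with a reason that says so.
            exit({candidate_list_mismatch, From})
    end.
```

This still stops the process, but the exit reason in the crash report
names the likely cause instead of a bare case_clause error.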
- Thomas Fee