gen_leader bug

Fee, Thomas TFee@REDACTED
Mon Jan 10 17:39:50 CET 2005


Dear Ulf,
Our server farm has nodes that join the cloud at different times, as you
might guess. First, I don't know how we set things up for a new node to
inform the other nodes of its availability. Second, I suspect that whatever
technique we use for that purpose has a race condition: most likely no
lock or synchronization is involved across the set of all nodes.

Martin Logan incorporated gen_leader into our system. He can probably make
better sense of all this.

- Thomas Fee (eSignal / FutureSource)


-----Original Message-----
From: owner-erlang-questions@REDACTED
[mailto:owner-erlang-questions@REDACTED] On Behalf Of Ulf Wiger (AL/EAB)
Sent: Monday, January 10, 2005 4:53 AM
To: erlang-questions@REDACTED
Subject: RE: gen_leader bug


Thomas,

Having looked into this briefly, I'm of the opinion that 
lexcompare/2 can only return 'equal' if the leaders have been 
misconfigured. I think that it would be better for it to 
exit, but as far as I can tell from the code, and based on my 
recollection, lexcompare/2 can only return 'equal' if:

(1) two nodes have been accepted by the same number of peers 
    (which is possible), and
(2) they have the same position in the list of candidates

If (2) is true, the leaders are misconfigured. They should all 
have identical candidate lists. (*)
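
To illustrate conditions (1) and (2), here is a minimal sketch. It is not
the gen_leader source: the module name lexcompare_sketch and the functions
compare/2 and position/2 are hypothetical, and it assumes each node is
summarised as {AcceptedBy, Position}, compared in Erlang term order. Which
outcome counts as 'less' or 'greater' in the real code is not stated in
this thread, so the direction used here is also an assumption.

```erlang
-module(lexcompare_sketch).
-export([compare/2, position/2]).

%% Hypothetical sketch, not taken from gen_leader.erl.
%% A node is summarised as {AcceptedBy, Position}:
%%   AcceptedBy - number of peers that have accepted the node
%%   Position   - 1-based index of the node in its candidate list

%% Lexicographic comparison: AcceptedBy first, then Position.
%% Erlang compares same-size tuples element by element, so the
%% built-in term order gives exactly that.
compare(Same, Same) -> equal;
compare(A, B) when A < B -> less;
compare(_, _) -> greater.

%% 1-based position of Node in Candidates.
%% Crashes with function_clause if Node is not in the list.
position(Node, Candidates) ->
    position(Node, Candidates, 1).

position(Node, [Node | _], N) -> N;
position(Node, [_ | Rest], N) -> position(Node, Rest, N + 1).
```

With identical candidate lists, two different nodes always have different
positions, so compare/2 cannot return 'equal'. If node a was started with
[a, b] and node b with [b, a], each sees itself at position 1, and a tie in
accepted peers then yields 'equal':

```erlang
PosA = lexcompare_sketch:position(a, [a, b]),
PosB = lexcompare_sketch:position(b, [b, a]),
lexcompare_sketch:compare({2, PosA}, {2, PosB}).
```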

The documentation states that "the list of candidates must be 
known from the start".

This is perhaps a bit vague. It should also state that all leader 
candidates must be started with identical candidate lists.

If my assumption seems incorrect, please let me know.

Regards,
Uffe

(*) Obviously, this calls for some caution when the candidate 
list needs to be updated. One way to do this is during a code
change (which, using the OTP release handler support, is 
synchronized across nodes). A reasonable restriction during 
code change is that if one participating node dies while 
upgrade is in progress, the code change is rolled back.
At the moment, gen_leader callbacks are able to update the 
candidate list during code change, but only by breaking the
record abstraction. One additional exported function, e.g.
set_candidates/2, from gen_leader.erl could fix this.

-----Original Message-----
From: owner-erlang-questions@REDACTED
[mailto:owner-erlang-questions@REDACTED] On Behalf Of Fee, Thomas
Sent: 4 January 2005 22:44
To: erlang-questions@REDACTED
Subject: gen_leader bug


Hello All, 
We have encountered a problem with gen_leader. All our servers died
simultaneously. The bug is this: 
The function lexcompare returns one of: 'equal', 'less', or 'greater'. The
function safe_loop calls lexcompare when it receives a 'capture' message
from a server. Unfortunately, safe_loop only handles 'less' and 'greater' in
the lexcompare returned result.
Our servers ran into a scenario in which 'equal' was returned from
lexcompare, which no case clause matched. Hence, the servers crashed. 
What should be done to fix this missing case clause problem? 
Thanks - 
- Thomas Fee 
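
One way to make the missing clause explicit, following the suggestion
elsewhere in this thread that exiting is the better response, is sketched
below. This is not the gen_leader safe_loop code: the module name
capture_sketch, the function decide/2, the return atoms yield and keep, and
the exit reason candidate_lists_differ are all hypothetical, and compare/2
is a stand-in for lexcompare/2 that assumes {AcceptedBy, Position} tuples.

```erlang
-module(capture_sketch).
-export([decide/2]).

%% Hypothetical stand-in for lexcompare/2: compares
%% {AcceptedBy, Position} tuples using Erlang term order.
compare(Same, Same) -> equal;
compare(A, B) when A < B -> less;
compare(_, _) -> greater.

%% Decide how to react to a 'capture' message from another node.
%% Every possible result of compare/2 has its own clause, so
%% 'equal' no longer ends in an anonymous case_clause error.
decide(Mine, Theirs) ->
    case compare(Mine, Theirs) of
        less ->
            yield;   % hypothetical: defer to the other node
        greater ->
            keep;    % hypothetical: stay in the election
        equal ->
            %% A tie should only arise from mismatched candidate
            %% lists, so exit with a reason that says so.
            exit({candidate_lists_differ, Mine, Theirs})
    end.
```

The process still terminates on 'equal', but the exit reason now points at
the configuration rather than at a missing clause.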


