<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2658.2">
<TITLE>RE: gen_leader bug</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Dear Ulf,</FONT>
<BR><FONT SIZE=2>Nodes in our server farm join the cloud at different times, as you might guess. First, I don't know how our setup lets a new node inform the other nodes that it is available. Second, I suspect that a race condition exists in whatever technique we use for that purpose; there is probably no lock or synchronization across the set of all nodes.</FONT></P>
<P><FONT SIZE=2>Martin Logan incorporated gen_leader into our system. He can probably make better sense of all this.</FONT>
</P>
<P><FONT SIZE=2>- Thomas Fee (eSignal / FutureSource)</FONT>
</P>
<BR>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: owner-erlang-questions@erlang.org [<A HREF="mailto:owner-erlang-questions@erlang.org">mailto:owner-erlang-questions@erlang.org</A>] On Behalf Of Ulf Wiger (AL/EAB)</FONT>
<BR><FONT SIZE=2>Sent: Monday, January 10, 2005 4:53 AM</FONT>
<BR><FONT SIZE=2>To: erlang-questions@erlang.org</FONT>
<BR><FONT SIZE=2>Subject: RE: gen_leader bug</FONT>
</P>
<BR>
<P><FONT SIZE=2>Thomas,</FONT>
</P>
<P><FONT SIZE=2>Having looked into this briefly, I'm of the opinion that </FONT>
<BR><FONT SIZE=2>lexcompare/2 can only return 'equal' if the leaders have been </FONT>
<BR><FONT SIZE=2>misconfigured. In that case I think it would be better for it to </FONT>
<BR><FONT SIZE=2>exit. As far as I can tell from the code, and based on my </FONT>
<BR><FONT SIZE=2>recollection, lexcompare/2 returns 'equal' only if:</FONT>
</P>
<P><FONT SIZE=2>(1) two nodes have been accepted by the same number of peers </FONT>
<BR><FONT SIZE=2> (which is possible), and</FONT>
<BR><FONT SIZE=2>(2) they have the same position in the list of candidates</FONT>
</P>
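<P><FONT SIZE=2>If (2) is true, the leaders are misconfigured. They should all </FONT>
<BR><FONT SIZE=2>have identical candidate lists, and with identical lists no two </FONT>
<BR><FONT SIZE=2>distinct nodes can hold the same position. (*)</FONT>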
</P>
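<P><FONT SIZE=2>The argument above can be illustrated with a small model. This is a hypothetical sketch in Python, not gen_leader's Erlang code: it assumes that each node is ranked by a key of (number of peers that accepted it, its position in its own candidate list), compared element by element. The helper rank and the key layout are assumptions made only for illustration.</FONT></P>
<PRE>
```python
# Hypothetical model of the comparison described above; NOT gen_leader's
# implementation. Assumption: a node's key is
# (number of peers that accepted it, its index in its own candidate list).


def lexcompare(a: tuple, b: tuple) -> str:
    """Compare two keys element by element: 'less', 'equal' or 'greater'."""
    for x, y in zip(a, b):
        if y > x:
            return "less"
        if x > y:
            return "greater"
    return "equal"  # every element matched


def rank(node: str, accepted_by: set, candidates: list) -> tuple:
    """Build the hypothetical key for `node` from its own candidate list."""
    return (len(accepted_by), candidates.index(node))


if __name__ == "__main__":
    shared = ["a", "b", "c"]
    # Identical candidate lists: same acceptance count, distinct positions.
    print(lexcompare(rank("a", {"x", "y"}, shared), rank("b", {"x", "z"}, shared)))
    # Misconfigured: each node lists itself first, so the positions collide.
    print(lexcompare(rank("a", {"x", "y"}, ["a", "b", "c"]),
                     rank("b", {"x", "z"}, ["b", "a", "c"])))
```
</PRE>
<P><FONT SIZE=2>In this model two nodes may tie on acceptance count, condition (1), but with a shared candidate list their positions always differ, so the first call prints 'less'. Only when the lists disagree, here because each node lists itself first, do both keys match and the second call prints 'equal'.</FONT></P>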
<P><FONT SIZE=2>The documentation states that "the list of candidates must be </FONT>
<BR><FONT SIZE=2>known from the start".</FONT>
</P>
<P><FONT SIZE=2>This is perhaps a bit vague. It should also state that all leader </FONT>
<BR><FONT SIZE=2>candidates must be started with identical candidate lists.</FONT>
</P>
<P><FONT SIZE=2>If my assumption seems incorrect, please let me know.</FONT>
</P>
<P><FONT SIZE=2>Regards,</FONT>
<BR><FONT SIZE=2>Uffe</FONT>
</P>
<P><FONT SIZE=2>(*) Obviously, this calls for some caution when the candidate </FONT>
<BR><FONT SIZE=2>list needs to be updated. One way to do this is during a code</FONT>
<BR><FONT SIZE=2>change (which, when using the OTP release handler, is </FONT>
<BR><FONT SIZE=2>synchronized across nodes). A reasonable restriction during </FONT>
<BR><FONT SIZE=2>code change is that if one participating node dies while the </FONT>
<BR><FONT SIZE=2>upgrade is in progress, the code change is rolled back.</FONT>
<BR><FONT SIZE=2>At the moment, gen_leader callbacks are able to update the </FONT>
<BR><FONT SIZE=2>candidate list during code change, but only by breaking the</FONT>
<BR><FONT SIZE=2>record abstraction. One additional function exported from </FONT>
<BR><FONT SIZE=2>gen_leader.erl, e.g. set_candidates/2, could fix this.</FONT>
</P>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: owner-erlang-questions@erlang.org [<A HREF="mailto:owner-erlang-questions@erlang.org">mailto:owner-erlang-questions@erlang.org</A>] On Behalf Of Fee, Thomas</FONT>
<BR><FONT SIZE=2>Sent: January 4, 2005 22:44</FONT>
<BR><FONT SIZE=2>To: erlang-questions@erlang.org</FONT>
<BR><FONT SIZE=2>Subject: gen_leader bug</FONT>
</P>
<BR>
<P><FONT SIZE=2>Hello All, </FONT>
<BR><FONT SIZE=2>We have encountered a problem with gen_leader. All our servers died simultaneously. The bug is this: </FONT>
<BR><FONT SIZE=2>The function lexcompare returns one of 'equal', 'less' or 'greater'. The function safe_loop calls lexcompare when it receives a 'capture' message from a server. Unfortunately, safe_loop handles only the 'less' and 'greater' results from lexcompare.</FONT></P>
<P><FONT SIZE=2>Our servers ran into a scenario in which lexcompare returned 'equal', and hence they crashed. </FONT>
<BR><FONT SIZE=2>What should be done to fix this missing case clause problem? </FONT>
<BR><FONT SIZE=2>Thanks - </FONT>
<BR><FONT SIZE=2>- Thomas Fee </FONT>
</P>
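<P><FONT SIZE=2>One way to close a missing case clause of this kind is to handle all three results explicitly. The sketch below is hypothetical and written in Python rather than Erlang. The names dispatch, on_less, on_greater and MisconfiguredCandidates are illustrative and do not come from gen_leader, and the sketch says nothing about what safe_loop should actually do for 'less' or 'greater'. Following Ulf's reply, which treats 'equal' as a sign of misconfigured candidate lists, it fails with an explicit, descriptive error instead of an unmatched clause.</FONT></P>
<PRE>
```python
# Hypothetical sketch, NOT gen_leader code: handle every possible
# lexcompare result explicitly instead of leaving 'equal' unmatched.
from typing import Callable


class MisconfiguredCandidates(Exception):
    """Raised when two nodes rank equally (assumed: candidate lists differ)."""


def dispatch(result: str,
             on_less: Callable[[], str],
             on_greater: Callable[[], str]) -> str:
    """Route a lexcompare result; on_less/on_greater are placeholder handlers."""
    if result == "less":
        return on_less()
    if result == "greater":
        return on_greater()
    if result == "equal":
        # Fail loudly with a descriptive reason rather than a bare crash.
        raise MisconfiguredCandidates(
            "lexcompare returned 'equal'; check that all candidate lists are identical"
        )
    raise ValueError(f"unexpected lexcompare result: {result!r}")
```
</PRE>
<P><FONT SIZE=2>The process still stops on 'equal', but the reason it reports points at the configuration rather than at a missing clause.</FONT></P>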
<BR>
<BR>
</BODY>
</HTML>