<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 5/18/12 11:00 AM, Ovid wrote:

    <blockquote

      cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"

      type="cite">

      <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,

        255); font-family: Courier

        New,courier,monaco,monospace,sans-serif; font-size: 10pt;"><br>

        <div><span>Use case 1: If the *total* of all of those small

            amounts exceeds a daily cap or an all-time cap, all 75 boxes

            must immediately stop spending bidding in auctions. It seems

            that each box can run a separate Erlang process and write

            out "winning bid" information to an Mnesia database and all

            boxes can read the total amount spent from that to determine

            if it should stop bidding.</span></div>

        <div><span><br>

          </span></div>

      </div>

    </blockquote>

    Have you considered going in the opposite direction? Instead of
    every box reading a shared total, each bidder takes a lease on part
    of the remaining cap. While a lot of money remains, a bidder can
    take out a fairly large lease; as you get closer to the spending
    limit, you shrink the amount a bidder can lease for further bids.
    For instance, suppose there is $100 in the pool and a bidder needs
    to place a bid. It allocates $5 to itself and can then spend that
    $5 as it sees fit.<br>

    <br>

    If the lease holder crashes, you have a monitor on it, so you will
    be notified and can recover the money it did not spend.<br>

    <br>

    You will still need some database solution running on multiple
    nodes to avoid a single point of failure, but the leasing idea
    works on top of such a setup as well.<br>

    <br>

    The advantage is that this scales a lot better. A bidder now knows
    how much it is allowed to spend and can operate independently; the
    only synchronization point is the question "How much more am I
    allowed to spend?"<br>

    <br>
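    A minimal sketch of the leasing idea, to make the bookkeeping
    concrete. It is written in Python only for brevity and runs in a
    single process; every name in it (BudgetPool, lease, report_spend,
    bidder_down) is hypothetical, amounts are in cents, and the rule
    "lease one twentieth of what remains, but at least $1" is an
    assumption, not a recommendation. It also assumes the pool learns
    about winning bids from the bidders, which is what lets it work out
    how much a crashed bidder left unspent.<br>
<pre>
```python
class BudgetPool:
    """Hands out leases against a spending cap (all amounts in cents)."""

    def __init__(self, cap, divisor=20, min_lease=100):
        self.remaining = cap        # money not yet leased to anyone
        self.divisor = divisor      # a lease is remaining // divisor
        self.min_lease = min_lease  # smallest lease worth handing out
        self.unspent = {}           # bidder id: leased but unspent money

    def lease(self, bidder):
        """Grant a lease that shrinks as the pool drains; 0 means stop."""
        amount = max(self.remaining // self.divisor, self.min_lease)
        amount = min(amount, self.remaining)
        self.remaining -= amount
        self.unspent[bidder] = self.unspent.get(bidder, 0) + amount
        return amount

    def report_spend(self, bidder, cents):
        """Record a winning bid; refuse it if it exceeds the lease."""
        if cents > self.unspent.get(bidder, 0):
            return False
        self.unspent[bidder] -= cents
        return True

    def bidder_down(self, bidder):
        """Called when a monitor reports a crash: reclaim unspent money."""
        self.remaining += self.unspent.pop(bidder, 0)


pool = BudgetPool(cap=10000)    # $100 in the pool
granted = pool.lease("box01")   # box01 now holds a $5 lease
pool.report_spend("box01", 200) # it wins a $2 auction
pool.bidder_down("box01")       # it crashes; the other $3 returns
```
</pre>
    In an Erlang system the pool would be a process (or a replicated
    table) and bidder_down would be driven by the monitor message
    mentioned above. A bidder only talks to the pool when its lease
    runs out, and a lease of 0 is its signal to stop bidding.<br>
    <br>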

    <blockquote

      cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"

      type="cite">

      <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,

        255); font-family: Courier

        New,courier,monaco,monospace,sans-serif; font-size: 10pt;">

        <div><span>This seems trivial to set up.</span></div>

      </div>

    </blockquote>

    It isn't. But Erlang could perhaps lend you some tools to make this

    work.<br>

    <blockquote

      cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff; font-family:Courier

        New, courier, monaco, monospace, sans-serif;font-size:10pt">

        <div><span>Use case 2: we periodically need to reauthenticate to

            the auction system. We MUST NOT have all 75 boxes trying to

            reauthenticate at the same time because we will be locked

            out of the system if we do this. Having a central box

            handling reauthentication is a single point of failure that

            we would like to avoid, but we don't know what design

            pattern Erlang would use to ensure that only one of the 75

            Erlang instances would attempt to reauthenticate at any one

            time (all 75 boxes can share the same authentication token).</span></div>

        <br>

      </div>

    </blockquote>

    Your problem rests on a hypothesis: waiting requires locking. That
    is, if you need to wait on others, you need to synchronize who does
    what in which order, and that requires you to block on a single
    point. This in turn makes it hard to avoid the single point of
    failure.<br>

    <br>

    If you know that you may have up to K simultaneous authentications
    running, it becomes easier to handle, because then you have some
    leverage in how much synchronization is needed.<br>

    <br>

    There is no really good solution though. A problem here is the

    split-brain scenario, where your network gets disconnected, but

    individual nodes are still operating and can authenticate. In that

    case, you might have double auths if you pick the lowest possible

    node in a list.<br>

    <br>
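    To illustrate the double-auth case, here is a small sketch of the
    "lowest node in the list" rule, again in Python for brevity. The
    node names and both function names are made up, and the model
    assumes each node only knows about the nodes it can currently
    reach.<br>
<pre>
```python
def elect_authenticator(visible_nodes):
    """Run locally on every node: the lowest visible node name wins."""
    return min(visible_nodes)


def authenticators(partitions):
    """Return the nodes that would reauthenticate, one per partition."""
    return {elect_authenticator(side) for side in partitions}


healthy = [["box01", "box02", "box03", "box04"]]
node_lost = [["box02", "box03", "box04"]]
split = [["box01", "box02"], ["box03", "box04"]]

print(authenticators(healthy))    # a single authenticator: box01
print(authenticators(node_lost))  # box02 takes over from box01
print(authenticators(split))      # both box01 and box03 authenticate
```
</pre>
    Losing a node is handled: the next-lowest node takes over. A
    network split is not: each side elects its own lowest node, and
    both reauthenticate at the same time.<br>
    <br>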

    What you should really do is use *risk* as the deciding factor. You
    must weigh the likelihood of something happening against its
    impact. It is, for instance, more likely that a node is lost than
    that the network ends up in a split brain in which both sides can
    still authenticate. Hence you would probably decide to accept that
    risk, knowing that certain split-brain scenarios cannot be handled
    by the solution.<br>

    <br>

    If there is anything I wish to tell people about distributed
    programming, it is that it is fuzzy logic. On a single machine you
    are *not* safe, since it can die. On multiple machines you have an
    error rate and different types of errors. What is important is that
    you control the error rate rather than let it run unchecked. You
    will almost never hit a situation where 100% stability can be
    guaranteed if you also need speed, so it becomes a question of risk
    management and trade-offs.<br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Jesper Louis Andersen

  Erlang Solutions Ltd., Copenhagen, DK</pre>

  </body>

</html>