<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 5/18/12 11:00 AM, Ovid wrote:
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: Courier
New,courier,monaco,monospace,sans-serif; font-size: 10pt;"><br>
<div><span>Use case 1: If the *total* of all of those small
amounts exceeds a daily cap or an all-time cap, all 75 boxes
must immediately stop spending bidding in auctions. It seems
that each box can run a separate Erlang process and write
out "winning bid" information to an Mnesia database and all
boxes can read the total amount spent from that to determine
if it should stop bidding.</span></div>
<div><span><br>
</span></div>
</div>
</blockquote>
Have you considered going in the opposite direction? Instead of
recording spending after the fact, a bidder takes a lease on part of
the remaining cap before it bids. While a lot of money remains, the
leases can be fairly large; as you get closer to the spending limit,
you shrink the amount a bidder can lease at a time. For instance,
suppose there is $100 in the pool and a bidder needs to place a bid.
It allocates $5 to itself and can now spend that $5 as it sees fit.<br>
<br>
If the lease holder crashes, you have a monitor on it, so you will
get notified and can recover the money it did not spend.<br>
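<br>
A minimal sketch of the lease idea, written in Python for brevity
rather than Erlang. The names (LeasePool, take_lease, report_spend,
bidder_down) and the "one twentieth of the remainder" sizing rule are
assumptions made for illustration. It also assumes bidders report what
they spend, so the pool knows how much to recover. In Erlang the pool
would typically be a process, and bidder_down would be triggered by
the 'DOWN' message from erlang:monitor/2.<br>

```python
class LeasePool:
    """Budget pool that hands out leases and reclaims unspent money."""

    def __init__(self, cap, divisor=20, minimum=1):
        self.remaining = cap      # money not yet leased to any bidder
        self.divisor = divisor    # lease size is remaining // divisor (assumed policy)
        self.minimum = minimum    # smallest lease handed out while money remains
        self.leases = {}          # bidder id -> [granted, spent]

    def take_lease(self, bidder):
        """Grant a lease that shrinks as the pool drains; 0 means stop bidding."""
        wanted = max(self.minimum, self.remaining // self.divisor)
        amount = min(self.remaining, wanted)
        self.remaining -= amount
        entry = self.leases.setdefault(bidder, [0, 0])
        entry[0] += amount
        return amount

    def report_spend(self, bidder, amount):
        """Record a winning bid against the bidder's lease."""
        granted, spent = self.leases[bidder]
        if spent + amount > granted:
            raise ValueError("spend exceeds lease")
        self.leases[bidder][1] += amount

    def bidder_down(self, bidder):
        """Called when the monitor fires: return unspent money to the pool."""
        granted, spent = self.leases.pop(bidder, (0, 0))
        unspent = granted - spent
        self.remaining += unspent
        return unspent


pool = LeasePool(cap=100)
lease = pool.take_lease("box17")        # grants 5, leaving 95 in the pool
pool.report_spend("box17", 2)           # box17 wins a bid costing 2
recovered = pool.bidder_down("box17")   # box17 crashed: 3 flows back
```

Amounts are whole units (say, dollars or cents) to keep the arithmetic
exact. A real pool would also have to persist its state on several
nodes.<br>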
<br>
You will still need a database that runs on multiple nodes to avoid
a single point of failure, but the lease idea works in that setting
too.<br>
<br>
The advantage is that this scales a lot better. A bidder knows how
much it is allowed to bid and can operate independently; the only
synchronization point is the question "How much more am I allowed to
spend?"<br>
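<br>
To illustrate why this scales, here is a rough Python sketch of a
bidder that only contacts the pool when its local allowance runs out.
Pool, Bidder, try_bid and the request counter are hypothetical names,
and the sizing rule (one twentieth of the remainder, at least 1) is an
assumption.<br>

```python
class Pool:
    """Shared budget; every take_lease call is one synchronization round trip."""

    def __init__(self, cap, divisor=20):
        self.remaining = cap
        self.divisor = divisor
        self.requests = 0         # number of round trips made by bidders

    def take_lease(self):
        self.requests += 1
        amount = min(self.remaining, max(1, self.remaining // self.divisor))
        self.remaining -= amount
        return amount


class Bidder:
    """Spends from a local allowance and asks the pool only when it is short."""

    def __init__(self, pool):
        self.pool = pool
        self.allowance = 0        # leased money not yet spent
        self.spent = 0

    def try_bid(self, price):
        # Top up from the pool until the bid is covered or the pool is dry.
        while price > self.allowance:
            extra = self.pool.take_lease()
            if extra == 0:
                return False      # cap reached: stop bidding
            self.allowance += extra
        self.allowance -= price
        self.spent += price
        return True


pool = Pool(cap=100)
bidder = Bidder(pool)
bids = 0
while bidder.try_bid(1):
    bids += 1
```

In this run the bidder places 100 bids of 1 each but makes fewer than
100 requests to the pool, because the early leases cover several bids
at a time. Writing every winning bid to a shared total would cost one
synchronization per bid.<br>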
<br>
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: Courier
New,courier,monaco,monospace,sans-serif; font-size: 10pt;">
<div><span>This seems trivial to set up.</span></div>
</div>
</blockquote>
It isn't. But Erlang could perhaps lend you some tools to make this
work.<br>
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff; font-family:Courier
New, courier, monaco, monospace, sans-serif;font-size:10pt">
<div><span>Use case 2: we periodically need to reauthenticate to
the auction system. We MUST NOT have all 75 boxes trying to
reauthenticate at the same time because we will be locked
out of the system if we do this. Having a central box
handling reauthentication is a single point of failure that
we would like to avoid, but we don't know what design
pattern Erlang would use to ensure that only one of the 75
Erlang instances would attempt to reauthenticate at any one
time (all 75 boxes can share the same authentication token).</span></div>
<br>
</div>
</blockquote>
Your problem rests on a hypothesis: waiting requires locking. That
is, if you need to wait on others, you need to synchronize who is
doing things in what order, and that requires you to block on a
single point. This in turn makes it hard to avoid the single point
of failure.<br>
<br>
If you know that up to K simultaneous authentications are
acceptable, the problem becomes easier to handle, because you then
have some leeway in how much synchronization is needed.<br>
<br>
There is no really good solution, though. One problem is the
split-brain scenario, where your network gets partitioned but
individual nodes keep operating and can still authenticate. In that
case you might get double authentications if, say, the rule is that
the lowest node in the list authenticates, since each side of the
partition sees a different list.<br>
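<br>
A small Python illustration of that failure mode. The node names and
the helper functions are made up; the only thing taken from the
discussion is the "lowest node in the list authenticates" rule.<br>

```python
def authenticator(visible_nodes):
    """Rule under test: the lowest node among those a node can see authenticates."""
    return min(visible_nodes)


def authenticators(partitions):
    """Set of nodes that will try to authenticate, one per connected group."""
    return {authenticator(group) for group in partitions}


nodes = ["box01", "box02", "box03", "box04"]

# Healthy network: everyone sees everyone, so exactly one node is chosen.
healthy = authenticators([nodes])

# Split brain: each half picks its own lowest node, so two nodes authenticate.
split = authenticators([["box01", "box02"], ["box03", "box04"]])
```

Here healthy contains only box01, while split contains both box01 and
box03, which is the double authentication you wanted to avoid.<br>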
<br>
What you should really do is use *risk* as the deciding factor. You
must weigh the likelihood of something happening against its impact.
It is, for instance, more likely that you lose a node than that the
network ends up in a split brain where both sides can still
authenticate. Hence you would probably decide to take that risk,
knowing that certain split-brain scenarios cannot be handled by the
solution.<br>
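<br>
As a back-of-the-envelope sketch, the comparison can be written as
likelihood times impact. Every number below is a made-up placeholder,
not a measurement; you would substitute your own estimates. The design
names and expected_cost are likewise only for illustration.<br>

```python
# (events per 1000 days, cost per event) for each design's main failure mode.
# All figures are placeholders chosen for illustration only.
designs = {
    "central auth box (fails when that node is lost)": (50, 10),
    "lowest-node rule (fails on a split brain)": (2, 100),
}


def expected_cost(events_per_1000_days, cost_per_event):
    """Likelihood times impact, in cost units per 1000 days."""
    return events_per_1000_days * cost_per_event


# Rank the designs from lowest to highest expected cost.
ranked = sorted(designs, key=lambda name: expected_cost(*designs[name]))
preferred = ranked[0]
```

With these placeholder figures the rare but expensive split brain
still costs less overall than the frequent loss of a central node, so
preferred is the lowest-node rule. Different estimates can flip the
result, which is why you have to make them yourself.<br>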
<br>
If there is one thing I wish to tell people about distributed
programming, it is that it is fuzzy. On a single machine you are
*not* safe, since it can die. On multiple machines you have an error
rate and different types of errors. What is important is that you
control the error rate rather than let it drift on its own. You will
almost never hit a situation where 100% stability can be guaranteed
if you also need speed. So it becomes a question of risk management
and trade-offs.<br>
<br>
<pre class="moz-signature" cols="72">--
Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen, DK</pre>
</body>
</html>