<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Answers inline.<br>
<br>
On 12-05-18 5:00 AM, Ovid wrote:
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: Courier
New,courier,monaco,monospace,sans-serif; font-size: 10pt;">
<div><span>Hi there,</span></div>
<div><span><br>
</span></div>
<div><span>We've a system that run across 75 servers and needs
to be highly performant, fault-tolerant, scalable and shares
persistent data across all 75 servers. We're investigating
Erlang/Mnesia (which we don't know) because it sounds
tailor-made for our situation.</span></div>
</div>
</blockquote>
As mentioned earlier in this thread, 75 servers is a bit much, but
people have done it before.<br>
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: Courier
New,courier,monaco,monospace,sans-serif; font-size: 10pt;">
<div><span><br>
</span></div>
<div><span>We are not using Erlang for our first implementation,
but are instead hacking together a solution from known
technologies including Perl, MySQL and Redis. We're
considering Erlang for our future work.</span></div>
<div><span><br>
</span></div>
<div><span>We have two primary needs: Each box can bid on an
auction and potentially spend a tiny amount of money and
each of the 75 boxes will receive notifications of a small
amount of money spent if they win the auction (the auction
notification will probably not be sent to the box bidding in
the auction).</span></div>
<div><span><br>
</span></div>
<div><span>Use case 1: If the *total* of all of those small
amounts exceeds a daily cap or an all-time cap, all 75 boxes
must immediately stop spending bidding in auctions. It seems
that each box can run a separate Erlang process and write
out "winning bid" information to an Mnesia database and all
boxes can read the total amount spent from that to determine
if it should stop bidding.</span></div>
<div><span><br>
</span></div>
<div><span>This seems trivial to set up.</span></div>
</div>
</blockquote>
It isn't trivial. You have to think about what happens when a box is
seen as crashing. How strongly consistent do you want things to be?
There is always a risk that a box didn't crash, but was cut off in a
netsplit. You might get divergences in budget that will be hard to
explain. <br>
<br>
There is also a definite timing issue depending on how your data is
being observed. For example, you ask permission to bid on an item,
but you do not get instant feedback; by the time you have sent maybe 5-10
bids, the cap is finally reached and broken at once because the
delay to the other network made you keep on bidding without a final
result. How much tolerance do you have for this? <br>
<br>
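To put a number on that overshoot, here is a minimal sketch in Python
(not Erlang, just to keep it short). It assumes every bid wins, which
is the worst case, and that a win only becomes visible after a fixed
number of further bids. The function name and the
<tt>win_delay_bids</tt> knob are made up for illustration and are not
part of any exchange API.<br>
<pre>
```python
def simulate_overshoot(cap_cents, bid_cents, win_delay_bids, max_bids=1000):
    """Bid until the *observed* spend reaches the cap.

    win_delay_bids is a hypothetical knob: how many further bids are sent
    before the notification for an earlier winning bid is seen.
    Every bid is assumed to win, which is the worst case.
    """
    observed = 0      # spend this box already knows about
    in_flight = []    # wins whose notifications have not arrived yet
    bids_sent = 0
    while cap_cents > observed and max_bids > bids_sent:
        in_flight.append(bid_cents)
        bids_sent += 1
        # The oldest win becomes visible only after win_delay_bids more bids.
        if len(in_flight) > win_delay_bids:
            observed += in_flight.pop(0)
    actual = observed + sum(in_flight)
    return actual - cap_cents  # overshoot in cents, 0 means none


print(simulate_overshoot(100, 10, 0))  # instant feedback: 0
print(simulate_overshoot(100, 10, 5))  # feedback 5 bids late: 50
```
</pre>
Under these assumptions the overshoot is simply the delay multiplied
by the bid size, and that is for one box. With many boxes bidding at
once, each one can have its own unseen wins in flight.<br>
<br>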
You mentioned in another post that "<span>We need to ensure that
were all 75 boxes to mysteriously crash, we could bring them back
up and not worry about data integrity." Possibly, but what about
1 node only? What about 5? What about 30 or 35? What if they crash
and you miss winning bids because you went down after bidding but
before getting your notifications back (if that is possible by the
bidding rules of whatever exchange you're dealing with)? <br>
<br>
</span>The most solid synchronous database setup might not give you
the guarantees you expect in the first place.<br>
<blockquote
cite="mid:1337331624.44767.YahooMailNeo@web162101.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: Courier
New,courier,monaco,monospace,sans-serif; font-size: 10pt;">
<div><span><br>
</span></div>
<div><span>Use case 2: we periodically need to reauthenticate to
the auction system. We MUST NOT have all 75 boxes trying to
reauthenticate at the same time because we will be locked
out of the system if we do this. Having a central box
handling reauthentication is a single point of failure that
we would like to avoid, but we don't know what design
pattern Erlang would use to ensure that only one of the 75
Erlang instances would attempt to reauthenticate at any one
time (all 75 boxes can share the same authentication token).</span></div>
</div>
</blockquote>
That depends on: 1. how many times you can try to re-authenticate
before being blocked, 2. how close together the attempts have to be.<br>
<br>
Central points of failure are definitely something to avoid. Leader
election across 75 boxes might not be the funnest thing in the world
either. I could see a scheme where you use some distributed cached
value that says "I am currently logged in" and that can time out
at some point, visible to all readers. When you read that timeout
value from each box (possibly from an OTP Application that only
handles auth), each reading of that value adds or subtracts a random
number to the timeout. This is to try and avoid a cluster-wide
synchronization on the timeout value, and instead have them happen
at different times. You could add an "I'm updating" flag related to
that value and that could give you good probabilities that only a
fraction of all the nodes attempt an authentication at any point in
time close to the timeout value.<br>
<br>
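A rough sketch of that idea, in Python rather than Erlang to keep it
short. The in-memory <tt>SharedAuthState</tt> object stands in for
the distributed cached value, and every name here is hypothetical. In
a real cluster the flag would need an atomic test-and-set, which this
sketch does not provide.<br>
<pre>
```python
import random


class SharedAuthState:
    """In-memory stand-in for the distributed cached value (an assumption)."""

    def __init__(self, expires_at):
        self.expires_at = expires_at  # when the shared token times out
        self.updating = False         # the "I'm updating" flag


def should_reauthenticate(state, now, jitter_seconds, rng):
    """Decide whether this box should attempt to reauthenticate now."""
    if state.updating:
        return False  # another box is already on it
    # Each read shifts the deadline by a random amount, so the boxes
    # do not all decide to act at the same instant.
    jitter = rng.uniform(-jitter_seconds, jitter_seconds)
    if state.expires_at + jitter > now:
        return False  # not due yet from this box's point of view
    state.updating = True  # NOT atomic across a real cluster
    return True


def finish_reauthentication(state, new_expires_at):
    """Publish the new timeout and clear the flag."""
    state.expires_at = new_expires_at
    state.updating = False


# 75 boxes all look at the value well after the timeout has passed.
state = SharedAuthState(expires_at=100.0)
rng = random.Random(0)
attempts = sum(should_reauthenticate(state, 200.0, 10.0, rng) for _ in range(75))
print(attempts)  # 1
```
</pre>
Only the first box to get past the flag attempts the login here
because everything runs in one process. Across real nodes the read
and the write of the flag are separate network operations, so two
boxes can still race, which is why this only gives you good
probabilities rather than a guarantee.<br>
<br>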
Again, this would depend on how often your authentication needs to
be done, and how frequently you're allowed to do it.<br>
<br>
If it's too tight, you might need a central server or node that
takes care of it, with one or two fail-overs to add some
reliability.<br>
<br>
Note you will still have to care about netsplits ruining your day
with this whole scheme.<br>
<br>
-- I had nothing to add on the rest of the mail, so I cut it off.<br>
<br>
Hope this helps,<br>
Fred.<br>
</body>
</html>