[erlang-questions] Erlang suitability

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Mon May 21 17:40:21 CEST 2012

On 5/18/12 11:00 AM, Ovid wrote:
> Use case 1: If the *total* of all of those small amounts exceeds a 
> daily cap or an all-time cap, all 75 boxes must immediately stop 
> spending bidding in auctions. It seems that each box can run a 
> separate Erlang process and write out "winning bid" information to an 
> Mnesia database and all boxes can read the total amount spent from 
> that to determine if it should stop bidding.
Have you considered going in the opposite direction? A bidder takes a 
lease on part of the remaining cap. While a lot of money remains, a 
bidder can take out a fairly large lease; as you get closer to the 
spending limit, you shrink the leases you hand out. For instance, say 
there is $100 in the pool and a bidder needs to place a bid. It 
allocates $5 to itself and can now spend those $5 as it sees fit.

If a bidder holding a lease crashes, the monitor you have on it will 
notify you, and you can recover the money it did not spend.
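
A minimal sketch of the lease idea, in Python for brevity rather than 
Erlang. The class name, the method names, the 1/20 lease fraction and 
the minimum lease size are all assumptions made for illustration. The 
sketch also assumes that bidders report their spending to the pool, so 
that the pool knows how much of a lease is unspent when a bidder dies. 
Amounts are integer cents.

```python
class BudgetPool:
    """Hypothetical pool handing out leases on the remaining cap (in cents)."""

    def __init__(self, cap_cents, lease_divisor=20, min_lease_cents=100):
        self.remaining = cap_cents          # money not yet leased to anyone
        self.lease_divisor = lease_divisor  # each lease is remaining / divisor
        self.min_lease = min_lease_cents    # assumed floor for a useful lease
        self.leases = {}                    # bidder id -> unspent cents

    def lease(self, bidder):
        """Grant a lease that shrinks as the pool drains; 0 means stop bidding."""
        amount = max(self.remaining // self.lease_divisor, self.min_lease)
        amount = min(amount, self.remaining)  # never hand out more than is left
        self.remaining -= amount
        self.leases[bidder] = self.leases.get(bidder, 0) + amount
        return amount

    def spend(self, bidder, cents):
        """Record a winning bid against the bidder's own lease."""
        if cents > self.leases.get(bidder, 0):
            return False                    # bidder must take a new lease first
        self.leases[bidder] -= cents
        return True

    def bidder_down(self, bidder):
        """Called when the monitor fires: return unspent money to the pool."""
        unspent = self.leases.pop(bidder, 0)
        self.remaining += unspent
        return unspent


pool = BudgetPool(10_000)          # $100 in the pool
granted = pool.lease("bidder_a")   # 500 cents, i.e. the $5 from the example
pool.spend("bidder_a", 300)        # a $3 winning bid
recovered = pool.bidder_down("bidder_a")  # crash: the unspent $2 comes back
```

In Erlang the pool would more naturally be a process that monitors each 
bidder and runs the recovery step when it receives the 'DOWN' message.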

You will still need some database solution running on multiple nodes 
to avoid a single point of failure, but the lease idea works on top of 
that as well.

The advantage is that this scales a lot better. A bidder now knows how 
much it is allowed to bid and can operate independently of a central 
synchronization point of "How much more am I allowed to spend?"

> This seems trivial to set up.
It isn't. But Erlang could perhaps lend you some tools to make this work.
> Use case 2: we periodically need to reauthenticate to the auction 
> system. We MUST NOT have all 75 boxes trying to reauthenticate at the 
> same time because we will be locked out of the system if we do this. 
> Having a central box handling reauthentication is a single point of 
> failure that we would like to avoid, but we don't know what design 
> pattern Erlang would use to ensure that only one of the 75 Erlang 
> instances would attempt to reauthenticate at any one time (all 75 
> boxes can share the same authentication token).
Your problem rests on a hypothesis: waiting requires locking. That is, 
if you need to wait on others, you need to synchronize who is doing 
things in what order, and that requires you to block on a single point. 
This in turn makes it hard to avoid the single point of failure.

If you know that you are allowed up to K simultaneous authentications, 
the problem becomes easier to handle, because then you have some leeway 
in how much synchronization is needed.

There is no really good solution, though. One problem is the 
split-brain scenario, where your network is partitioned but individual 
nodes are still operating and can authenticate. In that case you might 
get double auths if, say, every node picks the lowest node in its list 
of visible nodes as the one to authenticate.
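
To make the double-auth case concrete, here is a small sketch of the 
"lowest node in the list" rule, again in Python. The function names and 
the box names are made up for illustration, and each partition is 
modelled simply as the list of nodes that can see each other.

```python
def pick_authenticator(visible_nodes):
    """Each node independently picks the lowest node name it can see."""
    return min(visible_nodes)


def authenticators(partitions):
    """Return every node that some partition selects to reauthenticate."""
    return {pick_authenticator(group) for group in partitions}


nodes = ["box01", "box02", "box03", "box04"]

# Fully connected cluster: everyone agrees on a single authenticator.
healthy = authenticators([nodes])

# A node is lost: the survivors still agree on exactly one authenticator.
node_lost = authenticators([["box02", "box03", "box04"]])

# Split brain: each side elects its own lowest node, so two nodes
# reauthenticate at the same time.
split = authenticators([["box01", "box02"], ["box03", "box04"]])
```

The rule copes with a lost node without any coordination, but in the 
partitioned case both sides believe they hold the role.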

What you should really do is use *risk* as the deciding factor. You 
must weigh the likelihood of something happening against its impact. It 
is, for instance, more likely that a node is lost than that the network 
ends up in a split brain where both sides can still authenticate. Hence 
you probably decide to take that risk, knowing that certain split-brain 
scenarios cannot be handled by the solution.

If there is anything I wish to tell people about distributed 
programming, it is that it is fuzzy. On a single machine you are *not* 
safe, since it can die. On multiple machines you have an error rate and 
different types of errors. What is important is that you control the 
error rate rather than let it drift by itself. You will almost never 
hit a situation where 100% stability can be guaranteed if you also need 
speed. So it becomes a question of risk management and trade-offs.

Jesper Louis Andersen
   Erlang Solutions Ltd., Copenhagen, DK
