[erlang-questions] Erlang suitability
Jesper Louis Andersen
Mon May 21 17:40:21 CEST 2012
On 5/18/12 11:00 AM, Ovid wrote:
> Use case 1: If the *total* of all of those small amounts exceeds a
> daily cap or an all-time cap, all 75 boxes must immediately stop
> bidding in auctions. It seems that each box can run a
> separate Erlang process and write out "winning bid" information to an
> Mnesia database and all boxes can read the total amount spent from
> that to determine if it should stop bidding.
Have you considered going in the opposite direction? A bidder takes a
lease on part of the remaining cap. While a lot of money remains, a
bidder can take out a fairly large lease; as you get closer to the
spending limit, you shrink the size of each new lease.
For instance, you may know there is $100 in the pool and a bidder needs
to place a bid. It allocates $5 to itself and can now spend those $5 as
it sees fit.
If the leaseholder crashes, the monitor you have on it notifies you and
you can recover the money it did not spend.
You will still need a database that runs on multiple nodes to avoid a
single point of failure, but the lease idea works in that setup too.
The advantage is that this scales a lot better. A bidder now knows how
much it is allowed to spend, so it can operate independently of the
synchronization point of "How much more am I allowed to spend?" until
its lease runs out.
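
A minimal sketch of the lease idea, written in Python only to keep it
short; in Erlang the pool would more likely be a process of its own and
bidder_down/1 would be driven by a monitor's 'DOWN' message. The class
name, the "one twentieth of the remainder" sizing rule and the minimum
lease are assumptions made for illustration. Amounts are in cents. For
brevity the pool also tracks each bidder's spending here; in a real
system that information would have to come from the recorded winning
bids.

```python
class BudgetPool:
    """Hypothetical central pool that hands out spending leases (cents)."""

    def __init__(self, cap_cents, divisor=20, min_lease=100):
        self.remaining = cap_cents  # money not yet leased to any bidder
        self.divisor = divisor      # assumed rule: lease 1/divisor of the rest
        self.min_lease = min_lease  # assumed floor so leases stay useful
        self.leases = {}            # bidder id -> unspent cents of its lease

    def lease(self, bidder):
        """Grant a lease that shrinks as the pool drains; 0 means stop bidding."""
        wanted = max(self.min_lease, self.remaining // self.divisor)
        size = min(self.remaining, wanted)
        self.remaining -= size
        self.leases[bidder] = self.leases.get(bidder, 0) + size
        return size

    def spend(self, bidder, cents):
        """Record a winning bid; refuse it if the lease cannot cover it."""
        if self.leases.get(bidder, 0) < cents:
            return False
        self.leases[bidder] -= cents
        return True

    def bidder_down(self, bidder):
        """A crash was reported: return the unspent lease to the pool."""
        recovered = self.leases.pop(bidder, 0)
        self.remaining += recovered
        return recovered


pool = BudgetPool(10_000)        # $100 in the pool
first = pool.lease("bidder_a")   # 500 cents, the $5 from the example above
pool.spend("bidder_a", 200)      # a $2 winning bid
back = pool.bidder_down("bidder_a")  # the crash returns the unspent $3
```

The total leased out can never exceed the cap, so a bidder that cannot
reach the pool simply stops once its own lease is used up.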
> This seems trivial to set up.
It isn't. But Erlang could perhaps lend you some tools to make this work.
> Use case 2: we periodically need to reauthenticate to the auction
> system. We MUST NOT have all 75 boxes trying to reauthenticate at the
> same time because we will be locked out of the system if we do this.
> Having a central box handling reauthentication is a single point of
> failure that we would like to avoid, but we don't know what design
> pattern Erlang would use to ensure that only one of the 75 Erlang
> instances would attempt to reauthenticate at any one time (all 75
> boxes can share the same authentication token).
Your problem is that of a hypothesis: waiting-requires-locking. That is,
if you need to wait on others, you need to synchronize who is doing
things in what order - and that requires you to block on a single point.
This in turn makes it hard to avoid the single-point-of-failure.
If you know that up to K simultaneous authentications are tolerated, it
becomes easier to handle, because then you have some leeway in how much
synchronization is needed.
There is no really good solution though. A problem here is the
split-brain scenario, where your network gets disconnected, but
individual nodes are still operating and can authenticate. In that case,
you might get double auths if the rule is that the lowest node in a list
authenticates, because each side of the split sees a different list.
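
To make the double-auth case concrete, here is a small sketch of the
"lowest node in the list" rule, again in Python for brevity. The box
names and the authenticator function are made up for illustration; on an
Erlang node the visible list would be something like [node() | nodes()].

```python
def authenticator(visible_nodes):
    """Every node applies the same rule: the lowest visible name reauthenticates."""
    return min(visible_nodes)


cluster = ["box01", "box02", "box03", "box04"]

# Healthy network: every node sees the full list, so all agree on one node.
healthy = {authenticator(cluster) for _ in cluster}

# Split brain: each side only sees its own members and elects its own node.
side_a, side_b = ["box01", "box02"], ["box03", "box04"]
split = {authenticator(side_a), authenticator(side_b)}

print(healthy)        # {'box01'}
print(sorted(split))  # ['box01', 'box03']
```

With the network intact exactly one box reauthenticates; after the split
there are two, which is the double auth described above.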
What you should really do is to use *risk* as a deciding factor. You
must weigh the likelihood of something happening against its impact. It
is, for instance, more likely that a node is lost than that the network
ends up in a split brain where both sides can still authenticate. Hence
you probably decide to take that risk, knowing that certain split-brain
scenarios cannot be handled by the solution.
If there is anything I wish to tell people about distributed programming
it is that it is a fuzzy logic. On a single machine you are *not* safe
since it can die. On multiple machines you have an error rate and
different types of errors. What is important is that you control the
error rate rather than let it flow by itself. You will almost never hit
a situation where 100% stability can be guaranteed if you also need
speed. So it becomes a question of risk management and trade-offs.
Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen, DK