[erlang-questions] Erlang suitability
Fri May 18 14:29:16 CEST 2012
Netsplits. Damn. I forgot to put my thinking CAP on. In this case, a netsplit would be disastrous unless we fell back to a central data store such as Redis. At that point, Erlang doesn't look like a solution at all.
On top of that, you're the second person to point out that a complete graph with 75 nodes is problematic. Now that I think about it, I guess I can see why. It now sounds like an Erlang solution is not the quick win we thought it might be (quelle surprise!).
Thanks to everyone for all of your answers.
Live and work overseas - http://www.overseas-exile.com/
Buy the book - http://www.oreilly.com/catalog/perlhks/
Tech blog - http://blogs.perl.org/users/ovid/
Twitter - http://twitter.com/OvidPerl/
> From: Fred Hebert
>To: Ovid
>Sent: Friday, 18 May 2012, 14:02
>Subject: Re: [erlang-questions] Erlang suitability
>On 12-05-18 5:00 AM, Ovid wrote:
>>We have a system that runs across 75 servers and needs to be highly performant, fault-tolerant and scalable, and to share persistent data across all 75 servers. We're investigating Erlang/Mnesia (which we don't know) because it sounds tailor-made for our situation.
>As mentioned earlier in this thread, 75 servers is a bit much, but people have done it before.
>>We are not using Erlang for our first implementation, but are instead hacking together a solution from known technologies including Perl, MySQL and Redis. We're considering Erlang for our future work.
>>We have two primary needs: Each box can bid on an auction and potentially spend a tiny amount of money and each of the 75 boxes will receive notifications of a small amount of money spent if they win the auction (the auction notification will probably not be sent to the box bidding in the auction).
>>Use case 1: If the *total* of all of those small amounts exceeds a daily cap or an all-time cap, all 75 boxes must immediately stop bidding in auctions. It seems that each box can run a separate Erlang process and write out "winning bid" information to an Mnesia database, and all boxes can read the total amount spent from that to determine whether they should stop bidding.
>>This seems trivial to set up.
>It isn't trivial. You have to think about what happens when a box is seen as crashing. How strongly consistent do you want things to be? There is always a risk that a box didn't crash, but was cut off in a netsplit. You might get divergences in budget that will be hard to explain.
>There is also a definite timing issue depending on how your data is being observed. For example, you ask permission to bid on an item, but you do not get instant feedback; by the time you have sent maybe 5-10 more bids, the cap is finally reached and broken at once, because the delay to the other network made you keep on bidding without a final result. How much tolerance do you have for this?
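To put a rough number on that timing issue, here is a minimal Python sketch (not Erlang, and not anything from the actual system). It assumes one bid per tick, that every bid wins at a fixed cost, and that the spend only becomes visible to the bidder `notify_delay` ticks later. The function name and all parameters are made up for illustration.

```python
from collections import deque


def simulate_overshoot(cap, bid_cost, notify_delay):
    """Return how far actual spend ends up above `cap`.

    Hypothetical model: one winning bid per tick, each costing `bid_cost`,
    whose notification reaches the bidder `notify_delay` ticks later.
    """
    visible_total = 0    # spend the bidder currently knows about
    actual_total = 0     # spend that has really happened
    in_flight = deque()  # (arrival_tick, amount) notifications not yet seen
    tick = 0
    while True:
        # Deliver every notification whose delay has elapsed.
        while in_flight and in_flight[0][0] <= tick:
            visible_total += in_flight.popleft()[1]
        if visible_total >= cap:
            break
        # The bidder still believes there is budget left, so it bids again.
        actual_total += bid_cost
        in_flight.append((tick + notify_delay, bid_cost))
        tick += 1
    return actual_total - cap
```

In this toy model the overshoot grows linearly with the notification delay, so the tolerance question comes down to roughly "how many bids can be in flight, times the largest bid".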
>You mentioned in another post that "We need to ensure that were all 75 boxes to mysteriously crash, we could bring them back up and not worry about data integrity." Possibly, but what about 1 node only? What about 5? What about 30 or 35? What if they crash and you miss winning bids because you went down after bidding but before getting your notifications back (if that is possible under the bidding rules of whatever exchange you're dealing with)?
>The most solid synchronous database setup might not give you the guarantees you expect in the first place.
>>Use case 2: we periodically need to reauthenticate to the auction system. We MUST NOT have all 75 boxes trying to reauthenticate at the same time because we will be locked out of the system if we do this. Having a central box handling reauthentication is a single point of failure that we would like to avoid, but we don't know what design pattern Erlang would use to ensure that only one of the 75 Erlang instances would attempt to reauthenticate at any one time (all 75 boxes can share the same authentication token).
>That depends on: 1. how many times you can try to re-authenticate before being blocked, and 2. how close together the attempts have to be.
>Central points of failure are definitely something to avoid. Leader election across 75 boxes might not be the funnest thing in the world either. I could see a scheme where you use some distributed cached value that can say "I am currently logging in" and that can time out at some point, visible to all readers. When you read that timeout value from each box (possibly from an OTP Application that only handles auth), each reading of that value adds or subtracts a random number to the timeout. This is to try to avoid a cluster-wide synchronization on the timeout value, and instead have the attempts happen at different times. You could add an "I'm updating" flag related to that value, and that could give you good probabilities that only a fraction of all the nodes attempt an authentication at any point in time close to the timeout value.
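A rough Python sketch of the jittered-timeout idea, for illustration only. `SharedAuthState` is an in-memory stand-in for the distributed cached value, `try_claim` is assumed to be an atomic test-and-set (which a real distributed store would have to provide), and `authenticate` is a hypothetical callback. None of these names come from Erlang/OTP or from the auction system.

```python
import random


class SharedAuthState:
    """In-memory stand-in (an assumption) for the distributed cached value."""

    def __init__(self, expires_at):
        self.expires_at = expires_at  # shared timeout for the current token
        self.updating = False         # the "I'm updating" flag
        self.token = "token-0"

    def try_claim(self):
        # Assumed to be atomic in a real store; here it is a plain check.
        if self.updating:
            return False
        self.updating = True
        return True


def maybe_reauthenticate(state, now, ttl, jitter, rng, authenticate):
    """Reauthenticate only if our jittered view of the timeout has passed."""
    # Each read adds or subtracts a random amount so nodes do not all fire
    # at the same instant.
    local_deadline = state.expires_at + rng.uniform(-jitter, jitter)
    if now < local_deadline or state.updating:
        return False
    if not state.try_claim():
        return False
    try:
        state.token = authenticate()
        state.expires_at = now + ttl
    finally:
        state.updating = False
    return True
```

With a single shared copy of the state, the first node past its jittered deadline refreshes the token and pushes the timeout forward, so the other nodes see a fresh deadline and skip. Across a real cluster the flag and the timeout are only as trustworthy as the store holding them, so this lowers the odds of simultaneous attempts rather than ruling them out.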
>Again, this would depend on how often your authentication needs to be done, and at what frequency you're allowed to do it.
>If it's too tight, you might need a central server or node that takes care of it, with one or two fail-overs to add some
>Note you will still have to care about netsplits ruining your day with this whole scheme.
>-- I had nothing to add on the rest of the mail, so I cut it off.
>Hope this helps,