[erlang-questions] Distributed application and netsplit

Scott Lystig Fritchie <>
Sun Nov 23 15:16:56 CET 2014


At the risk of sounding like we aren't in violent agreement when it's
quite likely that we are indeed agreeing...

Felix Gallo <> wrote:

fg> Disregarding for a moment the incongruousness of installing a Riak
fg> cluster to try to handle that case, there's nothing Riak could do to
fg> even help out, [...]

I'm not suggesting that Riak KV could fix that problem.  (Though, a
small application of riak_ensemble *could*.  But those two apps have
entirely different availability profiles and behavior/semantic
guarantees.)  I don't believe that Andrew was suggesting it, either.  My
apologies if our ties to a common employer were suggesting paths or
solutions related to that employer's products.

I think that both Andrew and I are suggesting that sometimes you really
do need something more than dirt on a wound[1] to avoid bleeding to
death[1b].  Many folks use ZooKeeper[2].  The OpenReplica[3] paper
describes one of the many alternatives to ZooKeeper.

I'm likely agreeing with Felix and Andrew both that when the
dirt-or-better-than-dirt mechanism has told you about a problem, how you
react to that problem has no universal answer[4].

-Scott[6]

[1] http://www.urbandictionary.com/define.php?term=rub%20some%20dirt%20on%20it
[1b] I'm not suggesting that the Erlang/OTP application controller is
     dirt.  However, it was designed by Ericsson to operate in a
     hardware environment where it is far more likely to work correctly
     than in a general data center or (gasp!) EC2 environment.
[2] After all, "Rub some ZooKeeper on it" is as good advice as "Rub some
    dirt on it": both work well in some cases and are ineffective in others.
[3] https://ecommons.library.cornell.edu/bitstream/1813/29009/2/OpenReplica.pdf
[4] OK, alright, "It depends on your app," is the correct answer.

P.S.  Many systems use ZooKeeper or other netsplit-tolerant[5] tools to
come close to a universal answer, which is "believe that
process/machine/actor/thingie and nothing else."  And those systems are
willing to accept unavailability when those tools can't give an
unambiguous answer.
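
To make the "unambiguous answer or unavailability" trade-off concrete,
here is a minimal sketch of a strict-majority decision rule.  It is an
illustration only, not how ZooKeeper or any particular tool is
implemented; the function name quorum_choice and the node names are
hypothetical.

```python
from collections import Counter
from typing import Mapping, Optional


def quorum_choice(votes: Mapping[str, Optional[str]],
                  cluster_size: int) -> Optional[str]:
    """Return the candidate named by a strict majority of the whole
    cluster, or None when no such majority exists.

    votes maps each reachable node to the candidate it believes in
    (None means the node answered but has no opinion).
    cluster_size counts every configured node, reachable or not.
    """
    # A strict majority of the full membership, not of the responders.
    needed = cluster_size // 2 + 1
    tally = Counter(v for v in votes.values() if v is not None)
    if not tally:
        return None
    candidate, count = tally.most_common(1)[0]
    # Fewer than `needed` votes: refuse to answer rather than guess.
    return candidate if count >= needed else None


# A 5-node cluster split 3/2: only the majority side gets an answer.
majority_side = {"n1": "n1", "n2": "n1", "n3": "n1"}
minority_side = {"n4": "n4", "n5": "n4"}
print(quorum_choice(majority_side, 5))  # n1
print(quorum_choice(minority_side, 5))  # None
```

The minority side of the split gets None, which is the unavailability
mentioned above: it cannot rule out that the other side has chosen
someone else, so it declines to believe anyone.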

[5] For varying (and maddening) definitions of "tolerant".
[6] Who apologizes for mixing up two issues when they should have been
    kept separate.  However, like peanut butter and chocolate, or phyllo
    dough and honey, they are frequently difficult to keep separate in
    practice.

