Asynchronous consensus in <40 code lines of Erlang

Bryan Ford brynosaurus@REDACTED
Sat Mar 7 14:51:06 CET 2020


Thanks Stavros.  The race-based “random” function I tried was similar to what you suggested, but used two racing threads instead of just one plus a timeout.  I guess the timeout approach is likely more efficient.

And I did find and try the —interleaving_bound option.  I also noticed the —scheduling_bound option too but it wasn’t as clear what it meant or why I should use one or the other (or both).  I’m not a model checking expert, just an occasional dabbler. :). If there’s some kind of “try this, then try that” kind of best-practice process for playing with the various bounds that you find kinda works, it might be nice to add that to the Concuerror documentation, since the current documentation basically just says that the bound parameters exist without saying much about what they mean or how to use them.

In any case, nice work on Concuerrer, and thanks for your help!  I’m letting the maybe-a-week-long Concuerror run go and we’ll see if it’s a day or a week or a century. ;)

Cheers
Bryan

> On 7 Mar 2020, at 14:18, Stavros Aronis <aronisstav@REDACTED> wrote:
> 
> Hi Bryan!
> 
> Thank you for trying out Concuerror!
> 
> You can express nondeterministic choice with a timeout, like this:
> 
> R = make_ref(),
> S = self(),
> spwan(fun() -> S ! R end),
> Choice =
>  receive
>   Ref -> 1
>  after
>    0 -> 2
> end
> 
> This is gonna produce the two choices depending on the scheduling of
> the delivery of the message before the timeout or after. It is
> important to use a ref and a pattern with it, so that this message
> doesn't interfere with any other messages.
> 
> As a co-author of Concuerror and relevant publications I agree with
> the engineering sentiment that "model checking will never finish" when
> applied to examples that are not painfully examined and minimized as
> much as possible. Just in case you missed it, however, Concuerror also
> has a --scheduling_bound option that does not do exhaustive search and
> might yield something interesting.
> 
> Finally, the "remaining time" estimate is a bit like a magic 8-ball. A
> week is not that bad and might actually mean a few hours or a day; a
> century, on the other hand, is fairly likely to be bad...!
> 
> Regards,
> 
> Stavros
> 
> On Sat, Mar 7, 2020 at 1:46 PM Bryan Ford <brynosaurus@REDACTED> wrote:
>> 
>> On 6 Mar 2020, at 12:32, Loïc Hoguin <essen@REDACTED> wrote:
>> 
>> 
>> You can test it using ConcuError.
>> 
>> "Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."
>> 
>> https://concuerror.com/
>> 
>> 
>> Thanks - good to know there’s a model checker for Erlang.
>> 
>> I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin).  That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such.  So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.
>> 
>> I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing.  Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted.  I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.
>> 
>> One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms.  In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness.  I can live with that.  The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that.  I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked.  That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes.  I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.
>> 
>> I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages.  This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore.  But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.
>> 
>> So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)
>> 
>> Cheers
>> Bryan
>> 
>> Cheers,
>> 
>> On 06/03/2020 11:52, Bryan Ford wrote:
>> 
>> Hi all,
>> I’ve heard about Erlang for a long time, but finally had reason to try it.
>> So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
>> If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
>> https://github.com/dedis/tlc/tree/master/erlang/model
>> I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
>> For background, the actual algorithms are described in this preprint:
>> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
>> https://arxiv.org/abs/2003.02291
>> Enjoy.  And by the way, nice language. :)
>> Cheers
>> Bryan
>> 
>> 
>> --
>> Loïc Hoguin
>> https://ninenines.eu
>> 
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200307/af954324/attachment.bin>


More information about the erlang-questions mailing list