Asynchronous consensus in <40 code lines of Erlang

Bryan Ford brynosaurus@REDACTED
Sat Mar 7 13:07:53 CET 2020


On 6 Mar 2020, at 12:32, Loïc Hoguin <essen@REDACTED> wrote:
> 
> You can test it using ConcuError.
> 
> "Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."
> 
> https://concuerror.com/ <https://concuerror.com/>

Thanks - good to know there’s a model checker for Erlang.

I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin <https://github.com/dedis/tlc/tree/master/spin>).  That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such.  So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.

I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing.  Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted.  I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.

One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms.  In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness.  I can live with that.  The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that.  I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions <https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions>), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked.  That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes.  I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.

I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages.  This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore.  But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.

So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)

Cheers
Bryan

> Cheers,
> 
> On 06/03/2020 11:52, Bryan Ford wrote:
>> Hi all,
>> I’ve heard about Erlang for a long time, but finally had reason to try it.
>> So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
>> If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
>> https://github.com/dedis/tlc/tree/master/erlang/model
>> I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
>> For background, the actual algorithms are described in this preprint:
>> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
>> https://arxiv.org/abs/2003.02291
>> Enjoy.  And by the way, nice language. :)
>> Cheers
>> Bryan
> 
> --
> Loïc Hoguin
> https://ninenines.eu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200307/a5b5fb9f/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200307/a5b5fb9f/attachment.bin>


More information about the erlang-questions mailing list