Asynchronous consensus in <40 code lines of Erlang
Stavros Aronis
aronisstav@REDACTED
Sat Mar 7 14:18:26 CET 2020
Hi Bryan!
Thank you for trying out Concuerror!
You can express nondeterministic choice with a timeout, like this:
R = make_ref(),
S = self(),
spwan(fun() -> S ! R end),
Choice =
receive
Ref -> 1
after
0 -> 2
end
This is gonna produce the two choices depending on the scheduling of
the delivery of the message before the timeout or after. It is
important to use a ref and a pattern with it, so that this message
doesn't interfere with any other messages.
As a co-author of Concuerror and relevant publications I agree with
the engineering sentiment that "model checking will never finish" when
applied to examples that are not painfully examined and minimized as
much as possible. Just in case you missed it, however, Concuerror also
has a --scheduling_bound option that does not do exhaustive search and
might yield something interesting.
Finally, the "remaining time" estimate is a bit like a magic 8-ball. A
week is not that bad and might actually mean a few hours or a day; a
century, on the other hand, is fairly likely to be bad...!
Regards,
Stavros
On Sat, Mar 7, 2020 at 1:46 PM Bryan Ford <brynosaurus@REDACTED> wrote:
>
> On 6 Mar 2020, at 12:32, Loïc Hoguin <essen@REDACTED> wrote:
>
>
> You can test it using ConcuError.
>
> "Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."
>
> https://concuerror.com/
>
>
> Thanks - good to know there’s a model checker for Erlang.
>
> I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin). That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such. So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.
>
> I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing. Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted. I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.
>
> One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms. In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness. I can live with that. The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that. I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked. That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes. I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.
>
> I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages. This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore. But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.
>
> So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)
>
> Cheers
> Bryan
>
> Cheers,
>
> On 06/03/2020 11:52, Bryan Ford wrote:
>
> Hi all,
> I’ve heard about Erlang for a long time, but finally had reason to try it.
> So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm. I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code. Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
> If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
> https://github.com/dedis/tlc/tree/master/erlang/model
> I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo. Is it comprehensible? Can you break it? Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
> For background, the actual algorithms are described in this preprint:
> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
> https://arxiv.org/abs/2003.02291
> Enjoy. And by the way, nice language. :)
> Cheers
> Bryan
>
>
> --
> Loïc Hoguin
> https://ninenines.eu
>
>
More information about the erlang-questions
mailing list