[erlang-questions] Global lock is unfair?

Tue May 15 20:40:10 CEST 2018

On Tue, May 15, 2018 at 7:53 PM Ryan Stewart <zzantozz@REDACTED> wrote:

> Thanks for the pointer, Andrew. The process per resource would work, but
> how would you ensure that only one process is started per resource? The
> only way I know would be to write a central "manager" process that keeps
> track of the resource processes and directs traffic according to the
> resource "key", ensuring a new process is started when a new resource comes
> in demand. On top of that, the manager has to be aware of other nodes in
> the cluster and resource processes that might already exist elsewhere.
> That's still a lot more work than seems necessary.
>
>
What are your failure semantics?

One "solution" is to shard the key to a node, and register the process
locally on that node. Since only one process can be registered under a
given name at a time, this works somewhat and provides a lock on the
resource. I.e., just steal riak_core and use it :P
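
A minimal sketch of the shard-and-register idea, under some loud
assumptions: the module name shard_lock and both functions are made up for
illustration, the node list is fixed and known to every caller, and lock
names are atoms drawn from a small, bounded set (register/2 only accepts
atoms). It uses plain modulo hashing, not the consistent hashing that
riak_core provides.

```erlang
-module(shard_lock).
-export([owner_node/2, acquire/1]).

%% Pick the node responsible for Key from a fixed node list.
%% The list is sorted so every caller computes the same owner.
%% erlang:phash2/2 returns an integer in 0..Range-1.
owner_node(Key, Nodes) when Nodes =/= [] ->
    Sorted = lists:usort(Nodes),
    lists:nth(erlang:phash2(Key, length(Sorted)) + 1, Sorted).

%% Try to take the lock Name for the calling process on the local node.
%% register/2 raises badarg if Name is already registered (or if the
%% caller already has a registered name), which we report as "locked".
acquire(Name) when is_atom(Name) ->
    try register(Name, self()) of
        true -> ok
    catch
        error:badarg -> {error, already_locked}
    end.
```

acquire/1 has to be called by a process that lives on owner_node(Key,
Nodes). The name is unregistered automatically when that process exits, so
the lock disappears with its holder. If the node list changes, modulo
hashing moves most keys to a different owner, which is one reason to look
at riak_core instead.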

Another solution, which you shouldn't dismiss, is to put all locks on one
node only. This gives you a single-point-of-failure, but unless you need
several 9's of uptime, the solution is easy to implement, easy to reason
about and the 9's it gives you are very nice to have. Beware the advanced
algorithm which has a bug, because it can easily make your reliability,
measured in 9's, _worse_ than having a deliberate S.P.O.F. you understand.
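
To show how small the single-node variant can be, here is a sketch of a
lock server as a plain gen_server. The module name lock_server, its API and
the non-blocking acquire are my assumptions for illustration, not something
from an existing library. It monitors each holder so a lock is dropped when
the holder dies.

```erlang
-module(lock_server).
-behaviour(gen_server).
-export([start_link/0, acquire/2, release/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start the server on the one node that holds all locks.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Node is the node running the lock server. Non-blocking:
%% returns ok or {error, locked}.
acquire(Node, Key) -> gen_server:call({?MODULE, Node}, {acquire, Key}).
release(Node, Key) -> gen_server:call({?MODULE, Node}, {release, Key}).

init([]) ->
    {ok, #{}}.   % state: Key => {HolderPid, MonitorRef}

handle_call({acquire, Key}, {Pid, _Tag}, Locks) ->
    case maps:is_key(Key, Locks) of
        true ->
            {reply, {error, locked}, Locks};
        false ->
            Ref = erlang:monitor(process, Pid),
            {reply, ok, Locks#{Key => {Pid, Ref}}}
    end;
handle_call({release, Key}, {Pid, _Tag}, Locks) ->
    case maps:find(Key, Locks) of
        {ok, {Pid, Ref}} ->          % only the holder may release
            erlang:demonitor(Ref, [flush]),
            {reply, ok, maps:remove(Key, Locks)};
        _ ->
            {reply, {error, not_owner}, Locks}
    end.

handle_cast(_Msg, Locks) ->
    {noreply, Locks}.

%% Holder died (or its node became unreachable): drop its lock.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, Locks) ->
    {noreply, maps:filter(fun(_K, {_P, R}) -> R =/= Ref end, Locks)};
handle_info(_Msg, Locks) ->
    {noreply, Locks}.
```

Note what this does on a netsplit: the monitor fires with reason
noconnection, the server frees the lock, and the old holder may still
believe it owns it. That is exactly the kind of failure semantics you have
to decide on up front.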

However,

* Failure semantics matter. If one node goes down, all the locks on that
node will be lost, etc.
* Ulf Wiger's locks library should work in a distributed setting and solve
this problem, but with a different set of failure semantics.

Usually, a distributed lock that is truly resistant to node failure and so
on is a hard problem. Theoretically, because just coming up with an
algorithm is hard (Raft, Paxos (Multi-Paxos, Fast Paxos, etc.)).
Practically, because implementing said algorithms in an error-free way is
equally hard.

If you haven't read it, the SRE Handbook by Google has a chapter[0] on
this. It is a decent survey which gives you an overview of the problem
space without delving too much into Erlang. Once you know what you are
looking for and what tooling you have, it is usually easier to pick a
solution.

[0]
https://landing.google.com/sre/book/chapters/managing-critical-state.html