Use of spinlocks in Erlang

Rickard Green rickard@REDACTED
Wed Jan 15 23:02:40 CET 2020


On Wed, Jan 15, 2020 at 5:54 PM Andreas Schultz
<andreas.schultz@REDACTED> wrote:
>
>
>
> On Wed, Jan 15, 2020 at 17:41, Jesper Louis Andersen
> <jesper.louis.andersen@REDACTED> wrote:
>>
>> On Tue, Jan 7, 2020 at 10:01 AM Andreas Schultz
>> <andreas.schultz@REDACTED> wrote:
>>>
>>>
>>> Could any of the Erlang gurus or ERTS developers comment on the use of
>>> home grown mutexes and spinlocks in Erlang in light of that discussion?
>>>
>>
>> Not a locking expert in the ERTS, but:
>>
>> The locking API is there to provide a factoring point for the BEAM. The
>> idea is that we have a locking API to use in ERTS which then maps to
>> different implementations as we see fit. This allows us to target UNIX,
>> Windows, and embedded OSes at the same time, with a consistent API. But
>> it gets better than that!
>>
>> We can use the factoring point to "plug in" different locking models and
>> try them out. This allows us to choose a locking model which works well.
>> It also allows us to plug in various debugging/correctness checks at
>> compile time: for instance, that lock ordering is preserved, whether
>> locks are contended, or whether a deadlock has happened.
>>
>> That is, the current locking model is sort-of chosen at compile time,
>> based on the capabilities of the underlying system.
>>
>> All in all, I'm going to claim that the system is not so much "home
>> grown" as it is a sensible wrapper around an underlying system.
>
>
> I have seen the different underlying implementations for the locking
> primitives. There are wrappers for Windows and POSIX threads (pthread).
> But there also seems to be a completely independent implementation of
> mutexes and spinlocks that does not use the OS- or libc-provided
> primitives.
> On Linux that Erlang-specific implementation appears to be used by
> default (that is what I called "home grown"). The pthread version is only
> used when Erlang is built with valgrind support (or so it seems).
>
> I was thinking about hacking a build to use only the pthread locking
> primitives (without the other valgrind overhead) and running some
> benchmarks.
> However, my hope was that some Erlang expert could shed some light on the
> locking before I spend the time to do that.
>
> Anyhow, does anybody have a suggestion for an existing benchmark that
> could demonstrate differences between locking primitives?
>
> Andreas
>
>>
>>
>> --
>> J.
>
>
>
> --
>
> Andreas Schultz


There are more or less no pure spinlocks left in the VM (there might be
some in test code and/or in some initialization phase). The last one at a
potentially heavily contended place was removed not long ago, though <
http://erlang.org/download/OTP-22.1.8.README>.
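
For reference, "pure" here means a lock whose waiters never block. A
minimal C11 sketch of such a lock (illustrative only, not the ERTS code):

#include <stdatomic.h>

/* Illustrative pure spinlock: a waiter burns CPU until the holder
 * releases the lock, which is what makes such locks risky at
 * potentially heavily contended places. */
typedef struct { atomic_flag flag; } pure_spinlock_t;
/* initialize as: pure_spinlock_t l = { ATOMIC_FLAG_INIT }; */

static void pure_spin_lock(pure_spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ; /* busy-wait; never sleeps */
}

static void pure_spin_unlock(pure_spinlock_t *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}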

We have "homegrown" lock implementations that spin a while trying to
acquire a lock that is locked. The spinning since it significantly improve
performance. The spinning is bounded though and a thread not successfully
acquiring a lock will eventually go to sleep.
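
The general shape of those locks is: spin a bounded number of times on the
fast path, then fall back to blocking. A rough sketch of the idea in C11
(the spin limit and the condition-variable slow path are illustrative
choices, not the actual ethread code):

#include <stdatomic.h>
#include <pthread.h>

#define SPIN_LIMIT 1000 /* assumed bound; the real code tunes this */

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
    pthread_mutex_t mtx; /* protects the sleeping slow path only */
    pthread_cond_t cnd;
} hybrid_lock_t;

static void hybrid_lock(hybrid_lock_t *l) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(
                &l->locked, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return; /* acquired while spinning */
    }
    /* Spinning did not succeed; block until the holder wakes us up. */
    pthread_mutex_lock(&l->mtx);
    int expected = 0;
    while (!atomic_compare_exchange_strong_explicit(
               &l->locked, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;
        pthread_cond_wait(&l->cnd, &l->mtx);
    }
    pthread_mutex_unlock(&l->mtx);
}

static void hybrid_unlock(hybrid_lock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
    pthread_mutex_lock(&l->mtx); /* a real implementation tracks waiters */
    pthread_cond_signal(&l->cnd);
    pthread_mutex_unlock(&l->mtx);
}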

The two major "homegrown" lock implementations:

-- Process locks --

Our process lock implementation is mainly a memory optimization. Each
process (currently) has 5 different locks. The 5 locks of a process consume
6 words in total; with pthread mutexes, 25 words would be used (on my Linux
machine). Our process lock implementation also makes it cheaper to
lock/unlock multiple locks of one process in one operation. The cost of
locking all 5 in one operation is the same as locking one of them.
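
The trick that makes this cheap is keeping all of a process's locks as bits
in one word, so acquiring any subset of free locks is a single
compare-and-swap. A sketch of that idea (the lock names and encoding here
are illustrative; the actual layout lives in erl_process_lock.h):

#include <stdatomic.h>
#include <stdbool.h>

#define LCK_MAIN   (1u << 0)
#define LCK_MSGQ   (1u << 1)
#define LCK_BTM    (1u << 2)
#define LCK_STATUS (1u << 3)
#define LCK_TRACE  (1u << 4)
#define LCK_ALL    (LCK_MAIN|LCK_MSGQ|LCK_BTM|LCK_STATUS|LCK_TRACE)

typedef struct { atomic_uint flags; } proc_locks_t;

/* Try to take every lock in 'mask' at once; all or nothing. */
static bool try_lock_many(proc_locks_t *p, unsigned mask) {
    unsigned cur = atomic_load_explicit(&p->flags, memory_order_relaxed);
    while ((cur & mask) == 0) { /* all requested bits are free */
        if (atomic_compare_exchange_weak_explicit(
                &p->flags, &cur, cur | mask,
                memory_order_acquire, memory_order_relaxed))
            return true; /* one CAS acquired the whole subset */
    }
    return false; /* some requested lock is held; caller spins or queues */
}

static void unlock_many(proc_locks_t *p, unsigned mask) {
    atomic_fetch_and_explicit(&p->flags, ~mask, memory_order_release);
}

Locking all five is then try_lock_many(p, LCK_ALL): the same single CAS as
locking one of them.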

-- Rwlocks --

This implementation came about since the NPTL *) rwlocks were not fair. By
default NPTL rwlocks used a reader-preferred approach, which easily could
(and did) starve writers. You could configure them as writer-preferred,
which was a bit better, but the main starvation problem still remained (in
this case starvation of readers). The performance of the NPTL rwlocks was
also quite bad in most scenarios. I write "was" since I don't know whether
changes have been made since I last looked at this.
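
For completeness, this is how one selects the writer-preferred mode through
the glibc-specific attribute API:

#define _GNU_SOURCE
#include <pthread.h>

/* Note: glibc documents plain PTHREAD_RWLOCK_PREFER_WRITER_NP as behaving
 * like the reader-preferred default, so the _NONRECURSIVE_NP variant is
 * the one that actually prefers writers. */
static pthread_rwlock_t rwl;

static void init_writer_preferred(void) {
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&rwl, &attr);
    pthread_rwlockattr_destroy(&attr);
}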

Besides being fair, our rwlock implementation also has a reader-optimized
mode which improves performance in scenarios where you mostly read-lock the
rwlock. It does this by keeping track of readers in separate cache lines,
thereby avoiding cache-line ping-pong. On a big SMP machine this
improvement can be huge. Such rwlocks are, for example, used when you pass
the read_concurrency option to ETS tables.

*) Native POSIX Thread Library (used on Linux)
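
The core of the reader-optimized mode described above is the classic "big
reader" pattern: each reader slot (for example, one per scheduler) counts
its readers in its own cache line, so concurrent read-locks never write to
a shared line, while writers must scan every slot. A simplified and unfair
sketch of that pattern (slot count and layout are assumed, and the waiting
below is naive spinning rather than bounded-spin-then-sleep):

#include <stdatomic.h>
#include <stdbool.h>

#define N_SLOTS    64 /* assumed: e.g. one slot per scheduler thread */
#define CACHE_LINE 64

typedef struct {
    _Alignas(CACHE_LINE) atomic_int readers; /* one cache line per slot */
} reader_slot_t;

typedef struct {
    atomic_bool writer; /* write-lock indication */
    reader_slot_t slot[N_SLOTS];
} rd_opt_rwlock_t;

static void read_lock(rd_opt_rwlock_t *l, int my_slot) {
    for (;;) {
        /* Fast path touches only this reader's own cache line. */
        atomic_fetch_add_explicit(&l->slot[my_slot].readers, 1,
                                  memory_order_acquire);
        if (!atomic_load_explicit(&l->writer, memory_order_acquire))
            return;
        /* A writer is active: back out and wait. A real implementation
         * would spin a bounded number of times and then sleep. */
        atomic_fetch_sub_explicit(&l->slot[my_slot].readers, 1,
                                  memory_order_release);
        while (atomic_load_explicit(&l->writer, memory_order_acquire))
            ; /* naive wait */
    }
}

static void read_unlock(rd_opt_rwlock_t *l, int my_slot) {
    atomic_fetch_sub_explicit(&l->slot[my_slot].readers, 1,
                              memory_order_release);
}

static void write_lock(rd_opt_rwlock_t *l) {
    bool expected = false;
    while (!atomic_compare_exchange_weak_explicit(
               &l->writer, &expected, true,
               memory_order_acquire, memory_order_relaxed))
        expected = false;
    /* Wait for all reader slots to drain; writers pay the scan cost. */
    for (int i = 0; i < N_SLOTS; i++)
        while (atomic_load_explicit(&l->slot[i].readers,
                                    memory_order_acquire) != 0)
            ;
}

static void write_unlock(rd_opt_rwlock_t *l) {
    atomic_store_explicit(&l->writer, false, memory_order_release);
}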


The NPTL rwlocks were quite a disappointment, but the NPTL mutexes were the
opposite. Their performance was really good, especially in heavily
contended cases. On Linux they are used for all "mutex cases" except
process locks.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB