Use of spinlocks in Erlang

Jaka Bac jakabac@REDACTED
Thu Jan 16 10:40:57 CET 2020

Since this question has popped up, I looked at how the ethread part of
locking is written for Windows (as opposed to Linux). The waits here also
seem to spin potentially (on both Windows and POSIX implementation)
Linux implementation is using futexes while standard Windows events are
used on Windows.

The question is what is the earliest version of Windows that should be
officially supported since starting with Windows 8, there is a mechanism
similar to futexes available (and everything which does not have this
feature is pretty much EOL'ed from MS now)

I know that Windows port of Erlang is not a priority, but would it make
sense to update the thread events to use WaitOnAddress/WakeByAddressXXX?
Before I do any experiments by myself I would like to know if the
Erlang/OTP gurus see a potential gain here.


On Wed, 15 Jan 2020 at 23:03, Rickard Green <rickard@REDACTED> wrote:

> On Wed, Jan 15, 2020 at 5:54 PM Andreas Schultz <
> andreas.schultz@REDACTED> wrote:
> >
> >
> >
> > Am Mi., 15. Jan. 2020 um 17:41 Uhr schrieb Jesper Louis Andersen <
> jesper.louis.andersen@REDACTED>:
> >>
> >> On Tue, Jan 7, 2020 at 10:01 AM Andreas Schultz <
> andreas.schultz@REDACTED> wrote:
> >>>
> >>>
> >>> Could any of the Erlang gurus or ERTS developers comment on the use of
> home grown mutexes and spinlocks in Erlang in light of that discussion?
> >>>
> >>
> >> Not a locking expert in the ERTS, but:
> >>
> >> The locking API is there to provide a factoring point for the BEAM. The
> idea is that we have a locking API to use in ERTS which then maps to
> different implementations as we see fit. This allows us to target both
> UNIX, Windows, and embedded OS'es at the same time, with a consistent API.
> But it gets better than that!
> >>
> >> We can use the factoring point to "plug in" different locking models
> and try them out. This allows us to choose a locking model which works
> well. It also allows us to plug in various debugging/correctness checks at
> compile time. For instance that lock ordering is preserved, if locks are
> contended on, or if a deadlock happened.
> >>
> >> That is, the current locking model is sort-of chosen at compile time,
> based on the capabilities of the underlying system.
> >>
> >> All in all, I'm going to claim that the system is not so much "home
> grown" as it is a sensible wrapper around an underlying system.
> >
> >
> > I have seen the different underlying implementations for the locking
> primitives. There are wrappers for Windows and POSIX threads (pthread). But
> there also seems to be a completely independent implementation of mutexes
> and spinlocks that does not use the OS or libc provided primitives.
> > On Linux that Erlang specific implementation appears to be used by
> default (that is what I called "home grown"). The pthread version is only
> used when Erlang is build with valgrind support (or so it seems).
> >
> > I was thinking about hacking a build to use only the pthread locking
> primitives (without the other valdrind overhead) and run some benchmarks.
> > However, my hope was that some Erlang expert could shed some light in
> the locking before I spent the time to do that.
> >
> > Anyhow, does anybody have a suggestion for an existing benchmark that
> could demonstrate differences between locking primitives?
> >
> > Andreas
> >
> >>
> >>
> >> --
> >> J.
> >
> >
> >
> > --
> >
> > Andreas Schultz
> There are more or less no pure spinlocks left in the VM (there might be in
> some test-code and/or at some initialization phase). The last one at a
> potentially heavily contended place was removed not long ago though <
> We have "homegrown" lock implementations that spin a while trying to
> acquire a lock that is locked. The spinning since it significantly improve
> performance. The spinning is bounded though and a thread not successfully
> acquiring a lock will eventually go to sleep.
> The two major "homegrown" lock implementations:
> -- Process locks --
> Our process lock implementation is mainly a memory optimization. Each
> process (currently) have 5 different locks.The 5 locks of a process
> consumes 6 words in total. When using pthread mutexes 25 words are used (on
> my linux machine). Our process lock implementation also makes it cheaper to
> lock/unlock multiple locks of one processes in one operation. The cost of
> locking all 5 in one operation is the same as locking one of them.
> -- Rwlocks --
> This implementation came about since the NPTL *) rwlocks was not fair. By
> default a reader preferred approach was used by NPTL rwlocks which easily
> could (and did) starve writers. You could configure them as writer
> preferred which was a bit better, but the main problem with starvation
> still remained (in this case of readers). The performance of the NPTL
> rwlocks was also quite bad in most scenarios. I write "was" since I don't
> know if changes has been made since the last time I looked at this.
> Besides being fair our rwlock implementation also have a reader optimized
> mode which improve performance in scenarios where you mostly only read-lock
> the rwlock. This by keeping track of readers in separate cachelines and by
> this avoiding cacheline ping-ponging. On a big smp machine this improvement
> can be huge. Such rwlocks are e.g. used when you pass the read_concurrency
> option to ets tables.
> *) Native POSIX Thread Library (used on Linux)
> The NPTL rwlocks was quite a disappointment, but NPTL mutexes was the
> opposite. The performance was really good. Especially in heavily contended
> cases. On Linux they are used for all "mutex-cases" except for process
> locks.
> Regards,
> Rickard
> --
> Rickard Green, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the erlang-questions mailing list