[erlang-questions] Ets read concurrency tweak

Rickard Green rickard@REDACTED
Sun May 17 20:31:36 CEST 2015

On Thu, May 14, 2015 at 12:29 PM, Viacheslav V. Kovalev
<kovyl2404@REDACTED> wrote:
> Hi, List!
> I'm playing with ets tweaks and specifically with read_concurrency.
> I've written a simple test to measure how this tweak impacts read
> performance. The test implementation is here:
> https://gist.github.com/kovyl2404/826a51b27ba869527910
> Briefly, this test sequentially creates three [public, set] ets tables
> with different read_concurrency options (without any tweaks, with
> {read_concurrency, true} and with {read_concurrency, false}). After
> each table is created, the test populates it with some random data and
> runs N readers (N is a power of 2 from 4 to 1024). Each reader performs
> K random reads and exits.
> The result is quite surprising to me. There is absolutely no difference
> between these three tests. I have run this test on Erlang 17 on 2-, 8-,
> and 64-core machines and could not find any significant performance
> impact from this tweak.
> Could anybody explain the use case for this tweak? What should I do to
> see any difference and understand when to use this option?

I haven't had the time to look at your code, so I cannot tell you why
you are not getting the results you expected. Here is some information
about the option, though.

When the {read_concurrency, true} option is passed, reader-optimized
rwlocks are used instead of ordinary rwlocks. With reader-optimized
rwlocks, threads performing read-locking record their presence in
separate cache lines, and thereby avoid ping-ponging a common cache
line between caches.
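
As a minimal sketch of how the option is selected, the module below
creates two tables that differ only in read_concurrency. The module
name rc_tables and the table names are made up for illustration; only
ets:new/2 and the option itself come from OTP.

```erlang
-module(rc_tables).
-export([new_tables/0]).

%% Create two public set tables that differ only in the
%% read_concurrency option. Names are hypothetical.
new_tables() ->
    %% Ordinary rwlock (read_concurrency defaults to false).
    Plain = ets:new(plain_tab, [public, set]),
    %% Reader-optimized rwlock.
    ReadOpt = ets:new(read_opt_tab, [public, set, {read_concurrency, true}]),
    {Plain, ReadOpt}.
```

Both tables behave identically from the caller's point of view; the
option only changes which kind of lock protects the table.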

Write-locking a reader-optimized rwlock is more expensive than
write-locking an ordinary rwlock, so if you have a large number of
write operations you don't want to use the read_concurrency option. The
largest performance improvement will be seen when there is no
write-locking at all.

In order to determine whether it is beneficial to use the option in
your use case, you need to observe your system while it is executing
under expected load, without affecting it too much while observing it.
In your case it might be that eprof is affecting the execution too
much, but that is just a guess.
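
One low-overhead way to compare the two settings is to time concurrent
readers with timer:tc/1 rather than running them under a profiler. The
following is only a sketch under assumptions: the module name rc_bench,
the table size of 1000 keys and the read-only workload are all made up
for illustration, and this is not the attached ets_rc_test.erl.

```erlang
-module(rc_bench).
-export([run/3]).

%% run(Opts, Readers, Lookups) -> wall-clock time in microseconds.
%% Opts is appended to [public, set], e.g. [] or
%% [{read_concurrency, true}].
run(Opts, Readers, Lookups) ->
    Tab = ets:new(rc_bench_tab, [public, set | Opts]),
    true = ets:insert(Tab, [{K, K} || K <- lists:seq(1, 1000)]),
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        %% Start all readers, then wait until each one reports back.
        Pids = [spawn_link(fun() ->
                               reader(Tab, Lookups),
                               Parent ! {done, self()}
                           end)
                || _ <- lists:seq(1, Readers)],
        lists:foreach(fun(Pid) ->
                          receive {done, Pid} -> ok end
                      end, Pids),
        ok
    end),
    true = ets:delete(Tab),
    Micros.

%% Perform N lookups; no writes happen while readers run.
reader(_Tab, 0) ->
    ok;
reader(Tab, N) ->
    [{_, _}] = ets:lookup(Tab, (N rem 1000) + 1),
    reader(Tab, N - 1).
```

For example, compare rc_bench:run([], 8, 1000000) with
rc_bench:run([{read_concurrency, true}], 8, 1000000), repeating each
call several times, since single runs can vary considerably.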

The improvement varies a lot depending on hardware. The more expensive
it is to keep a common cache line up to date in all involved caches,
the larger the performance improvement will be. It is typically more
expensive the further the cores are from each other and the more cores
are involved.

I've attached a small benchmark that illustrates the effect. When run on:

Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz

Eshell V6.3  (abort with ^G)
1> erlang:system_info(cpu_topology).

Without read_concurrency, the execution time is about 0.85-1.0 seconds.
With read_concurrency, it is about 0.75-0.8 seconds.

When run on:

AMD Opteron(tm) Processor 4376 HE

Eshell V6.4.1  (abort with ^G)
1> erlang:system_info(cpu_topology).

Without read_concurrency, the execution time is about 39-41 seconds.
With read_concurrency, it is about 1.1-1.2 seconds.

Rickard Green, Erlang/OTP, Ericsson AB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ets_rc_test.erl
Type: text/x-erlang
Size: 708 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150517/35a38672/attachment.bin>
