[erlang-questions] High lock contention on dist_tables

Lukas Larsson lukas@REDACTED
Mon Apr 22 18:06:43 CEST 2013


The dist_table lock is the rwmutex defined here[1]. It is used in a number
of different places, so it is hard to say exactly what is causing the
contention without knowing your code. Generally it should indicate that you
are trying to send many messages over distribution while information about
remote nodes is changing frequently.

One thing I noticed is that the nodes() BIF takes a rwlock on the
mutex. Are you calling that BIF a lot?
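
If nodes() does turn out to be called in a hot path, one possible way to
avoid it is to keep a copy of the node list that is updated from
net_kernel:monitor_nodes/1 events and to read that copy instead. The
following is only a minimal sketch under that assumption. The module name
node_cache, the table name and the function connected/0 are made up for
illustration, and whether this actually reduces the contention you are
seeing would have to be measured with lcnt.

```erlang
%% Hypothetical module: caches the list of visible nodes in ETS so that
%% readers do not have to call the nodes() BIF on every request.
-module(node_cache).
-behaviour(gen_server).

-export([start_link/0, connected/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(TAB, node_cache_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Return the cached node list. This only reads the ETS table.
connected() ->
    [Node || {Node} <- ets:tab2list(?TAB)].

init([]) ->
    ?TAB = ets:new(?TAB, [named_table, protected, set,
                          {read_concurrency, true}]),
    %% Subscribe before seeding, so that a node connecting in between
    %% is not missed. Inserting the same node twice is harmless.
    ok = net_kernel:monitor_nodes(true),
    true = ets:insert(?TAB, [{Node} || Node <- erlang:nodes()]),
    {ok, nostate}.

%% nodeup/nodedown messages are delivered because of monitor_nodes(true).
handle_info({nodeup, Node}, State) ->
    true = ets:insert(?TAB, {Node}),
    {noreply, State};
handle_info({nodedown, Node}, State) ->
    true = ets:delete(?TAB, Node),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) ->
    {reply, {error, unsupported}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With something like this started under a supervisor, callers would use
node_cache:connected() where they used nodes() before. The cached list can
lag slightly behind the real one, so it only fits code that tolerates that.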

Lukas

   [1]:
https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_node_tables.c#L802


On Fri, Apr 19, 2013 at 9:24 PM, Brian Picciano <mediocregopher@REDACTED> wrote:

> We have a pool of 3 Erlang nodes, all on different servers. Every
> afternoon, without fail, lots of messages between the nodes start
> having really high latency, on the order of tens of seconds. Today we
> ran lcnt on them to see if there's anything there, and found that on one of
> the nodes dist_table had a significantly higher lock percentage than
> anything else, and definitely higher than on the other boxes:
>
> (node@REDACTED)8> lcnt:conflicts().
>
>          lock    id   #tries  #collisions  collisions [%]  time [us]  duration [%]
>         -----   ---  ------- ------------ --------------- ---------- -------------
>    dist_table     1  3468191      1242055         35.8128  153712413      255.2521
>     run_queue    24 76969638      4088578          5.3119   14468656       24.0264
> process_table     1  2015686       147148          7.3001    3208529        5.3280
>   timer_wheel     1 12214948       834737          6.8337    3076638        5.1090
>     timeofday     1 18231600       594487          3.2608    1491633        2.4770
> ...
>
> while on the other boxes it was closer to 3%. On the box with the high lock
> contention we also saw much higher load than on the other boxes.
>
> My question is: what is this lock? We couldn't find much online except
> that it appears to have to do with communication between nodes, but we're
> not sure what. Also, what, if anything, could we do to mitigate this
> problem?
>
> (We're running Erlang R16B.)