[erlang-questions] High lock contention on dist_tables

Brian Picciano mediocregopher@REDACTED
Mon Apr 22 22:23:48 CEST 2013


We are, actually. Is there another easy way to find out which nodes are
currently connected?
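One possible approach, offered only as a sketch and not something proposed in this thread, is to call nodes() once at startup and then track changes through the nodeup/nodedown messages that net_kernel:monitor_nodes(true) delivers, keeping the result in an ETS table that callers read directly. The module name node_tracker and its connected/0 function are hypothetical. Whether this actually relieves dist_table contention in a given system is an assumption that should be checked with lcnt.

```erlang
%% node_tracker: hypothetical helper, not part of OTP. It caches the set
%% of connected nodes in an ETS table so hot code paths can read it
%% without calling erlang:nodes/0 on every request.
-module(node_tracker).
-behaviour(gen_server).

-export([start_link/0, connected/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Readers go straight to ETS, never through the gen_server process.
connected() ->
    [Node || {Node} <- ets:tab2list(?MODULE)].

init([]) ->
    %% Subscribe before seeding, so a node that connects or disconnects
    %% in between still produces a nodeup/nodedown message for us.
    ok = net_kernel:monitor_nodes(true),
    ?MODULE = ets:new(?MODULE, [named_table, set, protected,
                                {read_concurrency, true}]),
    %% The only nodes() call: seed the table with the current nodes.
    true = ets:insert(?MODULE, [{Node} || Node <- nodes()]),
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Messages delivered because of net_kernel:monitor_nodes(true).
handle_info({nodeup, Node}, State) ->
    true = ets:insert(?MODULE, {Node}),
    {noreply, State};
handle_info({nodedown, Node}, State) ->
    true = ets:delete(?MODULE, Node),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Callers would then use node_tracker:connected() where they previously called nodes(). Reading from ETS rather than calling the gen_server is meant to avoid turning the tracker process itself into a new serialization point; `read_concurrency` is intended for exactly this read-heavy, rarely-written pattern.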


On Mon, Apr 22, 2013 at 12:06 PM, Lukas Larsson
<lukas@REDACTED> wrote:

> The dist_table lock is the rwmutex defined here [1]. It is used in a number
> of different places, so it is hard to say exactly what is causing the
> contention without knowing your code. Generally, it suggests that you are
> trying to send many messages over distribution while information about
> remote nodes is changing frequently.
>
> One thing I noticed is that the nodes() BIF takes a lock on this
> rwmutex. Are you calling that BIF a lot?
>
> Lukas
>
> [1]: https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_node_tables.c#L802
>
>
> On Fri, Apr 19, 2013 at 9:24 PM, Brian Picciano <mediocregopher@REDACTED> wrote:
>
>> We have a pool of 3 Erlang nodes, all on different servers. Every
>> afternoon, without fail, many of the messages between the nodes start
>> showing really high latency, on the order of tens of seconds. Today we
>> ran lcnt on them to see if anything showed up, and found that on one of
>> the nodes dist_table had a significantly higher collision percentage than
>> anything else, and definitely higher than on the other boxes:
>>
>> (node@REDACTED)8> lcnt:conflicts().
>>
>>                  lock     id   #tries  #collisions  collisions [%]  time [us]  duration [%]
>>                 -----    ---  ------- ------------ --------------- ---------- -------------
>>            dist_table      1  3468191      1242055         35.8128  153712413      255.2521
>>             run_queue     24 76969638      4088578          5.3119   14468656       24.0264
>>         process_table      1  2015686       147148          7.3001    3208529        5.3280
>>           timer_wheel      1 12214948       834737          6.8337    3076638        5.1090
>>             timeofday      1 18231600       594487          3.2608    1491633        2.4770
>> ...
>>
>> while on the other boxes it was closer to 3%. On the box with the high
>> lock contention we also saw much higher load than on the other boxes.
>>
>> My question is: what is this lock? We couldn't find much online except
>> that it appears to have to do with communication between nodes, but we're
>> not sure what. Also, what, if anything, could we do to mitigate this
>> problem?
>>
>> (We're running Erlang 16B.)
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>
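To compare the dist_table numbers before and after a change, the lcnt measurement from the question can be wrapped in a small helper. This is a sketch under stated assumptions: the module name dist_lock_probe and its sample/1 function are hypothetical, and it only works on an emulator with lock counting enabled (the same kind of emulator that produced the lcnt:conflicts() output quoted above); on a normal emulator the lcnt calls fail.

```erlang
%% dist_lock_probe: hypothetical helper module, not part of OTP.
%% Needs a lock-counting emulator; lcnt calls fail on a normal one.
-module(dist_lock_probe).
-export([sample/1]).

%% Reset the lock counters, let the node run its normal workload for
%% WindowMs milliseconds, then collect and print the results.
sample(WindowMs) when is_integer(WindowMs), WindowMs > 0 ->
    _ = lcnt:start(),             % start the lcnt server; result ignored if already running
    lcnt:clear(),                 % zero the lock counters in the runtime
    timer:sleep(WindowMs),        % measurement window under real load
    lcnt:collect(),               % copy the counters into the lcnt server
    lcnt:conflicts(),             % print the most contended locks
    lcnt:inspect(dist_table).     % per-lock breakdown for dist_table
```

For example, dist_lock_probe:sample(10000) could be run during the afternoon slowdown, once with the current code and once after reducing nodes() calls, and the dist_table rows compared.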

