[erlang-questions] High lock contention on dist_tables

Lukas Larsson lukas@REDACTED
Tue Apr 23 09:30:32 CEST 2013


I don't know of an alternative way. Before trying to come up with a
solution, I would verify that nodes() is actually what is causing the contention.

If you run lcnt:inspect(dist_table, [{locations, true}]), you should get a
list of the places in the source code where the lock is taken and where the
contention is happening.
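A minimal sketch of that lcnt workflow, wrapped in a hypothetical helper module (dist_lock_probe and probe/1 are illustrative names, not part of OTP). It assumes the node runs an emulator with lock counting enabled; on a regular emulator lcnt has no lock data to report. The length of the sampling window is also an assumption left to the caller.

```erlang
%% Hypothetical helper module (not part of OTP): sample lock contention
%% for a fixed window, then drill into the dist_table lock.
%% Assumes a lock-counting emulator; otherwise lcnt has nothing to show.
-module(dist_lock_probe).
-export([probe/1]).

%% Millis: how long to let the node run under its normal load before
%% collecting counters (an assumed value chosen by the caller).
probe(Millis) when is_integer(Millis), Millis > 0 ->
    %% Start the lcnt server; an already_started error is harmless here.
    _ = lcnt:start(),
    %% Reset the runtime's lock counters so only this window is measured.
    _ = lcnt:clear(),
    timer:sleep(Millis),
    %% Pull the counters from the runtime into the lcnt server.
    _ = lcnt:collect(),
    %% Print the most contended locks (the same table as in this thread).
    lcnt:conflicts(),
    %% Print a per-call-site breakdown for dist_table.
    lcnt:inspect(dist_table, [{locations, true}]).
```

Running it during the afternoon slowdown, e.g. dist_lock_probe:probe(10000) from the shell, would show whether the nodes() call sites dominate the dist_table collisions.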


On Mon, Apr 22, 2013 at 10:23 PM, Brian Picciano
<mediocregopher@REDACTED> wrote:

> We are, actually. Is there an alternative way of easily retrieving which
> nodes are currently connected?
>
>
> On Mon, Apr 22, 2013 at 12:06 PM, Lukas Larsson <lukas@REDACTED> wrote:
>
>> The dist_table mutex refers to the rwmutex defined here[1].
>> It is used in a number of different places, so saying exactly
>> what is causing the contention is hard without knowing the code. Generally,
>> it suggests that you are trying to send many messages over
>> distribution while information about remote nodes is changing frequently.
>>
>> One thing I noticed is that the nodes() BIF takes a rwlock on the
>> mutex. Are you calling that BIF a lot?
>>
>> Lukas
>>
>>    [1]: https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_node_tables.c#L802
>>
>>
>> On Fri, Apr 19, 2013 at 9:24 PM, Brian Picciano <mediocregopher@REDACTED> wrote:
>>
>>> We have a pool of 3 Erlang nodes, all on different servers. Every
>>> afternoon, without fail, many of the messages between the nodes start
>>> showing really high latency, on the order of tens of seconds. Today we
>>> ran lcnt on them to see if anything showed up, and found that on one of
>>> the nodes the dist_table lock had a significantly higher collision
>>> percentage than anything else, and definitely higher than on the other boxes:
>>>
>>> (node@REDACTED)8> lcnt:conflicts().
>>>
>>>           lock   id    #tries  #collisions  collisions [%]  time [us]  duration [%]
>>>          -----  ---   -------  -----------  --------------  ---------  ------------
>>>     dist_table    1   3468191      1242055         35.8128  153712413      255.2521
>>>      run_queue   24  76969638      4088578          5.3119   14468656       24.0264
>>>  process_table    1   2015686       147148          7.3001    3208529        5.3280
>>>    timer_wheel    1  12214948       834737          6.8337    3076638        5.1090
>>>      timeofday    1  18231600       594487          3.2608    1491633        2.4770
>>> ...
>>>
>>> whereas on the other boxes it was closer to 3%. On the box with the high
>>> lock contention we also saw much higher load than on the other boxes.
>>>
>>> My question is: what is this lock? We couldn't find much online except
>>> that it appears to have to do with communication between nodes, but we're
>>> not sure exactly how. Also, what, if anything, can we do to mitigate this
>>> problem?
>>>
>>> (We're running Erlang R16B.)
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>