<div dir="ltr"><div>The dist_table mutex refers to the rwmutex defined here [1]. It is used in a number of different places, so it is hard to say exactly what is causing the contention without knowing your code. Generally, it indicates that you are sending many messages over distribution while information about remote nodes is changing frequently. <br>
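<br>If you want to narrow it down, here is a rough sketch of how lcnt could break that one lock down by source location. It assumes the same lock-counting emulator you already used for lcnt:conflicts(), and that your OTP release supports the {locations, true} option of lcnt:inspect/2; the one-minute sampling window is an arbitrary choice of mine.<br>
<pre>
%% Run in the shell of the affected node while the latency is high.
lcnt:start().                 % start the lcnt server (harmless if already running)
lcnt:clear().                 % reset the lock counters in the runtime
timer:sleep(60000).           % sample one minute of traffic (arbitrary window)
lcnt:collect().               % copy the counters into the lcnt server
lcnt:inspect(dist_table, [{locations, true}]).  % per-location stats for this lock
</pre>
The locations it prints are file and line positions in the emulator source, which should show which of the code paths taking the lock is the busy one.<br>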
<br></div>One thing I noticed is that the nodes() BIF takes an rwlock on the mutex. Are you calling that BIF a lot?<br><div><br>Lukas<br><br> [1]: <a href="https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_node_tables.c#L802">https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_node_tables.c#L802</a><br>
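<br>P.S. If it does turn out that nodes() is called on a hot path, one possible workaround is to keep a cached copy of the node list and refresh it only when a node comes up or goes down. The module below is a minimal sketch of that idea, not a tested fix: node_cache, start/0 and connected/0 are names I made up, and whether it helps depends on nodes() actually being the source of the contention.<br>
<pre>
-module(node_cache).
-export([start/0, connected/0]).

%% Hypothetical helper, not part of OTP. Keeps the connected-node list in an
%% ETS table so callers read ETS instead of calling nodes() every time.

%% Start the cache process and wait until the table is populated.
start() -&gt;
    Parent = self(),
    Pid = spawn(fun() -&gt; init(Parent) end),
    receive {Pid, ready} -&gt; {ok, Pid} end.

%% Cheap replacement for nodes() on hot paths.
connected() -&gt;
    [{nodes, Nodes}] = ets:lookup(?MODULE, nodes),
    Nodes.

init(Parent) -&gt;
    ets:new(?MODULE, [named_table, protected, {read_concurrency, true}]),
    %% Subscribe before taking the snapshot so no nodeup/nodedown is missed.
    ok = net_kernel:monitor_nodes(true),
    refresh(),
    Parent ! {self(), ready},
    loop().

loop() -&gt;
    receive
        {nodeup, _Node}   -&gt; refresh();
        {nodedown, _Node} -&gt; refresh();
        _Other            -&gt; ok
    end,
    loop().

%% The real nodes() BIF is now called only when the cluster changes.
refresh() -&gt;
    ets:insert(?MODULE, {nodes, erlang:nodes()}).
</pre>
After node_cache:start(), callers would use node_cache:connected() wherever they currently call nodes().<br>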
</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Apr 19, 2013 at 9:24 PM, Brian Picciano <span dir="ltr">&lt;<a href="mailto:mediocregopher@gmail.com" target="_blank">mediocregopher@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>We have a pool of 3 Erlang nodes, all on different servers. Every afternoon, without fail, many messages between the nodes start having really high latency, on the order of tens of seconds. Today we ran lcnt on them to see if anything stood out, and found that on one of the nodes dist_table had a significantly higher lock percentage than anything else, and definitely higher than on the other boxes:</div>
<div><br></div><div><div>(node@address)8> lcnt:conflicts().</div><div> </div><pre>
         lock   id    #tries  #collisions  collisions [%]  time [us]  duration [%]
-------------  ---  --------  -----------  --------------  ---------  ------------
   dist_table    1   3468191      1242055         35.8128  153712413      255.2521
    run_queue   24  76969638      4088578          5.3119   14468656       24.0264
process_table    1   2015686       147148          7.3001    3208529        5.3280
  timer_wheel    1  12214948       834737          6.8337    3076638        5.1090
    timeofday    1  18231600       594487          3.2608    1491633        2.4770
</pre>
<div>...</div><div><br></div><div>while on the other boxes it was closer to 3%. On the box with the high lock contention we also saw much higher load than on the other boxes.</div><div><br></div><div>My question is: what is this lock? We couldn't find much online except that it appears to be related to communication between nodes, but we're not sure how. Also, what, if anything, can we do to mitigate this problem?</div>
<div><br></div><div>(We're running Erlang R16B)</div></div></div>
<br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br></div>