<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jul 26, 2016 at 12:11 PM, Danil Zagoskin <span dir="ltr"><<a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello!</div></blockquote><div> </div><div>Hello!</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>Inspecting the process gives me<br></div><div><div><font face="monospace, monospace">    lock      id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]</font></div><div><font face="monospace, monospace">    -----     --- ------- ------------ --------------- ---------- ------------- ---------------------</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_main   6989     4242     60.6954   1161759    11.5906 |    ...........XX...    |</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_msgq   7934     3754     47.3154   669383     6.6783 |    .xXXxxXxx..xXX...    |</font></div><div><font face="monospace, monospace"> flu_pulsedb proc_status   5489      521      9.4917   287153     2.8649 |   ..xxxxxxxxxxxXX..     |</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_link   864      44      5.0926     566     0.0056 |    ...XxX.....       |</font></div><div><font face="monospace, monospace"> flu_pulsedb   proc_btm    0       0      0.0000      0     0.0000 |                |</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_trace   201       0      0.0000      0     0.0000 |                |</font></div><div><font face="monospace, monospace"><br></font></div><div>Context: this process is a data receiver. Each sender first checks its message queue length and then sends a message</div><div>  if queue is not very long. This happens about 220 times a second. Then this process accumulates some data and</div><div>  writes it to disk periodically.</div>What do proc_main, proc_msgq and proc_status locks mean?</div></div></blockquote><div><br></div><div>proc_main is the main execution lock, proc_msgq is the lock protecting the external message queue, and proc_status is the lock protecting the process status ans various other things. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Why at all are collisions possible here?</div></div></blockquote><div><br></div><div>When doing various operations, different locks are needed in order to guarantee the order of events. For instance when sending a message the proc_msgq lock is needed. However when checking the size of the message queue both the proc_main, proc_msgq are needed. So if many processes continually check the message queue size of a single other process you will get a lot of conflict on both the main and msgq lock.  </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>What should I see next to optimise this bottleneck?</div></div></blockquote><div><br></div><div>Don't check the message queue length of another process unless you really have to, and if you do have to, do it as seldom as you can. 

> Next, inspecting db_hash_slot gives me 20 rows all alike (only top few shown):
>
>      lock  id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]
>     ----- --- ------- ------------ --------------- ---------- ------------- ---------------------
>  db_hash_slot  0   492      299     60.7724   107552     1.0730 |        ...XX. .     |
>  db_hash_slot  1   492      287     58.3333   101951     1.0171 |       .  ..XX. .     |
>  db_hash_slot  48   480      248     51.6667    99486     0.9925 |        ...xXx.     |
>  db_hash_slot  47   480      248     51.6667    96443     0.9622 |        ...XXx      |
>  db_hash_slot  2   574      304     52.9617    92952     0.9274 |      . ....XX. .     |
>
> How do I see what ETS tables are causing this high collision rate?
> Is there any way to map lock id (here: 0, 1, 48, 47, 2) to a table id?

IIRC the id used in the lock counter output should be the same as the table
id, so you should be able to map it back to a table (see the sketch at the
end of this mail).

> Or maybe there is a better tool for ETS profiling?

For detecting ETS lock conflicts there is no better tool. For general ETS
performance I would use the same tools as for all Erlang code, i.e. cprof,
eprof, fprof and friends.

Lukas
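
The sketch mentioned above for mapping the db_hash_slot ids: it assumes the
lock id really is the ETS table id, and dump_ets_ids/ets_id_row are just
made-up helper names.

    %% List {Id, Name, Owner} for every ETS table in the node, so the
    %% numeric ids reported by lcnt can be matched against a table.
    dump_ets_ids() ->
        [ets_id_row(T) || T <- ets:all()].

    ets_id_row(T) ->
        case ets:info(T) of
            undefined ->
                {T, deleted};           % table disappeared meanwhile
            Info ->
                {proplists:get_value(id, Info),
                 proplists:get_value(name, Info),
                 proplists:get_value(owner, Info)}
        end.

For a general eprof run scoped to the receiver, something like this could be
a starting point (assuming flu_pulsedb is the registered name of the process
in the lcnt output above):

    eprof:start(),
    eprof:start_profiling([whereis(flu_pulsedb)]),
    timer:sleep(10000),    % let it run for a while under load
    eprof:stop_profiling(),
    eprof:analyze(total).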