<div dir="ltr">Hello!<div><br></div><div>I am investigating performance issues in an Erlang/OTP system.</div><div><br></div><div>I've run LCNT and got results I cannot interpret.</div><div><br></div><div>After lcnt:swap_pid_keys(), I see these conflicts (only the top of the table is shown):</div><div><div><font face="monospace, monospace">        lock  id  #tries  #collisions  collisions [%]  time [us]  duration [%]</font></div><div><font face="monospace, monospace">       -----  --- ------- ------------ --------------- ---------- -------------</font></div><div><span style="font-family:monospace,monospace">    db_hash_slot  896  192708     28684     14.8847   8164589    81.4562</span></div><div><font face="monospace, monospace">    flu_pulsedb   6  21477     8561     39.8612   2118861    21.1394</font></div><div><font face="monospace, monospace">     run_queue  42 6427021    169979      2.6448   339177     3.3839</font></div><div><font face="monospace, monospace">      pix_lock 1024   4384      229      5.2235   209468     2.0898</font></div><div><font face="monospace, monospace">      pollset   1  303839     16477      5.4229   134808     1.3449</font></div><div><font face="monospace, monospace">       db_tab  146  261929      926      0.3535   107955     1.0770</font></div><div><br></div>Here I see two problems: a lot of ETS locking (db_hash_slot) and contention involving one particular process (flu_pulsedb).</div><div><br></div><div><br></div><div><br></div><div>Inspecting the process gives me:</div><div><div><font face="monospace, monospace">    lock      id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]</font></div><div><font face="monospace, monospace">    -----     --- ------- ------------ --------------- ---------- ------------- ---------------------</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_main   6989     4242     60.6954   1161759    11.5906 |    ...........XX...    
|</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_msgq   7934     3754     47.3154   669383     6.6783 |    .xXXxxXxx..xXX...    |</font></div><div><font face="monospace, monospace"> flu_pulsedb proc_status   5489      521      9.4917   287153     2.8649 |   ..xxxxxxxxxxxXX..     |</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_link   864      44      5.0926     566     0.0056 |    ...XxX.....       |</font></div><div><font face="monospace, monospace"> flu_pulsedb   proc_btm    0       0      0.0000      0     0.0000 |                |</font></div><div><font face="monospace, monospace"> flu_pulsedb  proc_trace   201       0      0.0000      0     0.0000 |                |</font></div><div><font face="monospace, monospace"><br></font></div><div>Context: this process is a data receiver. Each sender first checks the receiver's message queue length and sends a message only</div><div>  if the queue is not very long. This happens about 220 times a second. The receiver accumulates the data and</div><div>  writes it to disk periodically.</div>What do the proc_main, proc_msgq and proc_status locks mean? Why are collisions possible here at all?</div><div>What should I look at next to optimise this bottleneck?</div><div><br></div><div><br></div><div><br></div><div>Next, inspecting db_hash_slot gives me 20 similar rows (only the top few are shown):</div><div><div><font face="monospace, monospace">     lock  id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]</font></div><div><font face="monospace, monospace">    ----- --- ------- ------------ --------------- ---------- ------------- ---------------------</font></div><div><font face="monospace, monospace"> db_hash_slot  0   492      299     60.7724   107552     1.0730 |        ...XX. .     |</font></div><div><font face="monospace, monospace"> db_hash_slot  1   492      287     58.3333   101951     1.0171 |       .  ..XX. .     
|</font></div><div><font face="monospace, monospace"> db_hash_slot  48   480      248     51.6667    99486     0.9925 |        ...xXx.     |</font></div><div><font face="monospace, monospace"> db_hash_slot  47   480      248     51.6667    96443     0.9622 |        ...XXx      |</font></div><div><font face="monospace, monospace"> db_hash_slot  2   574      304     52.9617    92952     0.9274 |      . ....XX. .     |</font></div></div><div><font face="monospace, monospace"><br></font></div>How do I find out which ETS tables are causing this high collision rate?<div>Is there any way to map a lock id (here: 0, 1, 48, 47, 2) to a table id?</div><div>Or is there perhaps a better tool for ETS profiling?<br><div><br> <div><font face="monospace, monospace">-- <br></font><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><font face="'courier new', monospace">Danil Zagoskin | <a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a></font></div></div></div>
</div></div></div></div>