<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 16, 2016 at 12:28 PM, Danil Zagoskin <span dir="ltr"><<a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Next, inspecting db_hash_slot gives me 20 rows all alike (only top few shown):<br></div><div><div><font face="monospace, monospace"> lock id #tries #collisions collisions [%] time [us] duration [%] histogram [log2(us)]</font></div><div><font face="monospace, monospace"> ----- --- ------- ------------ --------------- ---------- ------------- ---------------------</font></div><div><font face="monospace, monospace"> db_hash_slot 0 492 299 60.7724 107552 1.0730 | ...XX. . |</font></div><div><font face="monospace, monospace"> db_hash_slot 1 492 287 58.3333 101951 1.0171 | . ..XX. . |</font></div><div><font face="monospace, monospace"> db_hash_slot 48 480 248 51.6667 99486 0.9925 | ...xXx. |</font></div><div><font face="monospace, monospace"> db_hash_slot 47 480 248 51.6667 96443 0.9622 | ...XXx |</font></div><div><font face="monospace, monospace"> db_hash_slot 2 574 304 52.9617 92952 0.9274 | . ....XX. . 
|</font></div></div><div><font face="monospace, monospace"><br></font></div>How do I see which ETS tables are causing this high collision rate?<div>Is there any way to map a lock id (here: 0, 1, 48, 47, 2) to a table id?</div></div></blockquote><div><br></div></span><div>IIRC the id used in the lock checker should be the same as the table id.</div></div></div></div></blockquote><div><br></div></span><div>Unfortunately, the lock id is just the index of the hash lock within the table (make_small(i)), not the table id: <a href="https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_db_hash.c#L687" target="_blank">https://github.com/erlang/otp/<wbr>blob/maint/erts/emulator/beam/<wbr>erl_db_hash.c#L687</a></div><div>After changing make_small(i) to tb->common.the_name, we were able to see which table is causing the lock contention:</div><div><pre style="margin:1em 1em 1em 1.6em;padding:8px;border:1px solid rgb(226,226,226);width:auto;background-color:rgb(250,250,250)"><font color="#484848">(flussonic@127.0.0.1)22> lcnt:inspect(db_hash_slot, [{max_locks, 10}]).
         lock                   id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]
        -----                  --- ------- ------------ --------------- ---------- ------------- ---------------------
 db_hash_slot pulsedb_seconds_data     523           78         14.9140      26329        0.5265 | .. .XXX .. |
 db_hash_slot pulsedb_seconds_data     498           77         15.4618      24210        0.4841 | ...xXX. . |
 db_hash_slot pulsedb_seconds_data     524           62         11.8321      23082        0.4616 | . ..XX. .. |
 db_hash_slot pulsedb_seconds_data     489           74         15.1329      21425        0.4284 | ...XX. . |
 db_hash_slot pulsedb_seconds_data     493           79         16.0243      19918        0.3983 | ... .xXX. |
 db_hash_slot pulsedb_seconds_data     518           67         12.9344      19298        0.3859 | ....XX.. |
 db_hash_slot pulsedb_seconds_data     595           70         11.7647      18947        0.3789 | . ..XX. |
 db_hash_slot pulsedb_seconds_data     571           74         12.9597      18638        0.3727 | ....XX. |
 db_hash_slot pulsedb_seconds_data     470           61         12.9787      17818        0.3563 | .....XX... |
 db_hash_slot pulsedb_seconds_data     475           75         15.7895      17582        0.3516 | xXX. |
ok<br></font></pre></div><div><br></div><div><br></div><div>Should I create a PR for that?</div><div>The result is not perfect: it would be better to see {TableName, LockID} there, but I failed to create a new tuple in that function.</div></div></div></div></blockquote><div><br></div><div>Yes, please. As you say, though, the PR should also contain the lock id so that it's possible to know which hash slot is the culprit. You should be able to just allocate some extra memory in the erts_db_alloc_fnf() call just above the for loop and then use the TUPLE2() macro to create the tuples, something like:</div><div><br></div><div><pre style="color:rgb(0,0,0);word-wrap:break-word;white-space:pre-wrap">/* Room for one 3-word tuple ({TableName, SlotIndex}) per lock after the struct;
 * the erts_db_free() of tb->locks must then be given the same size. */
tb->locks = (DbTableHashFineLocks*) erts_db_alloc_fnf(ERTS_ALC_T_DB_SEG, /* Other type maybe? */
                                                      (DbTable *) tb,
                                                      sizeof(DbTableHashFineLocks)
                                                      + sizeof(Eterm) * 3 * DB_HASH_LOCK_CNT);
Eterm *hp = (Eterm*)(tb->locks+1);  /* tuple storage starts right after the locks */
for (i=0; i<DB_HASH_LOCK_CNT; ++i) {
    /* TUPLE2() writes an arity header + 2 elements, i.e. 3 words, at hp */
    erts_smp_rwmtx_init_opt_x(&tb->locks->lck_vec[i].lck, &rwmtx_opt,
                              "db_hash_slot", TUPLE2(hp, tb->common.the_name, make_small(i)));
    hp += 3;
}</pre></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Things still unclear:</div><div class="gmail_extra"> - Why does the ETS usage pattern affect processes which do not use ETS?</div></div></blockquote><div><br></div><div>I don't know about your specific case, but in general eliminating contention points like these is a constant game of whack-a-mole. When you eliminate one, all the processes are free to bang on another contention point, so you end up with contention somewhere else. I've even seen cases where eliminating a contention point led to a significantly slower overall system, because another contention point became even more contended.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"> - Are there more hidden ETS tuning options?</div></div></blockquote><div><br></div><div>Most likely. We constantly introduce different tuning options to see whether they help in specific cases, and not all of them get documented, for various reasons.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"> - What else can we do to make this system faster? Free system resources are enough for doing 4-5 times more work.</div></div></blockquote><div><br></div><div>Continue doing what you are doing :) Maybe use Linux perf to see if you can get any information from it?</div></div></div></div>
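<div><br></div><div>A minimal standalone sketch of the allocation pattern behind the TUPLE2() suggestion, in plain C. Each 2-tuple takes three words (an arity header plus two elements), which is why hp advances by 3, so the area appended after the locks struct has to hold three words per lock. Everything here is a hypothetical stand-in, not ERTS code: Eterm is just uintptr_t, MOCK_ARITYVAL() is not the real header encoding, mock_tuple2() only mimics the shape of TUPLE2(), MockFineLocks and its lock_ids field are invented for illustration, malloc() replaces erts_db_alloc_fnf(), and the lock count of 64 is arbitrary.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real Eterm, make_arityval() and TUPLE2()
 * live in the ERTS headers and use a different tag encoding. */
typedef uintptr_t Eterm;
#define MOCK_LOCK_CNT 64                    /* arbitrary, not DB_HASH_LOCK_CNT */
#define MOCK_ARITYVAL(n) ((Eterm)(n) << 6)  /* hypothetical header word */

/* Mimics the shape of TUPLE2(): header + 2 elements = 3 words at hp. */
static Eterm *mock_tuple2(Eterm *hp, Eterm e1, Eterm e2)
{
    hp[0] = MOCK_ARITYVAL(2);
    hp[1] = e1;
    hp[2] = e2;
    return hp;
}

typedef struct {
    int lck_vec[MOCK_LOCK_CNT];      /* stand-in for the rwmtx array */
    Eterm *lock_ids[MOCK_LOCK_CNT];  /* hypothetical: id handed to each lock */
} MockFineLocks;

/* One allocation holds the struct followed by 3 words per lock. */
static MockFineLocks *mock_alloc_locks(Eterm table_name)
{
    const size_t words = 3 * (size_t)MOCK_LOCK_CNT;
    MockFineLocks *locks = malloc(sizeof(MockFineLocks) + sizeof(Eterm) * words);
    if (locks == NULL)
        return NULL;

    Eterm *hp = (Eterm *)(locks + 1);  /* first word after the struct */
    for (int i = 0; i < MOCK_LOCK_CNT; ++i) {
        locks->lck_vec[i] = 0;
        /* {TableName, SlotIndex}; the emulator would use make_small(i) */
        locks->lock_ids[i] = mock_tuple2(hp, table_name, (Eterm)i);
        hp += 3;
    }
    /* Every allocated word was used, and nothing was written past the end. */
    assert(hp == (Eterm *)(locks + 1) + words);
    return locks;
}
```

<div>Called as mock_alloc_locks(name), it returns the struct with lock_ids[i] pointing at a 3-word {name, i} block inside the same allocation, so a single free() releases everything. Allocating only one extra word per lock would make the later tuples run past the end of the block.</div>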