[erlang-questions] LCNT: understanding proc_* and db_hash_slot collisions
Danil Zagoskin
z@REDACTED
Tue Jul 26 12:11:28 CEST 2016
Hello!
I am inspecting performance issues in an Erlang/OTP system.
I've run LCNT and I see results I cannot interpret.
After lcnt:swap_pid_keys(), I see these conflicts (only top of the table):
lock id #tries #collisions collisions [%] time [us]
duration [%]
----- --- ------- ------------ --------------- ----------
-------------
db_hash_slot 896 192708 28684 14.8847 8164589
81.4562
flu_pulsedb 6 21477 8561 39.8612 2118861
21.1394
run_queue 42 6427021 169979 2.6448 339177
3.3839
pix_lock 1024 4384 229 5.2235 209468
2.0898
pollset 1 303839 16477 5.4229 134808
1.3449
db_tab 146 261929 926 0.3535 107955
1.0770
Here I see two problems: lots of ETS locking (db_hash_slot) and something
with a particular process (flu_pulsedb).
Inspecting the process gives me
lock id #tries #collisions collisions [%] time [us]
duration [%] histogram [log2(us)]
----- --- ------- ------------ --------------- ----------
------------- ---------------------
flu_pulsedb proc_main 6989 4242 60.6954 1161759
11.5906 | ...........XX... |
flu_pulsedb proc_msgq 7934 3754 47.3154 669383
6.6783 | .xXXxxXxx..xXX... |
flu_pulsedb proc_status 5489 521 9.4917 287153
2.8649 | ..xxxxxxxxxxxXX.. |
flu_pulsedb proc_link 864 44 5.0926 566
0.0056 | ...XxX..... |
flu_pulsedb proc_btm 0 0 0.0000 0
0.0000 | |
flu_pulsedb proc_trace 201 0 0.0000 0
0.0000 | |
Context: this process is a data receiver. Each sender first checks its
message queue length and then sends a message
if queue is not very long. This happens about 220 times a second. Then
this process accumulates some data and
writes it to disk periodically.
What do proc_main, proc_msgq and proc_status locks mean? Why at all are
collisions possible here?
What should I see next to optimise this bottleneck?
Next, inspecting db_hash_slot gives me 20 rows all alike (only top few
shown):
lock id #tries #collisions collisions [%] time [us] duration
[%] histogram [log2(us)]
----- --- ------- ------------ --------------- ----------
------------- ---------------------
db_hash_slot 0 492 299 60.7724 107552
1.0730 | ...XX. . |
db_hash_slot 1 492 287 58.3333 101951
1.0171 | . ..XX. . |
db_hash_slot 48 480 248 51.6667 99486
0.9925 | ...xXx. |
db_hash_slot 47 480 248 51.6667 96443
0.9622 | ...XXx |
db_hash_slot 2 574 304 52.9617 92952
0.9274 | . ....XX. . |
How do I see what ETS tables are causing this high collision rate?
Is there any way to map lock id (here: 0, 1, 48, 47, 2) to a table id?
Or maybe there is a better tool for ETS profiling?
--
Danil Zagoskin | z@REDACTED
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160726/d4655e0b/attachment.htm>
More information about the erlang-questions
mailing list