[erlang-questions] LCNT: understanding proc_* and db_hash_slot collisions
Lukas Larsson
garazdawi@REDACTED
Sun Jul 31 11:45:42 CEST 2016
On Tue, Jul 26, 2016 at 12:11 PM, Danil Zagoskin <z@REDACTED> wrote:
> Hello!
>
Hello!
> Inspecting the process gives me
> lock         id           #tries  #collisions  collisions [%]  time [us]  duration [%]  histogram [log2(us)]
> -----        ---          ------- ------------ --------------- ---------- ------------- ---------------------
> flu_pulsedb  proc_main      6989         4242         60.6954    1161759       11.5906  | ...........XX... |
> flu_pulsedb  proc_msgq      7934         3754         47.3154     669383        6.6783  | .xXXxxXxx..xXX... |
> flu_pulsedb  proc_status    5489          521          9.4917     287153        2.8649  | ..xxxxxxxxxxxXX.. |
> flu_pulsedb  proc_link       864           44          5.0926        566        0.0056  | ...XxX..... |
> flu_pulsedb  proc_btm          0            0          0.0000          0        0.0000  | |
> flu_pulsedb  proc_trace      201            0          0.0000          0        0.0000  | |
>
> Context: this process is a data receiver. Each sender first checks the
> receiver's message queue length and then sends a message if the queue is
> not very long. This happens about 220 times a second. This process then
> accumulates some data and writes it to disk periodically.
> What do the proc_main, proc_msgq and proc_status locks mean?
>
proc_main is the main execution lock, proc_msgq is the lock protecting the
external message queue, and proc_status is the lock protecting the process
status and various other things.
> Why are collisions possible here at all?
>
When doing various operations, different locks are needed in order to
guarantee the order of events. For instance, when sending a message the
proc_msgq lock is needed. However, when checking the size of the message
queue, both proc_main and proc_msgq are needed. So if many processes
continually check the message queue size of a single other process, you will
get a lot of conflicts on both the main and msgq locks.
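A toy model can make the lock argument above concrete. The sketch below is an assumption, not the ERTS implementation. It describes each operation only by the set of process locks named above, plus one hypothetical entry for the receiver running its own code (inferred from proc_main being the main execution lock). Two concurrent operations can collide only on the locks they have in common.

```python
# Toy model (assumption, not the real ERTS lock protocol): each operation is
# described only by the per-process locks it needs on the receiver.
LOCKS_NEEDED: dict[str, frozenset[str]] = {
    # Another process sending a message to the receiver.
    "send": frozenset({"proc_msgq"}),
    # Another process asking for the receiver's message queue length.
    "message_queue_len": frozenset({"proc_main", "proc_msgq"}),
    # Hypothetical entry: the receiver executing its own code holds its
    # main execution lock.
    "receiver_running": frozenset({"proc_main"}),
}


def contended_locks(op_a: str, op_b: str) -> frozenset[str]:
    """Return the locks two concurrent operations would both try to take."""
    return LOCKS_NEEDED[op_a] & LOCKS_NEEDED[op_b]


if __name__ == "__main__":
    for a, b in [("send", "send"),
                 ("send", "message_queue_len"),
                 ("message_queue_len", "receiver_running"),
                 ("send", "receiver_running")]:
        # An empty intersection means the two operations never collide here.
        print(f"{a} vs {b}: {sorted(contended_locks(a, b)) or 'no conflict'}")
```

In this model a plain send never touches proc_main, while every queue-length check needs it as well as proc_msgq. Frequent checks from many senders therefore collide both with each other and with the receiver itself.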
> What should I look at next to optimise this bottleneck?
>
Don't check the message queue length of another process unless you really
have to, and if you do have to, do it as seldom as you can. Checking the
length of a message queue is a deceptively expensive operation.
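If the senders cannot drop the check entirely, one way to do it "as seldom as you can" is to look up the queue length only once every N sends and reuse the last reading in between. The Python sketch below models that policy. ThrottledSender, queue_len, send, max_len and check_every are hypothetical names: queue_len stands in for erlang:process_info(Pid, message_queue_len), and send stands in for Pid ! Msg.

```python
from typing import Callable


class ThrottledSender:
    """Sketch of a sender that looks up the receiver's queue length at most
    once every `check_every` sends and reuses the last reading in between.

    `queue_len` and `send` are hypothetical stand-ins for
    erlang:process_info(Pid, message_queue_len) and Pid ! Msg.
    """

    def __init__(self, queue_len: Callable[[], int],
                 send: Callable[[object], None],
                 max_len: int, check_every: int) -> None:
        self._queue_len = queue_len
        self._send = send
        self._max_len = max_len
        self._check_every = check_every
        self._since_check = check_every  # forces a lookup on the first offer
        self._last_len = 0
        self.checks = 0  # how many (expensive) queue-length lookups were made

    def offer(self, msg: object) -> bool:
        """Send msg unless the last observed queue length was too long."""
        if self._since_check >= self._check_every:
            self._last_len = self._queue_len()
            self._since_check = 0
            self.checks += 1
        self._since_check += 1
        if self._last_len >= self._max_len:
            return False  # drop or back off instead of sending
        self._send(msg)
        return True


if __name__ == "__main__":
    inbox: list[object] = []
    sender = ThrottledSender(queue_len=lambda: len(inbox), send=inbox.append,
                             max_len=1000, check_every=10)
    for i in range(100):
        sender.offer(i)
    print(sender.checks, len(inbox))  # 10 lookups for 100 sends
```

The trade-off is staleness. Between lookups a sender acts on an old reading, so it may overshoot the limit by roughly check_every messages, or keep dropping for a while after the receiver has caught up. At about 220 sends a second, check_every = 10 would cut lookups to roughly 22 a second.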
>
> Next, inspecting db_hash_slot gives me 20 rows all alike (only top few
> shown):
> lock          id  #tries  #collisions  collisions [%]  time [us]  duration [%]  histogram [log2(us)]
> -----         --- ------- ------------ --------------- ---------- ------------- ---------------------
> db_hash_slot   0     492          299         60.7724     107552        1.0730  | ...XX. . |
> db_hash_slot   1     492          287         58.3333     101951        1.0171  | . ..XX. . |
> db_hash_slot  48     480          248         51.6667      99486        0.9925  | ...xXx. |
> db_hash_slot  47     480          248         51.6667      96443        0.9622  | ...XXx |
> db_hash_slot   2     574          304         52.9617      92952        0.9274  | . ....XX. . |
>
> How do I see what ETS tables are causing this high collision rate?
> Is there any way to map a lock id (here: 0, 1, 48, 47, 2) to a table id?
>
IIRC, the id shown in the lcnt output should be the same as the table id.
> Or maybe there is a better tool for ETS profiling?
>
For detecting ETS lock conflicts there is no better tool. For general ETS
performance, I would use the same tools as for all Erlang code, i.e. cprof,
eprof, fprof and friends.
Lukas