[erlang-questions] LCNT: understanding proc_* and db_hash_slot collisions

Thu Aug 18 10:35:02 CEST 2016

On Wed, Aug 17, 2016 at 5:03 PM, Danil Zagoskin <z@REDACTED> wrote:

> After reducing the ETS load I see a new lock coming up. Now it is one of
> the 1024 pix_locks:
>
> (flussonic@REDACTED)3> lcnt:apply(timer, sleep, [5000]),
> lcnt:swap_pid_keys(), lcnt:conflicts([{max_locks, 5}]).
>                   lock   id  #tries  #collisions  collisions [%]  time [us]  duration [%]
>                  -----  --- ------- ------------ --------------- ---------- -------------
>  <flussonic@REDACTED>    6    6403          686         10.7137    5550663      109.9727
>              pix_lock 1024    1330            1          0.0752    1481334       29.3490
>             run_queue   42 3394894        82843          2.4402     171155        3.3910
>               pollset    1  162867        10714          6.5784      88266        1.7488
>                db_tab  181  135250          759          0.5612      62164        1.2316
> ok
>
> (flussonic@REDACTED)5> lcnt:inspect(pix_lock, [{max_locks, 5}]).
>      lock  id  #tries  #collisions  collisions [%]  time [us]  duration [%] histogram [log2(us)]
>     ----- --- ------- ------------ --------------- ---------- ------------- ---------------------
>  pix_lock  53    1284            1          0.0779    1480359       29.3297 |              XX.  ........   |
>  pix_lock 895       4            0          0.0000        121        0.0024 |               X              |
>  pix_lock 862       4            0          0.0000         92        0.0018 |              X X             |
>  pix_lock 270       2            0          0.0000         83        0.0016 |                X             |
>  pix_lock 949       2            0          0.0000         70        0.0014 |                X             |
> ok
>
> (flussonic@REDACTED)8> lcnt:inspect(pix_lock, [{locations, true},
> {combine, true}]).
> lock: pix_lock
> id:   1024
> type: mutex
>                       location  #tries  #collisions  collisions [%]  time [us]  duration [%]
>                      --------- ------- ------------ --------------- ---------- -------------
>  'beam/erl_process_lock.h':422    1330            1          0.0752    1481334       29.3490
>                    undefined:0       0            0          0.0000          0        0.0000
> ok
>
>
>
> The previously described proc_link/proc_status lock is still here.
>
> The thing I find strange about pix_lock is that only one lock of the 1024
> is used and has a significant duration. The other ones have almost zero
> tries and duration.
> The lock id is not constant; it changes over time.
>
>
> Could the long pix_lock duration be the same problem as
> proc_link/proc_status?
> Why can this happen?
> What should we look at next?
>
>
The pix lock is taken when there has been contention on a proc_* lock, so
a conflict on the pix lock itself is very rare, but it does happen. I'm not
sure why the duration of the conflict is so long. Is the result repeatable?
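
One way to check repeatability is to run the same measurement several times
and compare the pix_lock rows. The following is only a sketch: the module
name lcnt_repeat_sketch and the function run/2 are made up for illustration.
It uses the same lcnt calls as in your session, and it assumes that the
emulator was built with lock counting and that lcnt:start() has already been
called.

```erlang
-module(lcnt_repeat_sketch).
-export([run/2]).

%% Repeat the measurement N times, sleeping SleepMs milliseconds each time.
%% ASSUMPTION: the lcnt server is already running (lcnt:start()) on an
%% emulator built with lock counting; otherwise the lcnt calls will fail.
run(0, _SleepMs) ->
    ok;
run(N, SleepMs) when is_integer(N), N > 0 ->
    %% lcnt:apply/3 collects lock statistics around the given call.
    lcnt:apply(timer, sleep, [SleepMs]),
    %% Group process locks by registered name or pid, as in the session above.
    lcnt:swap_pid_keys(),
    %% Print the five pix_lock instances that stand out in this run.
    lcnt:inspect(pix_lock, [{max_locks, 5}]),
    run(N - 1, SleepMs).
```

For example, lcnt_repeat_sketch:run(5, 5000) would print five tables. If the
long duration shows up in every run, it is probably not a one-off.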

As for the distribution of the pix_lock tries, the sample you have is
quite small, so it could just be a statistical anomaly, or something could
be wrong in the hashing algorithm. IIRC it is the PID of the process that
gets hashed somehow and used to figure out which pix_lock to take. Of course,
if all the contention happens in a single process, that process will hash to
the same slot every time, so maybe that is what is happening.
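
To illustrate that last point, here is a minimal sketch of a pid-to-slot
mapping. It is not the VM's actual code: erlang:phash2/2 is only a stand-in
for whatever hash the emulator uses internally, and the module name
pix_slot_sketch and its functions are hypothetical. The only thing it is
meant to show is that a deterministic mapping sends one pid to the same slot
out of 1024 every time.

```erlang
-module(pix_slot_sketch).
-export([slot/1, histogram/1]).

%% Number of pix_lock instances reported by lcnt in the output above.
-define(SLOTS, 1024).

%% ASSUMPTION: erlang:phash2/2 stands in for the emulator's internal
%% pid-to-pix_lock mapping, which is not visible from Erlang code and
%% may distribute pids differently.
slot(Pid) when is_pid(Pid) ->
    erlang:phash2(Pid, ?SLOTS).

%% Count how many of the given pids land in each slot.
%% Returns a map of Slot => Count.
histogram(Pids) ->
    lists:foldl(
      fun(Pid, Acc) ->
              maps:update_with(slot(Pid), fun(N) -> N + 1 end, 1, Acc)
      end,
      #{},
      Pids).
```

If a single process is behind all the contention, pix_slot_sketch:histogram/1
called with that pid repeated gives a map with a single key, which would look
just like the one busy pix_lock in your output. The busy lock id changing
over time would then fit a different process being the contended one in each
sample.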