[erlang-questions] R11B-2 SMP Timer Race Condition Bug [Re: bug in timer:sleep/1 smp implementation (R11B-0)]
Dmitriy Kargapolov
dmitry.kargapolov@REDACTED
Fri Dec 22 20:52:27 CET 2006
Unfortunately I cannot create a standalone test for this bug, even though
I have come much closer to understanding the effect.
This bug appears only in a highly loaded system.
Recently I managed to trace some points in the code and found at least
one scenario for the race condition.
 #   Thread  Function (file)                   Action
 1   A       erl_set_timer (time.c)            Lock timing wheel
 2   A       insert_timer (time.c)             Insert timer T1
 3   A       erl_set_timer (time.c)            Unlock timing wheel
 4   B       bump_timer_internal (time.c)      Lock timing wheel
 5   A       cancel_timer (erl_process.c)      Cancel timer T1
 6   B       bump_timer_internal (time.c)      Build list of expired timers
 7   A       erl_cancel_timer (time.c)         Cancel timer T1: waiting for timing wheel lock
 8   B       bump_timer_internal (time.c)      Unlock timing wheel
 9   C       set_timer (erl_process.c)         New timeout request (T2)
10   B       bump_timer_internal (time.c)      Call expired timers' callbacks
11   B       free_ptimer (utils.c)             Timer T1 callback invokes free_ptimer()
12   C       erts_create_smp_ptimer (utils.c)  Create timer ErtsSmpPTimer for T2
13   B       free_ptimer (utils.c)             Free ErtsSmpPTimer memory block
14   C       erts_create_smp_ptimer (utils.c)  Allocate ErtsSmpPTimer for T2, block reused!
15   C       erl_set_timer (time.c)            erl_set_timer invoked for T2
16   C       erl_set_timer (time.c)            Lock timing wheel
17   C       insert_timer (time.c)             Insert timer T2
18   C       erl_set_timer (time.c)            Unlock timing wheel
19   A       erl_cancel_timer (time.c)         Lock timing wheel
20   A       erl_cancel_timer (time.c)         Remove ex-T1 == T2 from the timing wheel
21   A       erl_cancel_timer (time.c)         Unlock timing wheel
See also attached diagram.
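The core of the scenario above (a stale pointer to T1 ends up cancelling T2 because the memory block was reused) can be replayed deterministically in one thread. The following is only a minimal sketch under stated assumptions: the ptimer struct, the LIFO free list and the wheel_* helpers are hypothetical stand-ins, not the real ErtsSmpPTimer or time.c code, and the thread interleaving is replaced by a fixed call order that follows the step numbers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ErtsSmpPTimer; the real struct differs. */
typedef struct ptimer {
    int in_wheel;              /* 1 while linked into the simulated wheel */
    struct ptimer *next_free;  /* free-list link, used only while freed */
} ptimer;

/* LIFO free list: models an allocator that hands back the block that was
 * freed most recently, which is what makes the reuse in step 14 possible. */
static ptimer *free_list = NULL;

static ptimer *ptimer_alloc(void) {
    ptimer *p = free_list;
    if (p != NULL) {
        free_list = p->next_free;   /* reuse a previously freed block */
    } else {
        p = malloc(sizeof *p);
        assert(p != NULL);
    }
    p->in_wheel = 0;
    p->next_free = NULL;
    return p;
}

static void ptimer_free(ptimer *p) {
    p->next_free = free_list;
    free_list = p;
}

static void wheel_insert(ptimer *p) { p->in_wheel = 1; }

/* Returns 1 if a timer was actually removed from the wheel. */
static int wheel_cancel(ptimer *p) {
    int was_active = p->in_wheel;
    p->in_wheel = 0;
    return was_active;
}

/* Replays the interleaving in a single thread. Returns 1 if cancelling T1
 * through the stale pointer removed T2 from the wheel. */
int replay_race(void) {
    ptimer *t1 = ptimer_alloc();
    wheel_insert(t1);                   /* steps 1-3: thread A sets T1      */
    ptimer *stale = t1;                 /* step 5: A is about to cancel T1  */
    t1->in_wheel = 0;                   /* step 6: B moves T1 to expired    */
    ptimer_free(t1);                    /* steps 10-13: callback frees T1   */
    ptimer *t2 = ptimer_alloc();        /* step 14: C gets the same block   */
    wheel_insert(t2);                   /* steps 15-18: C sets T2           */
    int reused = (t2 == stale);
    int removed = wheel_cancel(stale);  /* steps 19-21: A "cancels T1"      */
    int t2_lost = (t2->in_wheel == 0);
    ptimer_free(t2);
    return reused && removed && t2_lost;
}
```

In this model replay_race() returns 1: T2 is silently taken out of the wheel, so its timeout would never fire, which matches the process getting stuck in timer:sleep/1.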
It looks like one more mutex is required, to prevent the timeout callback
from releasing the ErtsSmpPTimer memory block while a cancel request is
pending for that timer, and vice versa. The two points of control, timer
cancellation and timer expiration, should not interfere with each other.
This bug happens only in SMP mode, because only there is the additional
timer control structure ErtsSmpPTimer used between the emulator and the
timing wheel.
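One possible shape for such an exclusion is a small state machine guarded by an extra mutex, where whichever side (cancel or expiry) finishes last releases the block. This is a rough sketch only, not the emulator's code: the pt_state values, cancel_begin/cancel_end/expiry_callback and the single global ptimer_mtx are all hypothetical names, release_block merely marks the block instead of freeing it, and it is assumed that a canceller can obtain the pointer only while the block is still live.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lifecycle states; not taken from the emulator. */
typedef enum {
    PT_ACTIVE,          /* armed, nobody has claimed the block            */
    PT_CANCEL_PENDING,  /* a canceller holds the pointer and will use it  */
    PT_CANCELLED,       /* canceller is done, expiry callback still due   */
    PT_EXPIRY_DONE,     /* callback ran while a cancel was pending        */
    PT_FREED            /* demo-only marker for "block released"          */
} pt_state;

typedef struct {
    pt_state state;
    int timeout_delivered;  /* 1 if the timeout reached the process */
} ptimer;

/* The "one more mutex": guards the state of every ptimer block. */
static pthread_mutex_t ptimer_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for returning the block to the allocator. */
static void release_block(ptimer *p) { p->state = PT_FREED; }

/* Cancel path, part 1: announce the cancel before touching the wheel. */
void cancel_begin(ptimer *p) {
    pthread_mutex_lock(&ptimer_mtx);
    assert(p->state == PT_ACTIVE);
    p->state = PT_CANCEL_PENDING;
    pthread_mutex_unlock(&ptimer_mtx);
}

/* Expiry path: releases the block only if no canceller still needs it. */
void expiry_callback(ptimer *p) {
    pthread_mutex_lock(&ptimer_mtx);
    switch (p->state) {
    case PT_ACTIVE:          /* normal timeout */
        p->timeout_delivered = 1;
        release_block(p);
        break;
    case PT_CANCEL_PENDING:  /* canceller still holds p: leave it alone */
        p->state = PT_EXPIRY_DONE;
        break;
    case PT_CANCELLED:       /* canceller already finished: we are last */
        release_block(p);
        break;
    default:
        assert(0 && "expiry callback ran on a finished timer");
    }
    pthread_mutex_unlock(&ptimer_mtx);
}

/* Cancel path, part 2: called after the wheel cancel has returned.
 * removed_from_wheel is 1 if the timer was still in the wheel, in which
 * case the expiry callback will never run for it. */
void cancel_end(ptimer *p, int removed_from_wheel) {
    pthread_mutex_lock(&ptimer_mtx);
    if (removed_from_wheel || p->state == PT_EXPIRY_DONE)
        release_block(p);        /* we are the last user of the block */
    else
        p->state = PT_CANCELLED; /* callback is still due and will release */
    pthread_mutex_unlock(&ptimer_mtx);
}

/* Replays the problematic order: cancel announced, callback runs, cancel
 * completes. Returns 1 if the block stayed live until the cancel ended. */
int replay_with_guard(void) {
    ptimer t = { PT_ACTIVE, 0 };
    cancel_begin(&t);
    expiry_callback(&t);
    int live_during_cancel = (t.state != PT_FREED);
    cancel_end(&t, 0);
    return live_during_cancel && t.state == PT_FREED && !t.timeout_delivered;
}
```

Because the block is not released while a cancel is pending, it cannot be handed out again for T2 before thread A has finished with it. A single global mutex is used here only for brevity; a real implementation would have to weigh its contention cost on a loaded system.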
Mikael Pettersson wrote:
> Dmitriy Kargapolov writes:
> >
> > When running erl with the -smp +S 2 options, a process sometimes gets
> > stuck in timer:sleep/1.
> > The process code looks like:
> >
> > some_receiver(State) ->
> >     NewState = receive
> >         % legal packet
> >         {some_keyword, Address, Port, Packet} ->
> >             State1 = handle_packet(Address, Port, Packet, State),
> >             timer:sleep(get_loop_delay()),
> >             State1;
> >         % unknown message
> >         _ ->
> >             State
> >     end,
> >     some_receiver(NewState).
> >
> > The delay value varies in the range 1..999.
> >
> > Since timer:sleep/1 is implemented as:
> > sleep(T) ->
> >     receive
> >     after T -> ok
> >     end.
> > it seems to be a problem with "after" in the SMP implementation in R11B-0.
> >
> > I don't have more details yet but will continue testing.
> > My platform: 2.6.9-5.ELsmp #1 SMP i686 i686 i386 GNU/Linux
>
> Interesting. Please send us a small standalone module that exhibits
> the bug, and I'll see if I can reproduce it.
>
> /Mikael
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RaceCond.pdf
Type: application/pdf
Size: 16115 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20061222/b1790c40/attachment.pdf>