[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen raimo+erlang-questions@REDACTED
Thu Apr 25 09:28:17 CEST 2019


On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
> Hi Raimo,
> 
> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
> 

Can you try my patch on OTP-21 - pull/2211 to see if it works without
errors in your original setup?

/ Raimo



> Thanks again for your help on this issue.
> 
> Regards,
> Peter.
> 
> > On 17 Apr 2019, at 15:36, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> > 
> > On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
> >> After some discussions with the VM guys this seems to be a bug in
> >> gen_statem coming from a misunderstanding of which guarantees that
> >> asynchronous timer cancels gives and does not give.
> >> 
> >> The {timeout, Ref, Msg} message may actually come _after_ the
> >> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
> >> probability from almost zero to very small.
> >> 
> >> So the handling of these messages is broken for gen_statem on OTP-21, but
> >> this should be fixed on master (soon to be OTP-22) since gen_statem there
> >> does not use asynchronous timer cancels.
> >> 
> >> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> >> to use synchronous timer cancel...
> > 
> > The diff became a fat one, so I created a pull request:
> > 
> >    https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>
> > 
> > If you can try it out, that would be awsome!
> > 
> > Best Regards
> > / Raimo
> > 
> > 
> > 
> >> 
> >> Thank you for finding a tricky bug!
> >> / Raimo
> >> 
> >> 
> >> 
> >> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> >>> Hi Raimo,
> >>> 
> >>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >>> 
> >>> {stdlib,"ERTS  CXC 138 10","3.7”}
> >>> {kernel,"ERTS  CXC 138 10","6.2”}
> >>> 
> >>> 
> >>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >>> 
> >> 
> >>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> >>>  crasher:
> >>>    initial call: kafire_fetcher:init/1
> >>>    pid: <0.969.0>
> >>>    registered_name: []
> >>>    exception error: no function clause matching 
> >>>                     kafire_fetcher:handle_event(info,
> >>>                                                 {timeout,
> >>>                                                  #Ref<0.2399112782.889192450.118024>,
> >>>                                                  fetch},
> >>>                                                 subsequent_fetches,
> >>>                                                 our_state) (src/kafire_fetcher.erl, line 445)
> >>>      in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> >>>      in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> >>>    ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> >>>    message_queue_len: 279
> >>>    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> >>>                  {'EXIT',<0.10080.8>,normal},
> >>>                  {'EXIT',<0.10079.8>,normal},
> >>>                  {'EXIT',<0.10081.8>,normal},
> >>>                  {'EXIT',<0.10082.8>,normal},
> >>>                  {'EXIT',<0.10083.8>,normal},
> >>>                  {'EXIT',<0.10084.8>,normal},
> >>>                  {'EXIT',<0.10085.8>,normal},
> >>>                  {'EXIT',<0.10086.8>,normal},
> >>>                  {'EXIT',<0.10087.8>,normal},
> >>>                  {'EXIT',<0.10088.8>,normal},
> >>>                  {'EXIT',<0.10089.8>,normal},
> >>>                  {'EXIT',<0.10071.8>,normal}]
> >>>    links: [<0.931.0>]
> >>>    dictionary: []
> >>>    trap_exit: true
> >>>    status: running
> >>>    heap_size: 17731
> >>>    stack_size: 27
> >>>    reductions: 28849413204
> >>>  neighbours:
> >> 
> >>> 
> >>> 
> >>> I’ll try and get a simple test case for you.
> >>> Regards,
> >>> Peter.
> >>> 
> >>> 
> >>> 
> >>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> >>>> 
> >>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> >>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> >>>>>> Hello -
> >>>>>> 
> >>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >>>>>> 
> >>>>>> The crash looks like:
> >>>>>> 
> >>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> >>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> >>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >>>>>> 
> >>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >>>>>> 
> >>>>>> 
> >>>>>>   message_queue_len: 279
> >>>>>>   messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>> 
> >>>>> Is that from a process_info() call?
> >>>>> 
> >>>>>> 
> >>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> >>>>> 
> >>>>> It is not supposed to happen.  This must be a bug.
> >>>>> 
> >>>>> The gen_statem engine keeps track of the running state_timeout timer
> >>>>> and should never present it as an info event.
> >>>>> 
> >>>>> When a timer is stopped (or restarted) it is done with an asynchronous
> >>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
> >>>>> They should be matched against map entries in the internal gen_statem
> >>>>> engine state when they arrive so them being in the inbox may be ok.
> >>>>> 
> >>>>> However, how come you have so many?  Are you restarting the state_timeout
> >>>>> in a very tight loop of repeat_state:s?  Can you show the essential parts
> >>>>> of the code that causes state_timeout:s?
> >>>>> 
> >>>>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> >>>>> 
> >>>>> I will look at the relevant parts of the code with the new knowledge that
> >>>>> a state_timeout timer can be lost, probably in combination with
> >>>>> repeat_state and state_timeout restart!
> >>>> 
> >>>> 
> >>>> Sorry, the knowledge did not help; I did not find anything suspicous.
> >>>> 
> >>>> More information will be needed...
> >>>> 
> >>>>> 
> >>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> >>>>> has been rewritten to use a synchronous cancel, which simplified the code
> >>>>> significantly.  I do not know if that would be worth trying?
> >>>>> 
> >>>>> / Raimo Niskanen
> >>>>> 
> >>>>> 
> >>>>>> 
> >>>>>> Thanks
> >>>>>> Peter.
> >>>>>> 
> >>>> 
> >>>> -- 
> >>>> 
> >>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>> _______________________________________________
> >>>> erlang-questions mailing list
> >>>> erlang-questions@REDACTED
> >>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>> 
> >> 
> >> 
> >> -- 
> >> 
> >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >> _______________________________________________
> >> erlang-questions mailing list
> >> erlang-questions@REDACTED
> >> http://erlang.org/mailman/listinfo/erlang-questions
> > 
> > -- 
> > 
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
> > http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB



More information about the erlang-questions mailing list