[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen raimo+erlang-questions@REDACTED
Mon Apr 29 16:53:39 CEST 2019


On Mon, Apr 29, 2019 at 11:49:18AM +0100, Peter Morgan wrote:
> Hi Raimo,
> 
> I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.

Well, that is at least reassuring :-)

So, it is so hard to reproduce that you can not be sure that a patch (any
patch) has fixed it...?

I will put it in our daily builds too, but we do not have many machines
running daily for OTP-21, and this is a backported change from OTP-22, so
it is hard to be sure that the change is exactly the same as in OTP-22
where it runs without problems.

Since it passes your regression test, if it does not cause any regression
errors for our few OTP-21 machines, I will probably place it in queue for next
OTP-21 maintenance patch...

If you get to any further conclusions on whether the patch fixes your
problem or breaks anything, please let me know ASAP.

Thank you for your help in investigating this!

/ Raimo



> 
> Regards,
> Peter.
> 
> > On 25 Apr 2019, at 08:28, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> > 
> > On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
> >> Hi Raimo,
> >> 
> >> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
> >> 
> > 
> > Can you try my patch on OTP-21 - pull/2211 to see if it works without
> > errors in your original setup?
> > 
> > / Raimo
> > 
> > 
> > 
> >> Thanks again for your help on this issue.
> >> 
> >> Regards,
> >> Peter.
> >> 
> >>> On 17 Apr 2019, at 15:36, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> >>> 
> >>> On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
> >>>> After some discussions with the VM guys this seems to be a bug in
> >>>> gen_statem coming from a misunderstanding of which guarantees that
> >>>> asynchronous timer cancels gives and does not give.
> >>>> 
> >>>> The {timeout, Ref, Msg} message may actually come _after_ the
> >>>> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
> >>>> probability from almost zero to very small.
> >>>> 
> >>>> So the handling of these messages is broken for gen_statem on OTP-21, but
> >>>> this should be fixed on master (soon to be OTP-22) since gen_statem there
> >>>> does not use asynchronous timer cancels.
> >>>> 
> >>>> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> >>>> to use synchronous timer cancel...
> >>> 
> >>> The diff became a fat one, so I created a pull request:
> >>> 
> >>>   https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>>
> >>> 
> >>> If you can try it out, that would be awsome!
> >>> 
> >>> Best Regards
> >>> / Raimo
> >>> 
> >>> 
> >>> 
> >>>> 
> >>>> Thank you for finding a tricky bug!
> >>>> / Raimo
> >>>> 
> >>>> 
> >>>> 
> >>>> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> >>>>> Hi Raimo,
> >>>>> 
> >>>>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >>>>> 
> >>>>> {stdlib,"ERTS  CXC 138 10","3.7”}
> >>>>> {kernel,"ERTS  CXC 138 10","6.2”}
> >>>>> 
> >>>>> 
> >>>>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >>>>> 
> >>>> 
> >>>>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> >>>>> crasher:
> >>>>>   initial call: kafire_fetcher:init/1
> >>>>>   pid: <0.969.0>
> >>>>>   registered_name: []
> >>>>>   exception error: no function clause matching 
> >>>>>                    kafire_fetcher:handle_event(info,
> >>>>>                                                {timeout,
> >>>>>                                                 #Ref<0.2399112782.889192450.118024>,
> >>>>>                                                 fetch},
> >>>>>                                                subsequent_fetches,
> >>>>>                                                our_state) (src/kafire_fetcher.erl, line 445)
> >>>>>     in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> >>>>>     in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> >>>>>   ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> >>>>>   message_queue_len: 279
> >>>>>   messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> >>>>>                 {'EXIT',<0.10080.8>,normal},
> >>>>>                 {'EXIT',<0.10079.8>,normal},
> >>>>>                 {'EXIT',<0.10081.8>,normal},
> >>>>>                 {'EXIT',<0.10082.8>,normal},
> >>>>>                 {'EXIT',<0.10083.8>,normal},
> >>>>>                 {'EXIT',<0.10084.8>,normal},
> >>>>>                 {'EXIT',<0.10085.8>,normal},
> >>>>>                 {'EXIT',<0.10086.8>,normal},
> >>>>>                 {'EXIT',<0.10087.8>,normal},
> >>>>>                 {'EXIT',<0.10088.8>,normal},
> >>>>>                 {'EXIT',<0.10089.8>,normal},
> >>>>>                 {'EXIT',<0.10071.8>,normal}]
> >>>>>   links: [<0.931.0>]
> >>>>>   dictionary: []
> >>>>>   trap_exit: true
> >>>>>   status: running
> >>>>>   heap_size: 17731
> >>>>>   stack_size: 27
> >>>>>   reductions: 28849413204
> >>>>> neighbours:
> >>>> 
> >>>>> 
> >>>>> 
> >>>>> I’ll try and get a simple test case for you.
> >>>>> Regards,
> >>>>> Peter.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> >>>>>> 
> >>>>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> >>>>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> >>>>>>>> Hello -
> >>>>>>>> 
> >>>>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >>>>>>>> 
> >>>>>>>> The crash looks like:
> >>>>>>>> 
> >>>>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> >>>>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> >>>>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >>>>>>>> 
> >>>>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>  message_queue_len: 279
> >>>>>>>>  messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>>>> 
> >>>>>>> Is that from a process_info() call?
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> >>>>>>> 
> >>>>>>> It is not supposed to happen.  This must be a bug.
> >>>>>>> 
> >>>>>>> The gen_statem engine keeps track of the running state_timeout timer
> >>>>>>> and should never present it as an info event.
> >>>>>>> 
> >>>>>>> When a timer is stopped (or restarted) it is done with an asynchronous
> >>>>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
> >>>>>>> They should be matched against map entries in the internal gen_statem
> >>>>>>> engine state when they arrive so them being in the inbox may be ok.
> >>>>>>> 
> >>>>>>> However, how come you have so many?  Are you restarting the state_timeout
> >>>>>>> in a very tight loop of repeat_state:s?  Can you show the essential parts
> >>>>>>> of the code that causes state_timeout:s?
> >>>>>>> 
> >>>>>>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> >>>>>>> 
> >>>>>>> I will look at the relevant parts of the code with the new knowledge that
> >>>>>>> a state_timeout timer can be lost, probably in combination with
> >>>>>>> repeat_state and state_timeout restart!
> >>>>>> 
> >>>>>> 
> >>>>>> Sorry, the knowledge did not help; I did not find anything suspicous.
> >>>>>> 
> >>>>>> More information will be needed...
> >>>>>> 
> >>>>>>> 
> >>>>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> >>>>>>> has been rewritten to use a synchronous cancel, which simplified the code
> >>>>>>> significantly.  I do not know if that would be worth trying?
> >>>>>>> 
> >>>>>>> / Raimo Niskanen
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> Thanks
> >>>>>>>> Peter.
> >>>>>>>> 
> >>>>>> 
> >>>>>> -- 
> >>>>>> 
> >>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>>>> _______________________________________________
> >>>>>> erlang-questions mailing list
> >>>>>> erlang-questions@REDACTED
> >>>>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>>>> 
> >>>> 
> >>>> 
> >>>> -- 
> >>>> 
> >>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>> _______________________________________________
> >>>> erlang-questions mailing list
> >>>> erlang-questions@REDACTED
> >>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>> 
> >>> -- 
> >>> 
> >>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>> _______________________________________________
> >>> erlang-questions mailing list
> >>> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>>
> >>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>>
> > 
> > -- 
> > 
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
> > http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB



More information about the erlang-questions mailing list