[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}
Raimo Niskanen
raimo+erlang-questions@REDACTED
Wed Apr 17 16:36:03 CEST 2019
On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
> After some discussions with the VM guys this seems to be a bug in
> gen_statem coming from a misunderstanding of which guarantees that
> asynchronous timer cancels gives and does not give.
>
> The {timeout, Ref, Msg} message may actually come _after_ the
> {cancel_timer, Ref, false} message. Heavy load seems to increase the
> probability from almost zero to very small.
>
> So the handling of these messages is broken for gen_statem on OTP-21, but
> this should be fixed on master (soon to be OTP-22) since gen_statem there
> does not use asynchronous timer cancels.
>
> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> to use synchronous timer cancel...
The diff became a fat one, so I created a pull request:
https://github.com/erlang/otp/pull/2211
If you can try it out, that would be awsome!
Best Regards
/ Raimo
>
> Thank you for finding a tricky bug!
> / Raimo
>
>
>
> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> > Hi Raimo,
> >
> > Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >
> > {stdlib,"ERTS CXC 138 10","3.7”}
> > {kernel,"ERTS CXC 138 10","6.2”}
> >
> >
> > Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >
>
> > =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> > crasher:
> > initial call: kafire_fetcher:init/1
> > pid: <0.969.0>
> > registered_name: []
> > exception error: no function clause matching
> > kafire_fetcher:handle_event(info,
> > {timeout,
> > #Ref<0.2399112782.889192450.118024>,
> > fetch},
> > subsequent_fetches,
> > our_state) (src/kafire_fetcher.erl, line 445)
> > in function gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> > in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> > ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> > message_queue_len: 279
> > messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> > {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> > {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> > {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> > {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> > {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> > {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> > {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> > {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> > {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> > {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> > {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> > {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> > {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> > {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> > {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> > {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> > {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> > {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> > {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> > {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> > {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> > {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> > {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> > {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> > {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> > {'EXIT',<0.10080.8>,normal},
> > {'EXIT',<0.10079.8>,normal},
> > {'EXIT',<0.10081.8>,normal},
> > {'EXIT',<0.10082.8>,normal},
> > {'EXIT',<0.10083.8>,normal},
> > {'EXIT',<0.10084.8>,normal},
> > {'EXIT',<0.10085.8>,normal},
> > {'EXIT',<0.10086.8>,normal},
> > {'EXIT',<0.10087.8>,normal},
> > {'EXIT',<0.10088.8>,normal},
> > {'EXIT',<0.10089.8>,normal},
> > {'EXIT',<0.10071.8>,normal}]
> > links: [<0.931.0>]
> > dictionary: []
> > trap_exit: true
> > status: running
> > heap_size: 17731
> > stack_size: 27
> > reductions: 28849413204
> > neighbours:
>
> >
> >
> > I’ll try and get a simple test case for you.
> > Regards,
> > Peter.
> >
> >
> >
> > > On 12 Apr 2019, at 13:38, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> > >
> > > On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> > >> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> > >>> Hello -
> > >>>
> > >>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> > >>>
> > >>> The crash looks like:
> > >>>
> > >>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> > >>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> > >>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> > >>>
> > >>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> > >>>
> > >>>
> > >>> message_queue_len: 279
> > >>> messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> > >>> {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> > >>> {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> > >>> {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> > >>
> > >> Is that from a process_info() call?
> > >>
> > >>>
> > >>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> > >>
> > >> It is not supposed to happen. This must be a bug.
> > >>
> > >> The gen_statem engine keeps track of the running state_timeout timer
> > >> and should never present it as an info event.
> > >>
> > >> When a timer is stopped (or restarted) it is done with an asynchronous
> > >> cancel (in OTP 21) so the cancel_timer messages comes from that.
> > >> They should be matched against map entries in the internal gen_statem
> > >> engine state when they arrive so them being in the inbox may be ok.
> > >>
> > >> However, how come you have so many? Are you restarting the state_timeout
> > >> in a very tight loop of repeat_state:s? Can you show the essential parts
> > >> of the code that causes state_timeout:s?
> > >>
> > >> Also - OTP-21.2; is that your exact OTP version? What is your stdlib version?
> > >>
> > >> I will look at the relevant parts of the code with the new knowledge that
> > >> a state_timeout timer can be lost, probably in combination with
> > >> repeat_state and state_timeout restart!
> > >
> > >
> > > Sorry, the knowledge did not help; I did not find anything suspicous.
> > >
> > > More information will be needed...
> > >
> > >>
> > >> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> > >> has been rewritten to use a synchronous cancel, which simplified the code
> > >> significantly. I do not know if that would be worth trying?
> > >>
> > >> / Raimo Niskanen
> > >>
> > >>
> > >>>
> > >>> Thanks
> > >>> Peter.
> > >>>
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > _______________________________________________
> > > erlang-questions mailing list
> > > erlang-questions@REDACTED
> > > http://erlang.org/mailman/listinfo/erlang-questions
> >
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
More information about the erlang-questions
mailing list