[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Peter Morgan peter.james.morgan@REDACTED
Tue Apr 30 13:10:09 CEST 2019


Hi Raimo,

> On 29 Apr 2019, at 15:53, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
> 
> On Mon, Apr 29, 2019 at 11:49:18AM +0100, Peter Morgan wrote:
>> Hi Raimo,
>> 
>> I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.
> 
> Well, that is at least reassuring :-)
> 
> So, it is so hard to reproduce that you can not be sure that a patch (any
> patch) has fixed it…?

Correct. We’ve managed to get it to happen _once_ pre patch, and suspect that it is some combination of load and/or environment (AWS). Sadly, it is proving elusive to reproduce.


> 
> I will put it in our daily builds too, but we do not have many machines
> running daily for OTP-21, and this is a backported change from OTP-22, so
> it is hard to be sure that the change is exactly the same as in OTP-22
> where it runs without problems.
> 
> Since it passes your regression test, if it does not cause any regression
> errors for our few OTP-21 machines, I will probably place it in queue for next
> OTP-21 maintenance patch…

That would be superb if we can get it into the queue for a maintenance patch.

> 
> If you get to any further conclusions on whether the patch fixes your
> problem or breaks anything, please let me know ASAP.

:) Will do.


Thanks again,
Peter.


> 
> Thank you for your help in investigating this!
> 
> / Raimo
> 
> 
> 
>> 
>> Regards,
>> Peter.
>> 
>>> On 25 Apr 2019, at 08:28, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>> 
>>> On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
>>>> Hi Raimo,
>>>> 
>>>> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
>>>> 
>>> 
>>> Can you try my patch on OTP-21 - pull/2211 to see if it works without
>>> errors in your original setup?
>>> 
>>> / Raimo
>>> 
>>> 
>>> 
>>>> Thanks again for your help on this issue.
>>>> 
>>>> Regards,
>>>> Peter.
>>>> 
>>>>> On 17 Apr 2019, at 15:36, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>>>> 
>>>>> On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
>>>>>> After some discussions with the VM guys this seems to be a bug in
>>>>>> gen_statem coming from a misunderstanding of which guarantees that
>>>>>> asynchronous timer cancels gives and does not give.
>>>>>> 
>>>>>> The {timeout, Ref, Msg} message may actually come _after_ the
>>>>>> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
>>>>>> probability from almost zero to very small.
>>>>>> 
>>>>>> So the handling of these messages is broken for gen_statem on OTP-21, but
>>>>>> this should be fixed on master (soon to be OTP-22) since gen_statem there
>>>>>> does not use asynchronous timer cancels.
>>>>>> 
>>>>>> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
>>>>>> to use synchronous timer cancel...
>>>>> 
>>>>> The diff became a fat one, so I created a pull request:
>>>>> 
>>>>>  https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>>>
>>>>> 
>>>>> If you can try it out, that would be awsome!
>>>>> 
>>>>> Best Regards
>>>>> / Raimo
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Thank you for finding a tricky bug!
>>>>>> / Raimo
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
>>>>>>> Hi Raimo,
>>>>>>> 
>>>>>>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
>>>>>>> 
>>>>>>> {stdlib,"ERTS  CXC 138 10","3.7”}
>>>>>>> {kernel,"ERTS  CXC 138 10","6.2”}
>>>>>>> 
>>>>>>> 
>>>>>>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
>>>>>>> 
>>>>>> 
>>>>>>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
>>>>>>> crasher:
>>>>>>>  initial call: kafire_fetcher:init/1
>>>>>>>  pid: <0.969.0>
>>>>>>>  registered_name: []
>>>>>>>  exception error: no function clause matching 
>>>>>>>                   kafire_fetcher:handle_event(info,
>>>>>>>                                               {timeout,
>>>>>>>                                                #Ref<0.2399112782.889192450.118024>,
>>>>>>>                                                fetch},
>>>>>>>                                               subsequent_fetches,
>>>>>>>                                               our_state) (src/kafire_fetcher.erl, line 445)
>>>>>>>    in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
>>>>>>>    in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
>>>>>>>  ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
>>>>>>>  message_queue_len: 279
>>>>>>>  messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
>>>>>>>                {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
>>>>>>>                {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
>>>>>>>                {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
>>>>>>>                {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
>>>>>>>                {'EXIT',<0.10080.8>,normal},
>>>>>>>                {'EXIT',<0.10079.8>,normal},
>>>>>>>                {'EXIT',<0.10081.8>,normal},
>>>>>>>                {'EXIT',<0.10082.8>,normal},
>>>>>>>                {'EXIT',<0.10083.8>,normal},
>>>>>>>                {'EXIT',<0.10084.8>,normal},
>>>>>>>                {'EXIT',<0.10085.8>,normal},
>>>>>>>                {'EXIT',<0.10086.8>,normal},
>>>>>>>                {'EXIT',<0.10087.8>,normal},
>>>>>>>                {'EXIT',<0.10088.8>,normal},
>>>>>>>                {'EXIT',<0.10089.8>,normal},
>>>>>>>                {'EXIT',<0.10071.8>,normal}]
>>>>>>>  links: [<0.931.0>]
>>>>>>>  dictionary: []
>>>>>>>  trap_exit: true
>>>>>>>  status: running
>>>>>>>  heap_size: 17731
>>>>>>>  stack_size: 27
>>>>>>>  reductions: 28849413204
>>>>>>> neighbours:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I’ll try and get a simple test case for you.
>>>>>>> Regards,
>>>>>>> Peter.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>>>>>>> 
>>>>>>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
>>>>>>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
>>>>>>>>>> Hello -
>>>>>>>>>> 
>>>>>>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
>>>>>>>>>> 
>>>>>>>>>> The crash looks like:
>>>>>>>>>> 
>>>>>>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
>>>>>>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
>>>>>>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
>>>>>>>>>> 
>>>>>>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> message_queue_len: 279
>>>>>>>>>> messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>>>>>>>>>>               {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>>>>>>>>> 
>>>>>>>>> Is that from a process_info() call?
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
>>>>>>>>> 
>>>>>>>>> It is not supposed to happen.  This must be a bug.
>>>>>>>>> 
>>>>>>>>> The gen_statem engine keeps track of the running state_timeout timer
>>>>>>>>> and should never present it as an info event.
>>>>>>>>> 
>>>>>>>>> When a timer is stopped (or restarted) it is done with an asynchronous
>>>>>>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
>>>>>>>>> They should be matched against map entries in the internal gen_statem
>>>>>>>>> engine state when they arrive so them being in the inbox may be ok.
>>>>>>>>> 
>>>>>>>>> However, how come you have so many?  Are you restarting the state_timeout
>>>>>>>>> in a very tight loop of repeat_state:s?  Can you show the essential parts
>>>>>>>>> of the code that causes state_timeout:s?
>>>>>>>>> 
>>>>>>>>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
>>>>>>>>> 
>>>>>>>>> I will look at the relevant parts of the code with the new knowledge that
>>>>>>>>> a state_timeout timer can be lost, probably in combination with
>>>>>>>>> repeat_state and state_timeout restart!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Sorry, the knowledge did not help; I did not find anything suspicous.
>>>>>>>> 
>>>>>>>> More information will be needed...
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
>>>>>>>>> has been rewritten to use a synchronous cancel, which simplified the code
>>>>>>>>> significantly.  I do not know if that would be worth trying?
>>>>>>>>> 
>>>>>>>>> / Raimo Niskanen
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Peter.
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> 
>>>>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>>>>> _______________________________________________
>>>>>>>> erlang-questions mailing list
>>>>>>>> erlang-questions@REDACTED
>>>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> 
>>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>>> _______________________________________________
>>>>>> erlang-questions mailing list
>>>>>> erlang-questions@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>> 
>>>>> -- 
>>>>> 
>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>> _______________________________________________
>>>>> erlang-questions mailing list
>>>>> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>>>
>>>>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>>>
>>> 
>>> -- 
>>> 
>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>
> 
> -- 
> 
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190430/fe726400/attachment.htm>


More information about the erlang-questions mailing list