[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}
Peter Morgan
peter.james.morgan@REDACTED
Tue Apr 30 13:10:09 CEST 2019
Hi Raimo,
> On 29 Apr 2019, at 15:53, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>
> On Mon, Apr 29, 2019 at 11:49:18AM +0100, Peter Morgan wrote:
>> Hi Raimo,
>>
>> I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.
>
> Well, that is at least reassuring :-)
>
> So, it is so hard to reproduce that you can not be sure that a patch (any
> patch) has fixed it…?
Correct. We’ve managed to get it to happen _once_ pre patch, and suspect that it is some combination of load and/or environment (AWS). Sadly, it is proving elusive to reproduce.
>
> I will put it in our daily builds too, but we do not have many machines
> running daily for OTP-21, and this is a backported change from OTP-22, so
> it is hard to be sure that the change is exactly the same as in OTP-22
> where it runs without problems.
>
> Since it passes your regression test, if it does not cause any regression
> errors for our few OTP-21 machines, I will probably place it in queue for next
> OTP-21 maintenance patch…
That would be superb if we can get it into the queue for a maintenance patch.
>
> If you get to any further conclusions on whether the patch fixes your
> problem or breaks anything, please let me know ASAP.
:) Will do.
Thanks again,
Peter.
>
> Thank you for your help in investigating this!
>
> / Raimo
>
>
>
>>
>> Regards,
>> Peter.
>>
>>> On 25 Apr 2019, at 08:28, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>>
>>> On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
>>>> Hi Raimo,
>>>>
>>>> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
>>>>
>>>
>>> Can you try my patch on OTP-21 - pull/2211 to see if it works without
>>> errors in your original setup?
>>>
>>> / Raimo
>>>
>>>
>>>
>>>> Thanks again for your help on this issue.
>>>>
>>>> Regards,
>>>> Peter.
>>>>
>>>>> On 17 Apr 2019, at 15:36, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>>>>
>>>>> On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
>>>>>> After some discussions with the VM guys this seems to be a bug in
>>>>>> gen_statem coming from a misunderstanding of which guarantees that
>>>>>> asynchronous timer cancels gives and does not give.
>>>>>>
>>>>>> The {timeout, Ref, Msg} message may actually come _after_ the
>>>>>> {cancel_timer, Ref, false} message. Heavy load seems to increase the
>>>>>> probability from almost zero to very small.
>>>>>>
>>>>>> So the handling of these messages is broken for gen_statem on OTP-21, but
>>>>>> this should be fixed on master (soon to be OTP-22) since gen_statem there
>>>>>> does not use asynchronous timer cancels.
>>>>>>
>>>>>> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
>>>>>> to use synchronous timer cancel...
>>>>>
>>>>> The diff became a fat one, so I created a pull request:
>>>>>
>>>>> https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>>>
>>>>>
>>>>> If you can try it out, that would be awsome!
>>>>>
>>>>> Best Regards
>>>>> / Raimo
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thank you for finding a tricky bug!
>>>>>> / Raimo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
>>>>>>> Hi Raimo,
>>>>>>>
>>>>>>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
>>>>>>>
>>>>>>> {stdlib,"ERTS CXC 138 10","3.7”}
>>>>>>> {kernel,"ERTS CXC 138 10","6.2”}
>>>>>>>
>>>>>>>
>>>>>>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
>>>>>>>
>>>>>>
>>>>>>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
>>>>>>> crasher:
>>>>>>> initial call: kafire_fetcher:init/1
>>>>>>> pid: <0.969.0>
>>>>>>> registered_name: []
>>>>>>> exception error: no function clause matching
>>>>>>> kafire_fetcher:handle_event(info,
>>>>>>> {timeout,
>>>>>>> #Ref<0.2399112782.889192450.118024>,
>>>>>>> fetch},
>>>>>>> subsequent_fetches,
>>>>>>> our_state) (src/kafire_fetcher.erl, line 445)
>>>>>>> in function gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
>>>>>>> in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
>>>>>>> ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
>>>>>>> message_queue_len: 279
>>>>>>> messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>>>>>>> {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>>>>>>> {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
>>>>>>> {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
>>>>>>> {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
>>>>>>> {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
>>>>>>> {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
>>>>>>> {'EXIT',<0.10080.8>,normal},
>>>>>>> {'EXIT',<0.10079.8>,normal},
>>>>>>> {'EXIT',<0.10081.8>,normal},
>>>>>>> {'EXIT',<0.10082.8>,normal},
>>>>>>> {'EXIT',<0.10083.8>,normal},
>>>>>>> {'EXIT',<0.10084.8>,normal},
>>>>>>> {'EXIT',<0.10085.8>,normal},
>>>>>>> {'EXIT',<0.10086.8>,normal},
>>>>>>> {'EXIT',<0.10087.8>,normal},
>>>>>>> {'EXIT',<0.10088.8>,normal},
>>>>>>> {'EXIT',<0.10089.8>,normal},
>>>>>>> {'EXIT',<0.10071.8>,normal}]
>>>>>>> links: [<0.931.0>]
>>>>>>> dictionary: []
>>>>>>> trap_exit: true
>>>>>>> status: running
>>>>>>> heap_size: 17731
>>>>>>> stack_size: 27
>>>>>>> reductions: 28849413204
>>>>>>> neighbours:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I’ll try and get a simple test case for you.
>>>>>>> Regards,
>>>>>>> Peter.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <raimo+erlang-questions@REDACTED> wrote:
>>>>>>>>
>>>>>>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
>>>>>>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
>>>>>>>>>> Hello -
>>>>>>>>>>
>>>>>>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
>>>>>>>>>>
>>>>>>>>>> The crash looks like:
>>>>>>>>>>
>>>>>>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
>>>>>>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
>>>>>>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
>>>>>>>>>>
>>>>>>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> message_queue_len: 279
>>>>>>>>>> messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>>>>>>>>>> {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>>>>>>>>>
>>>>>>>>> Is that from a process_info() call?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
>>>>>>>>>
>>>>>>>>> It is not supposed to happen. This must be a bug.
>>>>>>>>>
>>>>>>>>> The gen_statem engine keeps track of the running state_timeout timer
>>>>>>>>> and should never present it as an info event.
>>>>>>>>>
>>>>>>>>> When a timer is stopped (or restarted) it is done with an asynchronous
>>>>>>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
>>>>>>>>> They should be matched against map entries in the internal gen_statem
>>>>>>>>> engine state when they arrive so them being in the inbox may be ok.
>>>>>>>>>
>>>>>>>>> However, how come you have so many? Are you restarting the state_timeout
>>>>>>>>> in a very tight loop of repeat_state:s? Can you show the essential parts
>>>>>>>>> of the code that causes state_timeout:s?
>>>>>>>>>
>>>>>>>>> Also - OTP-21.2; is that your exact OTP version? What is your stdlib version?
>>>>>>>>>
>>>>>>>>> I will look at the relevant parts of the code with the new knowledge that
>>>>>>>>> a state_timeout timer can be lost, probably in combination with
>>>>>>>>> repeat_state and state_timeout restart!
>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry, the knowledge did not help; I did not find anything suspicous.
>>>>>>>>
>>>>>>>> More information will be needed...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
>>>>>>>>> has been rewritten to use a synchronous cancel, which simplified the code
>>>>>>>>> significantly. I do not know if that would be worth trying?
>>>>>>>>>
>>>>>>>>> / Raimo Niskanen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>>>>> _______________________________________________
>>>>>>>> erlang-questions mailing list
>>>>>>>> erlang-questions@REDACTED
>>>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>>> _______________________________________________
>>>>>> erlang-questions mailing list
>>>>>> erlang-questions@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>
>>>>> --
>>>>>
>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>> _______________________________________________
>>>>> erlang-questions mailing list
>>>>> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED> <mailto:erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>>>
>>>>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>>>
>>>
>>> --
>>>
>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190430/fe726400/attachment.htm>
More information about the erlang-questions
mailing list