[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}
Raimo Niskanen
raimo+erlang-questions@REDACTED
Fri Apr 12 14:38:35 CEST 2019
On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> > Hello -
> >
> > We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >
> > The crash looks like:
> >
> > =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> > ** State machine {kafire_fetcher,<<"abc">>,<<"def">>,0} terminating
> > ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >
> > We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >
> >
> > message_queue_len: 279
> > messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> > {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> > {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> > {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> > {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>
> Is that from a process_info() call?
>
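For reference, a queue dump with `message_queue_len` and `messages` entries like the one quoted above is what `erlang:process_info/2` returns for those two items. A minimal sketch follows; the module and function names are hypothetical and not taken from the report:

```erlang
-module(queue_peek).
-export([peek/1]).

%% Hypothetical helper: returns the queue length and the queued messages
%% of Pid, or 'undefined' if the process is no longer alive.
%% Note that asking for 'messages' copies the whole queue, so this is
%% for debugging only.
peek(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [message_queue_len, messages]).
```

Called from a remote shell as `queue_peek:peek(Pid)`, this gives a list of `{message_queue_len, N}` and `{messages, List}` tuples.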
> >
> > Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
>
> It is not supposed to happen. This must be a bug.
>
> The gen_statem engine keeps track of the running state_timeout timer
> and should never present it as an info event.
>
> When a timer is stopped (or restarted) it is done with an asynchronous
> cancel (in OTP 21), so the cancel_timer messages come from that.
> They should be matched against map entries in the internal gen_statem
> engine state when they arrive, so their presence in the inbox may be OK.
>
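To illustrate where such messages come from: `erlang:cancel_timer/2` with the `{async, true}` option returns `ok` immediately and, when the `info` option is true (the default), later sends `{cancel_timer, Ref, Result}` to the caller. `Result` is the remaining time in milliseconds, or `false` if the timer was no longer active. The sketch below is standalone and only demonstrates the BIF; that the gen_statem engine calls it with exactly these options is an assumption, and the module name is hypothetical.

```erlang
-module(async_cancel_demo).
-export([run/0]).

run() ->
    %% Case 1: the timer is still running when it is cancelled.
    Ref1 = erlang:start_timer(500, self(), fetch),
    ok = erlang:cancel_timer(Ref1, [{async, true}, {info, true}]),
    R1 = receive {cancel_timer, Ref1, Left} -> Left
         after 1000 -> no_reply
         end,
    %% Case 2: the timer has already fired when it is cancelled.
    Ref2 = erlang:start_timer(1, self(), fetch),
    receive after 50 -> ok end,          % give the timer time to fire
    ok = erlang:cancel_timer(Ref2, [{async, true}, {info, true}]),
    R2 = receive {cancel_timer, Ref2, Res} -> Res
         after 1000 -> no_reply
         end,
    %% The {timeout, Ref2, fetch} message is still in the inbox.
    Fired = receive {timeout, Ref2, fetch} -> true
            after 0 -> false
            end,
    {R1, R2, Fired}.
```

Case 2 has the same shape as the first two entries of the quoted queue: a `{timeout, Ref, fetch}` message followed by `{cancel_timer, Ref, false}` for the same reference, which is consistent with the timer having fired before the cancel took effect.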
> However, how come you have so many? Are you restarting the state_timeout
> in a very tight loop of repeat_state:s? Can you show the essential parts
> of the code that causes state_timeout:s?
>
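As a point of comparison for the code requested above, here is a minimal sketch of the pattern described in the report: a `{state_timeout, 500, fetch}` that is restarted from a `repeat_state_and_data` return. It is not the reporter's code. The module name, the `fetching` state, the data map and the `stray_timeouts/1` helper are all hypothetical. The `info` clause is only a stopgap that counts and logs a stray timer message instead of crashing; it does not address the underlying bug.

```erlang
-module(fetch_statem_sketch).
-behaviour(gen_statem).

-export([start_link/0, stray_timeouts/1]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

%% Hypothetical helper: how many stray timer messages have been seen.
stray_timeouts(Pid) ->
    gen_statem:call(Pid, stray_timeouts).

callback_mode() ->
    handle_event_function.

init([]) ->
    %% Arm the first state_timeout on entering the initial state.
    {ok, fetching, #{stray => 0}, [{state_timeout, 500, fetch}]}.

%% Expected path: the timeout arrives as a state_timeout event and is
%% restarted with a new state_timeout action.
handle_event(state_timeout, fetch, fetching, _Data) ->
    {repeat_state_and_data, [{state_timeout, 500, fetch}]};
%% Stopgap (assumption): tolerate a timer message that shows up as info.
handle_event(info, {timeout, _Ref, Name}, _State, Data = #{stray := S}) ->
    logger:warning("stray timer message for ~p", [Name]),
    {keep_state, Data#{stray := S + 1}};
handle_event({call, From}, stray_timeouts, _State, #{stray := S}) ->
    {keep_state_and_data, [{reply, From, S}]}.
```

Counting the stray messages rather than silently dropping them keeps the symptom visible while the cause is investigated.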
> Also - OTP-21.2; is that your exact OTP version? What is your stdlib version?
>
> I will look at the relevant parts of the code with the new knowledge that
> a state_timeout timer can be lost, probably in combination with
> repeat_state and state_timeout restart!
Sorry, the knowledge did not help; I did not find anything suspicious.
More information will be needed...
>
> On master since Jan 21, and therefore in OTP-22.0-rc1, the timer handling has
> (among other changes) been rewritten to use a synchronous cancel, which
> simplified the code significantly. I do not know if that would be worth trying?
>
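For comparison, a synchronous cancel with `erlang:cancel_timer/1` reports immediately whether the timer was still running, so a timeout message that has already been delivered can be flushed on the spot. This is a sketch of that general idiom, not of the rewritten gen_statem code; the module and function names are hypothetical.

```erlang
-module(sync_cancel_demo).
-export([cancel_and_flush/1, run/0]).

%% Cancel a timer created with erlang:start_timer/3. If it has already
%% fired, remove its {timeout, Ref, _} message from the inbox.
cancel_and_flush(Ref) ->
    case erlang:cancel_timer(Ref) of
        false ->
            receive
                {timeout, Ref, _Msg} -> already_fired
            after 0 ->
                not_found
            end;
        TimeLeft when is_integer(TimeLeft) ->
            {cancelled, TimeLeft}
    end.

run() ->
    Ref1 = erlang:start_timer(500, self(), fetch),
    A = cancel_and_flush(Ref1),            % still running
    Ref2 = erlang:start_timer(1, self(), fetch),
    receive after 50 -> ok end,            % give the timer time to fire
    B = cancel_and_flush(Ref2),            % already fired, message flushed
    Leftover = receive {timeout, _, fetch} -> true
               after 0 -> false
               end,
    {A, B, Leftover}.
```

With this approach no `cancel_timer` replies accumulate in the inbox, since the result is returned directly to the caller.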
> / Raimo Niskanen
>
>
> >
> > Thanks
> > Peter.
> >
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB