[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen raimo+erlang-questions@REDACTED
Fri Apr 12 14:38:35 CEST 2019


On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> > Hello -
> > 
> > We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> > 
> > The crash looks like:
> > 
> > =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> > ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> > ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> > 
> > We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> > 
> > 
> >     message_queue_len: 279
> >     messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> 
> Is that from a process_info() call?
> 
> > 
> > Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> 
> It is not supposed to happen.  This must be a bug.
> 
> The gen_statem engine keeps track of the running state_timeout timer
> and should never present it as an info event.
> 
> When a timer is stopped (or restarted) it is done with an asynchronous
> cancel (in OTP 21) so the cancel_timer messages comes from that.
> They should be matched against map entries in the internal gen_statem
> engine state when they arrive so them being in the inbox may be ok.
> 
> However, how come you have so many?  Are you restarting the state_timeout
> in a very tight loop of repeat_state:s?  Can you show the essential parts
> of the code that causes state_timeout:s?
> 
> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> 
> I will look at the relevant parts of the code with the new knowledge that
> a state_timeout timer can be lost, probably in combination with
> repeat_state and state_timeout restart!


Sorry, the knowledge did not help; I did not find anything suspicous.

More information will be needed...

> 
> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> has been rewritten to use a synchronous cancel, which simplified the code
> significantly.  I do not know if that would be worth trying?
> 
> / Raimo Niskanen
> 
> 
> > 
> > Thanks
> > Peter.
> > 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB



More information about the erlang-questions mailing list