[erlang-questions] gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}
Raimo Niskanen
raimo+erlang-questions@REDACTED
Fri Apr 12 10:38:42 CEST 2019
On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> Hello -
>
> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
>
> The crash looks like:
>
> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
>
> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
>
>
> message_queue_len: 279
> messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
Is that from a process_info() call?
>
> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
It is not supposed to happen. This must be a bug.
The gen_statem engine keeps track of the running state_timeout timer
and should never present it as an info event.
When a timer is stopped (or restarted) it is done with an asynchronous
cancel (in OTP 21) so the cancel_timer messages comes from that.
They should be matched against map entries in the internal gen_statem
engine state when they arrive so them being in the inbox may be ok.
However, how come you have so many? Are you restarting the state_timeout
in a very tight loop of repeat_state:s? Can you show the essential parts
of the code that causes state_timeout:s?
Also - OTP-21.2; is that your exact OTP version? What is your stdlib version?
I will look at the relevant parts of the code with the new knowledge that
a state_timeout timer can be lost, probably in combination with
repeat_state and state_timeout restart!
On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
has been rewritten to use a synchronous cancel, which simplified the code
significantly. I do not know if that would be worth trying?
/ Raimo Niskanen
>
> Thanks
> Peter.
>
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
More information about the erlang-questions
mailing list