[erlang-questions] R12B erlang node restart after system clock change

Stephen Han kruegger@REDACTED
Wed Sep 9 06:18:09 CEST 2009


Ulf

Thanks for the information. I was able to resolve the issue right after
seeing your email, and I am now getting the same behavior as in R10B. I thought
it was my mistake during the code upgrade. However, today I downloaded a fresh
R13B01 for my own fun and found the same typo in the code again. Then I
realized it was not really my mistake to begin with.

I am still using a Linux 2.4 kernel, which is why I hit this issue so
easily.

In heart.c

#if defined(CORRET_USING_TIMES) || defined(GETHRTIME_WITH_CLOCK_GETTIME)
#  define HEART_CORRECT_USING_TIMES 1
#endif


CORRET_USING_TIMES is a typo. I did not bother to check when it was
introduced, but it should be CORRECT_USING_TIMES; otherwise a system clock
change will restart the node by default on a 2.4 system.
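
For anyone who wants to see the idea in isolation, here is a minimal sketch of
the corrected guard together with a tick-based timeout check. It is not the
code from heart.c: the helpers tick_seconds() and beat_timed_out() are
hypothetical names made up for this illustration, and the only assumption is a
POSIX system where times() counts clock ticks from an arbitrary fixed point
and is therefore not affected by setting the wall clock.

```c
#include <assert.h>
#include <sys/times.h>
#include <unistd.h>

/* Guard with the macro name spelled correctly (CORRECT_USING_TIMES). */
#if defined(CORRECT_USING_TIMES) || defined(GETHRTIME_WITH_CLOCK_GETTIME)
#  define HEART_CORRECT_USING_TIMES 1
#endif

/*
 * Hypothetical helper, not taken from heart.c: whole seconds since an
 * arbitrary fixed point, derived from times() ticks rather than from
 * the wall clock.
 */
long tick_seconds(void)
{
    struct tms unused;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    clock_t now = times(&unused);

    assert(ticks_per_sec > 0);
    return (long)now / ticks_per_sec;
}

/*
 * Hypothetical helper: returns 1 if more than timeout_sec seconds have
 * passed between last_beat and now, otherwise 0.
 */
int beat_timed_out(long last_beat, long now, long timeout_sec)
{
    return (now - last_beat) > timeout_sec;
}
```

A watchdog loop would record tick_seconds() at each heartbeat and compare
against a later tick_seconds() value with beat_timed_out(); moving the system
clock forward by a minute does not change either value.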

regards,


On Wed, Aug 26, 2009 at 1:29 AM, Ulf Wiger
<ulf.wiger@REDACTED> wrote:

>
> The documentation for heart does say:
>
> "It should be noted that if the system clock is adjusted with more than
> HEART_BEAT_TIMEOUT seconds, heart will timeout and try to reboot the system.
> This can happen, for example, if the system clock is adjusted automatically
> by use of NTP (Network Time Protocol)."
>
> (...even in R10B).
>
> However, the reason why you're not seeing this in R10B is, I think,
> that heart.c has been re-written to use the system timestamp by
> default, whereas it derived timestamps from system ticks in R10.
>
> One relevant difference in the code seems to be:
>
> /*
>  * Implement time correction using times() call even on Linuxes
>  * that can simulate gethrtime with clock_gettime, no use implementing
>  * a phony gethrtime in this file as the time questions are so infrequent.
>  */
> #if defined(CORRET_USING_TIMES) || defined(GETHRTIME_WITH_CLOCK_GETTIME)
> #  define HEART_CORRECT_USING_TIMES 1
> #endif
>
>
> Timestamps are still simulated on WIN32 or if HAVE_GETHRTIME is not
> defined, but HEART_CORRECT_USING_TIMES is.
>
> (Please verify for yourself by reading erts/etc/common/heart.c,
> as this is not documented, from what I can tell, and you should
> never draw conclusions based solely on my sloppy reading of C code).
>
> Perhaps using ticks whenever possible would be the best strategy
> for heart.c, as it is hardly a feature that it goes berserk if
> someone dabbles with the system clock. It doesn't need hi-res
> timestamps to begin with, as no one in their right mind would
> set HEART_BEAT_TIMEOUT to something in the millisecond range
> (I don't really recommend anything less than a minute, actually,
> as heart is just a last resort, and /will/ interfere with
> crash dump generation too, if given a chance).
>
> BR,
> Ulf W
>
>
> Stephen Han wrote:
>
>> Hi
>>
>> I am facing an issue where the Erlang node is restarted by "heart" whenever
>> I move the system clock forward. It seems beam gets a KILL signal and
>> "heart" restarts the node. The node is restarted even if I move the system
>> clock forward by only 1 minute.
>>
>> FYI, I am using OTP R12B-3.
>>
>> The problem is that I am not even sure whether the node is restarted by our
>> application or by Erlang/OTP.
>> However, this is not reproducible in our old software, which used R10B-8.
>>
>> Have any changes been made after R10B that would make an Erlang node
>> restart when the system clock moves forward?
>>
>> Can you suggest a good method for debugging this kind of problem?
>>
>> regards,
>>
>>
>
> --
> Ulf Wiger
> CTO, Erlang Training & Consulting Ltd
> http://www.erlang-consulting.com
>

