[erlang-questions] R12B erlang node restart after system clock change

Ulf Wiger ulf.wiger@REDACTED
Wed Aug 26 10:29:14 CEST 2009


The documentation for heart does say:

"It should be noted that if the system clock is adjusted with more than 
HEART_BEAT_TIMEOUT seconds, heart will timeout and try to reboot the 
system. This can happen, for example, if the system clock is adjusted 
automatically by use of NTP (Network Time Protocol)."

(...even in R10B).

However, the reason why you're not seeing this in R10B is, I think,
that heart.c has been re-written to use the system timestamp by
default, whereas it derived timestamps from system ticks in R10.
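
To illustrate why the clock source matters, here is a minimal sketch. It is
not taken from heart.c; the function names and the numbers are made up for
illustration. It assumes a watchdog that compares "now" against the time of
the last heartbeat, and simulates a node that last reported 5 seconds ago
when someone steps the wall clock forward.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from heart.c): true if more than timeout_s
 * seconds have passed since the last heartbeat was received. */
bool beat_expired(long now_s, long last_beat_s, long timeout_s)
{
    assert(timeout_s > 0);
    return (now_s - last_beat_s) > timeout_s;
}

/* Watchdog that reads the wall clock: a forward step of step_s seconds
 * is indistinguishable from step_s seconds of silence from the node. */
bool expired_by_wall_clock(long step_s, long timeout_s)
{
    long last_beat = 1000;              /* arbitrary starting timestamp */
    long now = last_beat + 5 + step_s;  /* real 5 s plus the clock step */
    return beat_expired(now, last_beat, timeout_s);
}

/* Watchdog that counts ticks: the tick counter keeps running at its
 * own pace, so the step never shows up in the elapsed time. */
bool expired_by_ticks(long step_s, long timeout_s)
{
    long last_beat = 1000;
    long now = last_beat + 5;           /* only the real 5 s elapsed */
    (void)step_s;                       /* the step is invisible here */
    return beat_expired(now, last_beat, timeout_s);
}
```

With a 60 second timeout and a one minute step, expired_by_wall_clock(60, 60)
reports a timeout while expired_by_ticks(60, 60) does not, which would match
the behaviour described in the original question.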

One relevant difference in the code seems to be:

/*
  * Implement time correction using times() call even on Linuxes
  * that can simulate gethrtime with clock_gettime, no use implementing
  * a phony gethrtime in this file as the time questions are so infrequent.
  */
#if defined(CORRET_USING_TIMES) || defined(GETHRTIME_WITH_CLOCK_GETTIME)
#  define HEART_CORRECT_USING_TIMES 1
#endif


Timestamps are still simulated on WIN32, or when HAVE_GETHRTIME is not
defined but HEART_CORRECT_USING_TIMES is.

(Please verify for yourself by reading erts/etc/common/heart.c,
as this is not documented, from what I can tell, and you should
never draw conclusions based solely on my sloppy reading of C code).

Perhaps using ticks whenever possible would be the best strategy
for heart.c, as it is hardly a feature that it goes berserk if
someone dabbles with the system clock. It doesn't need hi-res
timestamps to begin with, as no one in their right mind would
set HEART_BEAT_TIMEOUT to something in the millisecond range
(I don't really recommend anything less than a minute, actually,
as heart is just a last resort, and /will/ interfere with
crash dump generation too, if given a chance).
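
As a rough sketch of what "using ticks" could look like on a POSIX system
(my own illustration, not the code in heart.c; the function names are
hypothetical), the return value of times() counts clock ticks from an
arbitrary fixed point in the past, so it does not move when the system
clock is set. Second resolution is plenty for a timeout of a minute or more.

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: current tick count as returned by times().
 * The struct tms contents (CPU times) are not needed here.
 * Note: times() returns (clock_t)-1 on error, and the counter may
 * wrap; a real implementation would have to handle both cases. */
clock_t tick_now(void)
{
    struct tms unused;
    return times(&unused);
}

/* Hypothetical helper: whole seconds between two tick readings.
 * ticks_per_s would normally come from sysconf(_SC_CLK_TCK). */
long elapsed_seconds(clock_t start, clock_t now, long ticks_per_s)
{
    return (long)(now - start) / ticks_per_s;
}
```

A watchdog loop could then record tick_now() at each heartbeat and compare
elapsed_seconds(last_beat, tick_now(), sysconf(_SC_CLK_TCK)) against the
timeout, without ever consulting the wall clock.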

BR,
Ulf W

Stephen Han wrote:
> Hi
> 
> I am facing an issue where the Erlang node is restarted by "heart" whenever I
> move the system clock forward. It seems beam gets a KILL signal and "heart"
> restarts the node. The node is restarted even if I move the system clock
> forward by only 1 minute.
> 
> FYI, I am using OTP R12B-3.
> 
> The problem is I am not even sure whether the node was restarted by our
> application or by Erlang/OTP.
> However, this is not reproducible in our old software, which used
> R10B-8.
> 
> Have any changes been made after R10B that would cause an Erlang node to
> restart when the system clock moves forward?
> 
> Can you suggest a good method for debugging this kind of problem?
> 
> regards,
> 


-- 
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com

