heart anyone?
Martin Bjorklund
mbj@REDACTED
Thu Oct 23 12:14:13 CEST 2003
Francesco Cesarini <francesco@REDACTED> wrote:
>
>
> Dear Erlangers,
>
> before I head off to reinvent the wheel, I was wondering if any one has
> implemented their own version of heart. What we are looking for is a
> behaviour similar to the supervisor one, where you can allow a maximum
> of X local restarts of the beam emulator in Y seconds. Possibly after X
> restarts, the OS is rebooted, erlang started up again, and if it
> crashes, the whole system just stops. Needless to say cyclic restarts
> have been a problem..
>
> Any other thoughts on heart are welcome. Past problems, praises, horror
> stories, et al.
Yes, we do this, but there's no need to hack heart. Instead we do
what we have to do in the HEART_COMMAND, i.e. the command that heart
calls when the node goes down. This command is a shell script which
first checks if we've rebooted too many times, and if so gives up.
Otherwise Erlang is started.
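A minimal sketch of that decision step, assuming the check_reboot function from the script further down in this post is defined in the same file. Because check_reboot finishes with exit rather than return, it is run in a subshell here so it does not terminate the calling script. Its exit status 0 means "give up" and 1 means "go ahead". The wrapper name should_start_erlang is hypothetical.

```shell
#!/bin/sh
# Hypothetical decision step for a HEART_COMMAND script.
# check_reboot (defined elsewhere in the script) ends with `exit`, so it
# runs in a subshell; otherwise it would end this script as well.
# Exit status 0 from check_reboot = give up, 1 = start Erlang again.
should_start_erlang() {
    if ( check_reboot ); then
        return 1    # check_reboot gave up: leave the node down
    fi
    return 0        # restart count within limits: start Erlang
}
```

The rest of the script would then start the release only when should_start_erlang succeeds.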
It shouldn't be difficult to also do an OS reboot here.
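One way to add the OS reboot is an escalation step along the lines Francesco describes: local restarts first, then one OS reboot, then give up. This is a sketch, not part of the script below. It assumes both the restart counter and an "OS already rebooted" flag are kept somewhere that survives the reboot (for example the reboot file, on persistent storage). The function next_action and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical escalation policy for a HEART_COMMAND script.
#   $1  restarts counted inside the current reboot window
#   $2  local restarts allowed before escalating (hypothetical limit)
#   $3  "yes" if the OS was already rebooted in this window, else "no"
# Prints one of: restart, reboot, giveup
next_action() {
    restarts=$1
    max_local=$2
    os_rebooted=$3
    if [ "$restarts" -lt "$max_local" ]; then
        echo restart    # start Erlang again on the running OS
    elif [ "$os_rebooted" = no ]; then
        echo reboot     # escalate: reboot the machine once
    else
        echo giveup     # already rebooted: let the system stop
    fi
}
```

The caller would act on the printed word, for example running reboot or shutdown -r now (depending on the OS) for "reboot". Since the counter persists across the reboot, a crash soon after the OS comes back still finds the count at or above the limit with the flag set, and ends in "giveup".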
Here's the interesting part of that script:
#
# Execute this function to make
# sure that we don't get into a reboot cycle. If we
# reboot more than 6 times, each time less than 20
# minutes since the last reboot, we give up, and
# don't try to reboot again.
# Use this function only if heart is used and we're
# rebooting on the permanent release.
# In case of a safe restart (i.e. not in the reboot
# interval) remove the reboot file used by isdstart.
#
check_reboot() {
    # dir, RebootFile, Timespan (in minutes), MaxRestarts and LogFile
    # are set elsewhere in the full script.
    Restarts=0;
    Timestamp=`timestamp`;
    Month=`month`;
    ExternalRebootFile=$dir/reboot.isd
    if [ -w "$RebootFile" ]; then
        LastTimestamp=`awk '{print $1}' "$RebootFile"`;
        LastMonth=`awk '{print $2}' "$RebootFile"`;
        Restarts=`awk '{print $3}' "$RebootFile"`;
        Diff=`expr $Timestamp - $LastTimestamp`;
        if [ "$Month" = "$LastMonth" ]; then
            if [ "$Diff" -lt "$Timespan" ]; then
                # We rebooted too early
                if [ "$Restarts" -ge "$MaxRestarts" ]; then
                    # We rebooted too early too many times - give up
                    echo "`date`: Too many reboots - giving up" >> "$LogFile";
                    # Keep RebootFile, as we would otherwise remove
                    # ExternalRebootFile the next time we start.
                    echo "$Timestamp $Month 0" > "$RebootFile";
                    exit 0;
                fi
                Restarts=`expr $Restarts + 1`;
            else
                Restarts=0;
            fi
        else
            Restarts=0;
        fi
    fi
    if [ "$Restarts" = 0 ] && [ -w "$ExternalRebootFile" ]; then
        # We rebooted outside the reboot interval
        rm "$ExternalRebootFile";
    fi
    echo "$Timestamp $Month $Restarts" > "$RebootFile";
    exit 1;
}
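timestamp() {
    # Minute count within the current month (day * 1440 + hour * 60 + minute),
    # taken from a single date call so all fields come from the same instant.
    date '+%d %H %M' | awk '{ print $1 * 1440 + $2 * 60 + $3 }'
}
month() {
    date '+%y%m'
}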
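The window test inside check_reboot can also be pulled out into a function that takes the stored values as arguments instead of reading files, which makes the rule easy to exercise on its own. This is a sketch that follows the same logic (same month, fewer than Timespan minutes apart, at most MaxRestarts quick restarts). The name reboot_decision is hypothetical, and it uses POSIX $(( )) arithmetic instead of expr.

```shell
#!/bin/sh
# Hypothetical file-free version of the check_reboot decision.
# Args: now_ts now_month last_ts last_month restarts timespan max_restarts
# Prints the new restart count, or "giveup" when the limit is reached.
reboot_decision() {
    now_ts=$1; now_month=$2; last_ts=$3; last_month=$4
    restarts=$5; timespan=$6; max_restarts=$7
    if [ "$now_month" != "$last_month" ]; then
        echo 0                  # new month: timestamps not comparable, reset
        return
    fi
    diff=$((now_ts - last_ts))
    if [ "$diff" -ge "$timespan" ]; then
        echo 0                  # last reboot long enough ago: reset
    elif [ "$restarts" -ge "$max_restarts" ]; then
        echo giveup             # too many quick reboots in a row
    else
        echo $((restarts + 1))  # another quick reboot: count it
    fi
}
```

For example, reboot_decision 100 0310 95 0310 2 20 6 prints 3: the two timestamps are 5 minutes apart in the same month, and 2 earlier quick restarts is below the limit of 6.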
/martin