[erlang-questions] Heart behavior

Bogdan Andu bog495@REDACTED
Mon Jul 6 12:15:22 CEST 2015


Hi,

It is HEART_BEAT_TIMEOUT I was referring to.

The doc says:
"This modules contains the interface to the heart process. heart sends
periodic heartbeats to an external port program, which is also named heart."

So the external process, heart, runs with the same credentials as the vm
user and is also started by the vm, right? When the vm crashes,
this external heart process kills itself after re-spawning another vm,
right?

And so on: a new vm with a new external heart process.

From the docs I understood that the external heart process never dies
and that, after it restarts a crashed vm, it monitors the new vm with its
new pid.

What is the rate at which heart, within the vm, sends periodic heartbeats
to the external heart process?
It seems very high. Does this add some load on the monitored vm?

Another issue I observe is that heart never logs the crash/restart events
in the application's logs, which are configured like this (in app.config,
and the vm is started with -boot start_sasl -config /var/app/app ):

[{sasl, [
          {sasl_error_logger, false},
          %% define the parameters of the rotating log
          %% the log file directory
          {error_logger_mf_dir,"/var/app/logs"},
          %% # bytes per logfile
          {error_logger_mf_maxbytes,10485760}, % 10 MB
          %% maximum number of logfiles
          {error_logger_mf_maxfiles, 10}
        ]}]

I was expecting to see some heart activity logged, but
there is nothing.

What must be done to log heart events in the application's log
or anywhere else? I want to monitor that heart log file and
be notified by e-mail when such events occur.
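
One possible approach, sketched under assumptions rather than taken from the
heart documentation: since heart runs whatever HEART_COMMAND names, that
command could be a small wrapper that appends a timestamped line to a log
file before running the real restart command. The names HEART_LOG and
RESTART_CMD below are hypothetical, not heart settings; in the setup
described here they might be /var/app/logs/heart.log and
"/bin/sh /var/app/appd restart".

```shell
#!/bin/sh
# Hypothetical HEART_COMMAND wrapper: record the restart, then perform it.
# HEART_LOG and RESTART_CMD are assumed names, not variables heart knows about.
HEART_LOG="${HEART_LOG:-${TMPDIR:-/tmp}/heart_restart.log}"
RESTART_CMD="${RESTART_CMD:-true}"   # placeholder default that does nothing

# Append one timestamped line per restart so a log watcher can react to it.
log_restart() {
    printf '%s heart restart triggered\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$HEART_LOG"
}

log_restart
# Run the real restart command (e.g. the appd restart script).
sh -c "$RESTART_CMD"
```

A separate log watcher could then follow HEART_LOG and send the e-mail; that
part is left out because it depends on the mail setup of the host.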

Thanks,
Bogdan



On Mon, Jul 6, 2015 at 12:36 PM, Lukas Larsson <lukas@REDACTED>
wrote:

> Hello Bogdan,
>
> See some answers inline:
>
> On Mon, Jul 6, 2015 at 10:33 AM, Bogdan Andu <bog495@REDACTED> wrote:
>
>> Hi,
>>
>> I made some experiments with heart
>> and I found something surprising, although it
>> does the job.
>>
>> I start an erlang vm in daemon mode under a user called
>> _user0, with home in /var/app, like this
>> (from a shell script, /var/app/appd, run with sudo as a priv user):
>>
>> case $1 in
>>   start)
>>
>> su - _user0 -c "$ERL -boot start_sasl -config $LOG +K true +A 4 \
>>     -sname $NODE -heart -detached -s app_ctl start $NODE"
>>
>> ;;
>>
>>   restart)
>>      /usr/local/lib/erlang/lib/erl_interface-3.7.20/bin/erl_call -q \
>>                                  -sname $NODE
>>      sleep 2
>>      $ERL -boot start_sasl -config $LOG +K true +A 4 -sname $NODE\
>>                                  -heart -detached -s app_ctl start $NODE
>>     ;;
>>
>> ....
>>
>> exit 0
>>
>> The environment vars are (under user _user0):
>>
>> HEART_COMMAND=/bin/sh /var/app/appd restart
>> ERL_CRASH_DUMP_SECONDS=10
>>
>> I have noticed 3 problems:
>> 1) Starting the daemon (as a priv user) with sudo sh /var/app/appd start
>>    starts the heart subsystem, but when I issue
>>    sudo kill -9 <pid-of-erlang-vm-monitored-by-heart>, the erlang vm is
>>    killed but heart never restarts it.
>>    When I run the command /bin/sh /var/app/appd restart manually as
>>    _user0, heart does restart the monitored system after it is killed.
>>
>
> It sounds as if heart for some reason cannot execute the HEART_COMMAND.
> Why that might be I don't know, maybe you could try to run it as a
> non-daemon, or at least redirect the stderr printouts to some file. heart
> might print things to stderr if it cannot execute HEART_COMMAND.
>
>
>>
>> 2) Every time I kill a vm monitored by heart with kill -9 <pid-of-vm>, the
>> heart process restarts it immediately, and after that the heart process
>> itself dies. If the -heart option is not given in restart, no heart
>> process is started for the newly restarted erlang vm.
>>
>
> When you supply -heart to the erlang vm command line you tell that VM to
> monitor itself using the heart mechanism. So if, as you say, you don't pass
> -heart on the HEART_COMMAND command, the new vm will not be restarted. This
> is by design.
>
>
>>
>> 3) It seems the default timeout of 60 seconds is not respected, because
>> the vm is restarted immediately when the -heart option is specified in
>> the restart script.
>>
>
> Which timeout is it that you are referring to here? HEART_BEAT_TIMEOUT or
> HEART_BEAT_BOOT_DELAY? HEART_BEAT_TIMEOUT is the maximum time it will take
> for heart to detect that something is wrong with the VM. If it can detect
> that something is wrong earlier, then it will.
>
>
>>
>> So a heart process is tied to the erlang vm that it monitors, and it
>> dies after it spawns another erlang vm?
>>
>
> yes
>
>
>> The docs are not clear about this.
>>
>> Having said that, what are the best practices for using heart, and why
>> does heart behave as described above?
>>
>> It seems heart works with kill -KILL|SIGV <pid-of-vm>, but I am not sure
>> what happens if the erlang vm crashes when it runs out of memory or file
>> descriptors.
>> Is the vm restarted by heart in these conditions?
>>
>
> It should be. The only reason for heart not to restart the VM (that I can
> think of right now) is if you call init:stop(), or if the command line that
> you gave to HEART_COMMAND does not work.
>
>
>>
>> System: OTP 17.5 64 bits
>>
>> Thanks,
>> Bogdan