[erlang-questions] Heart behavior

Mon Jul 6 13:38:33 CEST 2015

Hello,

On Mon, Jul 6, 2015 at 12:15 PM, Bogdan Andu <bog495@REDACTED> wrote:

> Hi,
>
> It is HEART_BEAT_TIMEOUT that I was referring to.
>
> The doc says:
> "This module contains the interface to the heart process. heart sends
> periodic heartbeats to an external port program, which is also named heart."
>
> So the external process, heart, is run with the same credentials as the
> vm user and is also started by the vm, right?
>

yes

> When the vm crashes,
> this external heart process kills itself after re-spawning another vm,
> right?
>

yes

>
> and so on.. new vm with new heart external process
>
> From the docs I understand that the external heart process never dies
> and that, after it restarts a crashed vm, it monitors the new vm with its
> new pid.
>

It would be great if you could help us make the documentation clearer. I've
been working with this for many years now and I've lost all sense of what
is obvious and what is not. So getting help in clarifying the documentation
from people who are just getting familiar with the subject is great. If you
feel like contributing, the relevant documentation file can be found here:
https://github.com/erlang/otp/blob/master/lib/kernel/doc/src/heart.xml

>
> What is the rate at which heart, within the vm, sends periodic heartbeats
> to the external heart process?
>

5 seconds,
https://github.com/erlang/otp/blob/master/lib/kernel/src/heart.erl#L48

> It seems very high. Does this add some load on the monitored vm?
>

Some, but very little in the grand scheme of things.

> Another issue I observe is that heart never logs the crash/restart events
> in the
> application's logs, configured like this (in app.config and started with
> -boot start_sasl -config /var/app/app ):
>
> [{sasl, [
>           {sasl_error_logger, false},
>           %% define the parameters of the rotating log
>           %% the log file directory
>           {error_logger_mf_dir,"/var/app/logs"},
>           %% # bytes per logfile
>           {error_logger_mf_maxbytes,10485760}, % 10 MB
>           %% maximum number of logfiles
>           {error_logger_mf_maxfiles, 10}
>         ]}]
>
> I was expecting to see some heart activity logged, but
> there is nothing.
>

I would add logging in the HEART_COMMAND script. The HEART_COMMAND can be
as complex a shell script as you want, so just prefixing the erlang start
with a call to logger would add a message in the syslog saying that erlang
has been restarted. You can even call sendmail here if you want to :).
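
A minimal sketch of such a wrapper, just to illustrate the idea. The script
path, the syslog tag app-heart and the message wording are assumptions for
the example, not something heart prescribes. It logs through logger(1) and
then hands over to whatever command it was given:

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /var/app/heart_restart.sh (assumed path)
# and used as:
#   HEART_COMMAND="/bin/sh /var/app/heart_restart.sh /bin/sh /var/app/appd restart"

restart_message() {
    # $1 = node name; the wording of the message is arbitrary
    printf 'heart restarted erlang node %s' "$1"
}

log_restart() {
    # logger(1) writes to syslog; -t sets the tag (app-heart is an assumed tag)
    logger -t app-heart "$(restart_message "$1")"
}

# When called with a command, log first and then hand over to that command.
# A failing logger call does not prevent the restart.
if [ "$#" -gt 0 ]; then
    log_restart "${NODE:-unknown}"
    exec "$@"
fi
```

Whatever watches syslog can then match on the tag and send the e-mail.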

When you crash you will get an erl_crash.dump, which, together with the
application-specific logs, is what you need. So you may even want to write
a small script that bundles all this together in a .tgz and sends it to you.
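
Something along these lines could do the bundling. This is only a sketch:
the function name and the archive naming are made up for the example, and
the sending part is left out. By default the dump is written as
erl_crash.dump in the vm's working directory, unless ERL_CRASH_DUMP points
elsewhere.

```shell
#!/bin/sh
# bundle_crash_artifacts DUMP_FILE LOG_DIR OUT_DIR
# Packs whichever of the dump file and log directory exist into a .tgz in
# OUT_DIR and prints the archive path. Hypothetical helper for illustration.
bundle_crash_artifacts() {
    dump=$1
    log_dir=$2
    out_dir=$3
    archive="$out_dir/crash-$(date '+%Y%m%d-%H%M%S').tgz"

    # Collect only the inputs that are actually present.
    set --
    [ -f "$dump" ] && set -- "$@" "$dump"
    [ -d "$log_dir" ] && set -- "$@" "$log_dir"
    [ "$#" -gt 0 ] || return 1

    tar -czf "$archive" "$@" || return 1
    echo "$archive"
}
```

For the setup in this thread it might be called as
bundle_crash_artifacts /var/app/erl_crash.dump /var/app/logs /tmp, where the
dump location and /tmp are assumptions.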

When things go really bad, like if you segfault for some reason, then
you'll probably want to set up your /proc/sys/kernel/core_pattern to do
something for you.
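
As a rough illustration (Linux only), the pattern could be set like this.
The target directory and the function names are assumptions; %e, %p and %t
are expanded by the kernel to the executable name, the pid and the time of
the dump.

```shell
#!/bin/sh
# Build a core_pattern value that places core files in the given directory.
core_pattern_for() {
    # %% in the printf format produces a literal % in the output
    printf '%s/core.%%e.%%p.%%t' "$1"
}

install_core_pattern() {
    # Needs root. Core dumps must also be allowed, e.g. "ulimit -c unlimited"
    # in the shell that starts the vm.
    core_pattern_for "$1" > /proc/sys/kernel/core_pattern
}
```

Running install_core_pattern /var/crash as root would then make the kernel
write core files such as /var/crash/core.beam.smp.<pid>.<time>, assuming
that directory exists and is writable.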

When things go really really bad, like if the oom killer comes and kills
you, then I don't think there is much that you can do. I've seen the oom
killer come and kill both the erlang vm and heart at the same time, which
is really nasty. Hopefully you will get something in your syslog that tells
you what might be wrong.

>
> What must be done to log heart events in the application's log
> or anywhere else? Because I want to monitor that heart log file and
> be notified by e-mail when such events occur.
>

The heart program prints its logs to stderr. Since it is not started with
stderr_to_stdout, everything it writes to stderr will end up in the erlang
vm's stderr. So you want to redirect that to the file of your choice. This
of course means that you cannot run with -detached. Maybe you want to use
run_erl/to_erl to do the daemonization. run_erl will log stdout and stderr
to a log file that you can use to debug stuff.
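
A sketch of what starting under run_erl could look like. The pipe and log
directories, the function name and the DRY_RUN switch are assumptions for
the example; the run_erl argument order (pipe dir, log dir, quoted "exec
..." command) is the documented one. Note that -detached is dropped from
the erl command line, since run_erl -daemon does the daemonization.

```shell
#!/bin/sh
# Assumed locations; adjust to your installation.
PIPE_DIR=${PIPE_DIR:-/tmp/app_pipe/}
LOG_DIR=${LOG_DIR:-/var/app/logs}

# start_under_run_erl ERL_COMMAND_LINE
# With DRY_RUN set, only prints the command that would be run.
start_under_run_erl() {
    erl_cmd=$1
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'run_erl -daemon %s %s "exec %s"\n' "$PIPE_DIR" "$LOG_DIR" "$erl_cmd"
        return 0
    fi
    # Both directories must exist before run_erl starts.
    mkdir -p "$PIPE_DIR" "$LOG_DIR" &&
        run_erl -daemon "$PIPE_DIR" "$LOG_DIR" "exec $erl_cmd"
}
```

For example: start_under_run_erl "erl -boot start_sasl -sname mynode -heart".
The output of the vm, including what heart writes to stderr, then ends up in
the erlang.log.N files in the log directory, and to_erl with the same pipe
directory attaches to the running node.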

>
> Thanks,
> Bogdan
>
>
>
> On Mon, Jul 6, 2015 at 12:36 PM, Lukas Larsson <lukas@REDACTED> wrote:
>
>> Hello Bogdan,
>>
>> See some answers inline:
>>
>> On Mon, Jul 6, 2015 at 10:33 AM, Bogdan Andu <bog495@REDACTED> wrote:
>>
>>> Hi,
>>>
>>> I made some experiments with heart
>>> and I found something surprising, although it
>>> does the job.
>>>
>>> I start an erlang vm in daemon mode under user called
>>> _user0 with home in /var/app like this
>>> (from a shell script /var/app/appd run with sudo as a priv user):
>>>
>>> case $1 in
>>>   start)
>>>
>>> su - _user0 -c "$ERL -boot start_sasl -config $LOG +K true +A 4 -sname
>>> $NODE  -heart -detached -s app_ctl start $NODE"
>>>
>>> ;;
>>>
>>>   restart)
>>>      /usr/local/lib/erlang/lib/erl_interface-3.7.20/bin/erl_call -q
>>> -sname $NODE
>>>      sleep 2
>>>      $ERL -boot start_sasl -config $LOG +K true +A 4 -sname $NODE\
>>>                                  -heart -detached -s app_ctl start $NODE
>>>     ;;
>>>
>>> ....
>>>
>>> exit 0
>>>
>>> environment vars are(under user _user0):
>>>
>>> HEART_COMMAND=/bin/sh /var/app/appd restart
>>> ERL_CRASH_DUMP_SECONDS=10
>>>
>>> I have noticed 3 problems:
>>> 1)  Starting the daemon (as a priv user) with sudo sh /var/app/appd start
>>>     starts the heart subsystem, but when I issue
>>>     sudo kill -9 <pid-of-erlang-vm-monitored-by-heart>, the erlang vm is
>>>     killed but heart never restarts it;
>>>     Running the command /bin/sh /var/app/appd restart manually as _user0,
>>>     heart restarts the monitored system after it was killed;
>>>
>>
>> It sounds as if heart for some reason cannot execute the HEART_COMMAND.
>> Why that might be I don't know, maybe you could try to run it as a
>> non-daemon, or at least redirect the stderr printouts to some file. heart
>> might print things to stderr if it cannot execute HEART_COMMAND.
>>
>>
>>>
>>> 2) Every time I kill a vm monitored by heart with kill -9 <pid-of-vm> the
>>> heart process restarts it immediately, and after that the heart process
>>> itself dies, and if the -heart option is not given in restart, no heart
>>> process is started for the newly restarted erlang vm.
>>>
>>
>> When you supply -heart to the erlang vm command line you tell that VM to
>> monitor itself using the heart mechanism. So if, as you say, you don't pass
>> -heart on the HEART_COMMAND command, the new vm will not be restarted. This
>> is by design.
>>
>>
>>>
>>> 3) It seems the default timeout of 60 seconds is not respected because
>>> the vm is restarted immediately when the -heart option is specified in
>>> the restart script;
>>>
>>
>> Which timeout is it that you are referring to here? HEART_BEAT_TIMEOUT or
>> HEART_BEAT_BOOT_DELAY? HEART_BEAT_TIMEOUT is the maximum time it will take
>> for heart to detect that something is wrong with the VM. If it can detect
>> that something is wrong earlier, then it will.
>>
>>
>>>
>>> So a heart process is tied to the erlang vm that it monitors and it
>>> dies after it spawns another erlang vm?
>>>
>>
>> yes
>>
>>
>>> The docs are not clear about these.
>>>
>>> Having said this, what are the best practices for using heart, and why
>>> does heart behave as described above?
>>>
>>> It seems heart works with kill -KILL|SEGV <pid-of-vm>, but I am not sure
>>> what happens if the erlang vm crashes when it runs out of memory or file
>>> descriptors.
>>> Is the vm restarted by heart in these conditions?
>>>
>>
>> It should be. The only reason for heart not to restart the VM (that I can
>> think of right now) is if you call init:stop(), or if the command line that
>> you gave to HEART_COMMAND does not work.
>>
>>
>>>
>>> System: OTP 17.5 64 bits
>>>
>>> Thanks,
>>> Bogdan
>>>
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>