<div dir="ltr"><div><div><div><div><div><div>Hi,<br><br>It is HEART_BEAT_TIMEOUT I was referring to.<br><br></div>The doc says:<br>"This modules contains the interface to the heart process. heart sends periodic heartbeats to an external port program, which is also named heart."<br><br></div>So the external process, heart, runs with the same credentials as the vm user<br></div>and is also started by the vm, right? When the vm crashes, <br></div>this external heart process kills itself after re-spawning another vm, right?<br><br></div><div>And so on: a new vm with a new external heart process.<br></div><div><br></div>From the docs I understood that the external heart process never dies<br></div>and that, after it restarts a crashed vm, it monitors the new vm with its new pid.<br><div><div><div><br></div><div>What is the rate at which heart, inside the vm, sends periodic heartbeats to <br>the external heart process?<br></div><div>It seems very high. Does this add some load on the monitored vm?<br><br></div><div>Another issue I observe is that heart never logs the crash/restart events in the<br></div><div>application's logs, configured like this (in app.config, with the vm started with -boot start_sasl -config /var/app/app):<br><br>[{sasl, [<br> {sasl_error_logger, false},<br> %% define the parameters of the rotating log<br> %% the log file directory<br> {error_logger_mf_dir,"/var/app/logs"},<br> %% # bytes per logfile<br> {error_logger_mf_maxbytes,10485760}, % 10 MB<br> %% maximum number of logfiles<br> {error_logger_mf_maxfiles, 10}<br> ]}].<br><br></div><div>I was expecting to see some heart activity logged, but<br></div><div>there is nothing.<br><br></div><div>What must be done to log heart events in the application's log<br></div><div>or anywhere else? 
I want to monitor that heart log file and<br>be notified by e-mail when such events occur.<br></div><div><br></div><div>Thanks,<br></div><div>Bogdan<br></div><div><br></div><div><div><div><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 12:36 PM, Lukas Larsson <span dir="ltr">&lt;<a href="mailto:lukas@erlang-solutions.com" target="_blank">lukas@erlang-solutions.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello Bogdan,<div><br></div><div>See some answers inline: <br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Mon, Jul 6, 2015 at 10:33 AM, Bogdan Andu <span dir="ltr">&lt;<a href="mailto:bog495@gmail.com" target="_blank">bog495@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div>Hi,<br><br></div>I ran some experiments with heart <br></div>and I found something surprising, although it<br></div>does the job.<br><br></div>I start an Erlang vm in daemon mode under a user called<br></div>_user0 with home in /var/app like this <br>(from a shell script /var/app/appd run with sudo as a privileged user):<br><br>case $1 in<br> start)<br><br>su - _user0 -c "$ERL -boot start_sasl -config $LOG +K true +A 4 -sname $NODE -heart -detached -s app_ctl start $NODE"<br><div><div><div><div><div><div><br>;;<br> <br> restart)<br> /usr/local/lib/erlang/lib/erl_interface-3.7.20/bin/erl_call -q -sname $NODE<br> sleep 2<br> $ERL -boot start_sasl -config $LOG +K true +A 4 -sname $NODE\<br> -heart -detached -s app_ctl start $NODE<br> ;;<br><br>....<br><br></div><div>exit 0<br><br></div><div>The environment variables (under user _user0) are: <br><br>HEART_COMMAND=/bin/sh /var/app/appd restart<br>ERL_CRASH_DUMP_SECONDS=10<br><br></div><div>I have noticed 3 problems:<br></div><div>1) Starting the 
daemon (as a privileged user) with sudo sh /var/app/appd start<br></div><div> starts the heart subsystem, but when I issue sudo kill -9 &lt;pid-of_erlang-vm-monitored-byheart&gt;, the Erlang vm is killed but heart never restarts it;<br></div><div> When I run the command /bin/sh /var/app/appd restart manually as _user0,<br></div><div> heart does restart the monitored system after it is killed;<br></div></div></div></div></div></div></div></blockquote><div><br></div></div></div><div>It sounds as if heart for some reason cannot execute the HEART_COMMAND. Why that might be I don't know; maybe you could try to run it as a non-daemon, or at least redirect the stderr printouts to some file. heart might print things to stderr if it cannot execute HEART_COMMAND.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><br></div><div>2) Every time I kill a vm monitored by heart with kill -9 &lt;pid-of-vm&gt;, the heart process restarts it immediately and then dies itself; if the -heart option is not given in the restart command, no heart process is started for the newly restarted Erlang vm.<br></div></div></div></div></div></div></div></blockquote><div><br></div></span><div>When you supply -heart on the Erlang VM command line, you tell that VM to monitor itself using the heart mechanism. So if, as you say, you don't pass -heart in the HEART_COMMAND command, the new VM will not be restarted. This is by design. 
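<br><br>A minimal sketch of that point, assuming the script builds one command line and uses it for both start and restart. vm_command is a hypothetical helper name, and the fallback values for ERL, LOG and NODE are placeholders rather than anything taken from your setup:

```shell
#!/bin/sh
# ERL, LOG and NODE mirror the variables of the script quoted in this thread.
# The fallback values are placeholders for illustration only (assumptions).
ERL="${ERL:-erl}"
LOG="${LOG:-/var/app/app}"
NODE="${NODE:-app}"

# vm_command is a hypothetical helper: it prints the single command line
# shared by "start" and "restart". -heart is part of it, so a VM launched
# through HEART_COMMAND is itself started with heart monitoring.
vm_command() {
    printf '%s -boot start_sasl -config %s +K true +A 4 -sname %s -heart -detached -s app_ctl start %s\n' \
        "$ERL" "$LOG" "$NODE" "$NODE"
}

# Print the command instead of running it, so the sketch has no side effects.
vm_command
```

The restart branch could then run that same line, for example with sh -c "$(vm_command)", so that every VM started through HEART_COMMAND carries -heart.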
</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><br></div><div>3) It seems the default timeout of 60 seconds is not respected, because<br></div><div>the vm is restarted immediately when the -heart option is specified in the restart script;<br></div></div></div></div></div></div></div></blockquote><div><br></div></span><div>Which timeout is it that you are referring to here? HEART_BEAT_TIMEOUT or HEART_BEAT_BOOT_DELAY? HEART_BEAT_TIMEOUT is the maximum time it will take for heart to detect that something is wrong with the VM; if it can detect that something is wrong earlier, then it will.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div></div><div><br></div><div>So a heart process is tied to the Erlang vm that it monitors, and it dies after it spawns another Erlang vm?<br></div></div></div></div></div></div></div></blockquote><div> </div></span><div>Yes.</div><span class=""><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><br></div><div>The docs are not clear about this.<br><br></div><div>Having said that, what are the best practices for using heart, and why <br></div><div>does heart behave as described above?<br><br></div><div>It seems heart works with kill -KILL|SIGV &lt;pid-of-vm&gt;, but I am not sure<br></div><div>what happens if the Erlang vm crashes when it runs out of memory or file descriptors.<br></div><div>Is the vm restarted by heart in these conditions?<br></div></div></div></div></div></div></div></blockquote><div><br></div></span><div>It should be. 
The only reason for heart not to restart the VM (that I can think of right now) is if you call init:stop(), or if the command line that you gave to HEART_COMMAND does not work.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><div dir="ltr"><div><div><div><div><div><div></div><div><br></div><div>System: OTP 17.5 64 bits<br><br></div><div>Thanks,<br></div><div>Bogdan<br></div><div><br></div></div></div></div></div></div></div>
<br></span>
<br></blockquote></div><br></div></div></div>
</blockquote></div><br></div></div>