heart does not restart node launched with run_erl
Serge
serge@REDACTED
Sun Jan 29 17:03:08 CET 2006
We resolved this issue by handling SIGCHLD in run_erl. When
$HEART_COMMAND launches erl with the -heart option through run_erl
('run_erl ... "erl ... -heart"'), the following is observed:
1. run_erl starts erl
2. erl starts heart
3. heart monitors erl
If erl gets killed or exits, then
1. heart restarts HEART_COMMAND
2. new run_erl detects an active UDS (owned by old run_erl) and exits
3. heart gets terminated (since it restarted the HEART_COMMAND)
4. old run_erl gets terminated as well (I don't recall right now what
triggers its termination)
As a result, we end up with no Erlang running. Attached is a patch to
run_erl that addresses this issue by forcing run_erl to exit upon
detecting the death of the node started by HEART_COMMAND. Note that
this patch also includes the patch provided by Ernie Makris / Jouni Rynö
(news://news.gmane.org:119/025601c5cf6c$459cd1d0$4601a8c0@hercules) for
RedHat ES 4.0 and Fedora.
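For readers without the attachment, here is a minimal sketch of the idea
(exit as soon as SIGCHLD reports that the started node is dead). It is not
the patch itself: the helper name supervise_child is hypothetical, and the
real run_erl multiplexes a pty and FIFOs in a select() loop rather than
simply waiting. The sketch only assumes POSIX sigaction, sigsuspend and
waitpid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGCHLD handler, checked by the wait loop below. */
static volatile sig_atomic_t child_died = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_died = 1;
}

/*
 * Hypothetical helper (not a run_erl function): start argv[0] as a child,
 * sleep until SIGCHLD arrives, reap the child and return its exit status,
 * 128 + signal number if it was killed, or -1 on error. A wrapper in the
 * position of run_erl would call exit() with this value instead of
 * lingering after the node is gone.
 */
int supervise_child(char *const argv[])
{
    struct sigaction sa, old_sa;
    sigset_t block, old, wait_mask;
    pid_t pid;
    int status = 0;

    assert(argv != NULL && argv[0] != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;          /* ignore stop/continue events */
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD so it cannot arrive between the check and the sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    child_died = 0;
    pid = fork();
    if (pid == 0) {                      /* child: run the command */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    if (pid > 0) {
        while (!child_died)
            sigsuspend(&wait_mask);      /* atomically unblock and sleep */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    if (pid < 0)
        return -1;

    /* Reap the child so it does not stay around as <defunct>. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Blocking SIGCHLD before fork() and waiting with sigsuspend() avoids the
race where the child dies before the parent starts waiting. Once the
wrapper exits promptly in this way, the run_erl started by HEART_COMMAND
should no longer find the old instance still holding its endpoint.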
I hope it can be included in the next release.
Regards,
Serge
erlang-questions@REDACTED wrote:
> Hi all,
> Ran into a weird problem. I have an embedded application that is started with run_erl from a .sh script. I also use heart to restart the application. HEART_COMMAND is set to launch the same start.sh script that was used to start the application initially. At the start, the process tree looks as follows:
>
> 3196 ? S 0:00 /home/drpdev/erts-5.4.10/bin/run_erl -daemon /home/drpdev/var/tmp/drp /home/drpdev/var/log/drp -exec /home/drpdev/bin/start_erl
> 3202 pts/2 Ssl+ 0:02 _ /home/drpdev/erts-5.4.10/bin/beam -- -root /home/drpdev -progname drip -- -home /home/drpdev -boot /home/drpdev/releases/1.
> 3222 ? Ss 0:00 _ heart -pid 3202
> 3227 ? Ss 0:00 _ inet_gethost 4
> 3228 ? S 0:00 | _ inet_gethost 4
> 3229 ? Ss 0:00 _ sh -s disksup
>
> To test the restart, I kill pid 3202 and see the following:
>
> 3222 ? Ss 0:00 heart -pid 3202
> 3196 ? S 0:00 /home/drpdev/erts-5.4.10/bin/run_erl -daemon /home/drpdev/var/tmp/drp /home/drpdev/var/log/drp -exec /home/drpdev/bin/start_erl
> 3202 ? Zs 0:02 _ [beam] <defunct>
>
>
> Next, heart launches the script:
>
> 3253 ? S 0:00 /bin/bash /home/drpdev/bin/drip.sh start
> 3272 ? S 0:00 _ sleep 3
> 3196 ? S 0:00 /home/drpdev/erts-5.4.10/bin/run_erl -daemon /home/drpdev/var/tmp/drp /home/drpdev/var/log/drp -exec /home/drpdev/bin/start_erl
> 3202 ? Zs 0:02 _ [beam] <defunct>
>
> The sleep 3 is right before it calls the run_erl command to start the embedded application. Note that the old run_erl (pid 3196) is still hanging around although the node itself (pid 3202) is defunct.
>
> When drip.sh calls run_erl, the old run_erl (pid 3196) goes away, but no new run_erl process appears. The application is not started either. erlang.log.1 does not show [...]. I see the following in the run_erl.log:
>
> -------
> Pty master read; run_erl [3196] Wed Jan 4 15:59:37 2006
> Pty master read; run_erl [3196] Wed Jan 4 16:00:46 2006
> Pty master read; run_erl [3196] Wed Jan 4 16:00:51 2006
> Pty master read; run_erl [3279] Wed Jan 4 16:00:54 2006
> /home/drpdev/erts-5.4.10/bin/run_erl: pid is : 3279
> run_erl [3196] Wed Jan 4 16:00:54 2006
> FIFO read; run_erl [3196] Wed Jan 4 16:00:54 2006
> OK
> run_erl [3196] Wed Jan 4 16:00:54 2006
> Pty master read; run_erl [3196] Wed Jan 4 16:00:54 2006
> Pty master read; run_erl [3196] Wed Jan 4 16:00:54 2006
> Pty master read; run_erl [3196] Wed Jan 4 16:00:54 2006
> Erlang closed the connection.
> -------
>
> I am curious why the new run_erl process (pid 3279) did not start. Also, why did the old run_erl (pid 3196) not terminate until the new run_erl attempted to start? I verified that this is not a coincidence: the old run_erl remains in the process list until a new run_erl is started.
>
> Please let me know if anyone else has experienced a similar issue. If needed, I can provide additional info or config files, but I am not sure at this point which ones.
>
> Thank you.
> Dmitry Korsun
> IDT Corp.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: run_erl.patch
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20060129/28cbe8ce/attachment.ksh>