[erlang-questions] Starting Erlang as a Unix daemon

Sat Jan 10 04:23:41 CET 2009

Sam,

Actually there are already tools that can help you with this task.

Use run_erl (http://www.erlang.org/doc/apps/erts/index.html) to start 
your node (without the -daemon option).  In your shell script, run 
run_erl as a background task, then either use erl_call (from 
erl_interface) to try connecting to the node or simply grep 
erlang.log.1 (produced by run_erl) for the expected successful startup 
message.
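
A minimal sketch of the grep variant, in case it helps.  The function 
name, the paths, the node name and the "mynode started" message are all 
made up for illustration; your application would have to print whatever 
message you decide to look for.

```shell
#!/bin/sh
# wait_for_startup LOGFILE PATTERN TIMEOUT_SECS   (hypothetical helper)
# Polls LOGFILE once per second until PATTERN appears or the timeout
# expires.  Returns 0 if the pattern was found, 1 otherwise.
wait_for_startup() {
    logfile=$1
    pattern=$2
    timeout=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The log file may not exist yet right after run_erl is launched.
        if [ -f "$logfile" ] && grep -q -- "$pattern" "$logfile"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Hypothetical usage from an init script (paths and names are assumptions):
#
#   run_erl /var/run/mynode/ /var/log/mynode "exec erl -sname mynode" &
#   if wait_for_startup /var/log/mynode/erlang.log.1 "mynode started" 30
#   then echo "OK"
#   else echo "FAILED"; exit 1
#   fi
```

Note that an old erlang.log.1 left over from a previous run could 
already contain the message, so a real script would probably want to 
rotate or inspect only the new part of the log before waiting.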

Lately I've been using this approach quite successfully to install and 
run nodes with the help of chkconfig on Linux.  The outer shell script 
runs as root; all it does is start run_erl in the background as a 
non-privileged user, detect its death, and restart the process at a 
throttled rate to avoid overly frequent restarts in case of a bad 
configuration or a similar problem.
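
Roughly, the restart loop could look like the sketch below.  This is 
not our actual script: the function name and its parameters are 
invented for illustration, and giving up after too many quick exits is 
just one possible throttling policy (sleeping longer would be another).

```shell
#!/bin/sh
# supervise MAX_EXITS WINDOW_SECS DELAY_SECS CMD [ARGS...]   (hypothetical)
# Runs CMD in the foreground and restarts it whenever it exits, waiting
# DELAY_SECS between restarts.  If CMD exits MAX_EXITS times within
# WINDOW_SECS, the loop gives up and returns 1 (an assumed policy).
supervise() {
    max=$1
    window=$2
    delay=$3
    shift 3
    count=0
    window_start=$(date +%s)
    while :; do
        "$@"                    # returns only when the process dies
        status=$?
        now=$(date +%s)
        # Start a fresh counting window once the old one has elapsed.
        if [ $((now - window_start)) -ge "$window" ]; then
            count=0
            window_start=$now
        fi
        count=$((count + 1))
        if [ "$count" -ge "$max" ]; then
            echo "giving up: $count exits within ${window}s" \
                 "(last status $status)" >&2
            return 1
        fi
        sleep "$delay"
    done
}

# Hypothetical usage (user name, paths and node name are assumptions);
# run_erl is started without -daemon so that it stays in the foreground:
#
#   supervise 5 60 2 su erlang -c \
#       'run_erl /var/run/mynode/ /var/log/mynode "exec erl -sname mynode"' &
```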

We've had some bad experiences with the -heart option to erl in 
combination with run_erl.  As I recall, two issues came up:

1. The latter has some race conditions related to signal handling and 
calling select().  We are actually running a patched version of run_erl 
that fixes the issue.
http://www.erlang.org/pipermail/erlang-patches/2006-January/000147.html

2. On a few occasions, on a busy node, we've seen heart miss some 
heartbeat replies and kill the monitored node.  Some details on this 
can be found here:
http://erlang.org/pipermail/erlang-questions/2006-December/024365.html

By contrast, the shell-based startup approach described above has been 
quite stable.

Serge

Sam Bobroff wrote:
> I already use a system very like the one described on your web page but
> I'm having several problems with it. They are:
> 
> * The init script should return success (a zero exit code and an "OK"
>   message) if and only if the server has started successfully. Running
>   "erl -detached" (as your script does) unfortunately always returns
>   success. It's just not acceptable to have a service report that it's
>   started successfully when it hasn't.
> * Running the stop init script should only return success if it has
>   actually managed to stop the server. (*)
> 
> Without these properties a system can't be (safely) used from package
> managers like RPM and it also looks bad to system administrators (who
> have to use "ps" and look at some log file just to see if it's started
> up).
> 
> I think I'll have to write a C wrapper program to handle this. It could
> register itself as a C node and run Erlang as a (non-detached) child
> process. It could then display any output from Erlang during the start
> up phase on the terminal, and once it receives a specific message from
> the Erlang application (via a simple message to the C node) it could
> detach itself from the terminal and return a result code (leaving Erlang
> running in the background). If that sounds useful to anyone else, let
> me know and I'll try to open source it :-)
> 
> (Even better would be to include this functionality in erl itself, and
> provide a BIF that you could call from within your Erlang application to
> complete the detachment and return a result code from erl to the shell.)
> 
> Cheers,
> 
> Sam.
> 
> (*) I've got this part working fairly well by using a system similar to
> yours but by also passing the result of the stop function out to the
> init script using halt(1) or halt(0).