[erlang-questions] initializing process

Fri Sep 18 04:04:20 CEST 2009

2009/9/18 Jayson Vantuyl <kagato@REDACTED>:
> What exactly are you trying to do?  It's really easy to create complex race
> conditions and strange startup ordering issues doing something like this.
>
> If you can, have the node signal you (much less work).  Just start out in an
> "uninitialized" state, set a short timeout, and go idle.  When you
> receive the "other_node_started" event, go into "running" mode.  I'd
> recommend using sync_send_event from the server you're waiting on.  That
> guarantees to close the loop in case they start up in a weird order.
>
> Timeouts are handy and highly underutilized.  Other great uses include
> "forcibly garbage collecting during inactivity" and "delaying before
> hibernating".
>
> If you must ping the node from the waiting side, you can simulate a delaying
> loop with a gen_server (or a gen_fsm) by using the timeout feature.  Just
> send the ping and set a timeout.  Have the timeout function send the ping
> again and set a timeout again.  Eventually, you'll get the response and can
> go about your merry way.
>
> Both of these methods prevent you from sitting in init for an undefined
> amount of time.
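
A minimal sketch of the ping-and-timeout loop described above, as a
gen_server. The module name peer_waiter, the mode/0 helper and the 1000 ms
retry interval are made up for illustration. Note that a gen_server timeout
is cancelled by any incoming message, so every callback has to hand the
timeout back while still waiting.

```erlang
-module(peer_waiter).
-behaviour(gen_server).

-export([start_link/1, mode/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 1000).  % arbitrary retry interval (assumption)

start_link(PeerNode) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, PeerNode, []).

%% Returns waiting | running.
mode() ->
    gen_server:call(?MODULE, mode).

%% Return at once with a 0 ms timeout, so the first ping happens in
%% handle_info(timeout, ...) and not inside init/1.
init(PeerNode) ->
    {ok, {waiting, PeerNode}, 0}.

handle_call(mode, _From, {Mode, _Node} = S) ->
    {reply, Mode, S, next_timeout(S)}.

handle_cast(_Msg, S) ->
    {noreply, S, next_timeout(S)}.

%% Timeout while waiting: ping, then either go running or re-arm.
handle_info(timeout, {waiting, Node} = S) ->
    case net_adm:ping(Node) of
        pong -> {noreply, {running, Node}};
        pang -> {noreply, S, ?RETRY_MS}
    end;
handle_info(_Other, S) ->
    {noreply, S, next_timeout(S)}.

%% Keep the retry timeout armed only while still waiting.
next_timeout({waiting, _}) -> ?RETRY_MS;
next_timeout({running, _}) -> infinity.
```

One caveat: net_adm:ping/1 is itself a blocking call, so the server is
unresponsive for as long as a single connection attempt takes.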

Yep. I use a hello message for this kind of thing. Start in an
unconnected state and start sending hello to a known target. When
responses start getting returned (with a valid ref() of course) then
go into fully running mode. If the target process is the last thing to
start in the peer system then you know the whole peer is up and
running.
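
Roughly, and only as a sketch: the hello_client module, the {hello, Pid, Ref}
and {hello_ack, Ref} message shapes and the 500 ms retry interval are all
invented here, not taken from any real protocol. A fresh ref goes out with
every hello, and only an ack carrying the latest ref is accepted.

```erlang
-module(hello_client).
-behaviour(gen_server).

-export([start_link/1, mode/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 500).  % arbitrary retry interval (assumption)
-record(st, {target, ref, mode = unconnected}).

%% Target is a pid, a registered name, or {Name, Node}.
start_link(Target) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Target, []).

%% Returns unconnected | running.
mode() ->
    gen_server:call(?MODULE, mode).

%% Send the first hello and return immediately, still unconnected.
init(Target) ->
    {ok, say_hello(#st{target = Target})}.

handle_call(mode, _From, S) ->
    {reply, S#st.mode, S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

%% Ack carrying the ref of the latest hello: the peer is up.
handle_info({hello_ack, Ref}, #st{ref = Ref, mode = unconnected} = S) ->
    {noreply, S#st{mode = running}};
%% Retry timer for the latest hello fired with no ack yet: send again.
handle_info({retry, Ref}, #st{ref = Ref, mode = unconnected} = S) ->
    {noreply, say_hello(S)};
%% Stale acks, stale timers and anything else are ignored.
handle_info(_Other, S) ->
    {noreply, S}.

say_hello(#st{target = Target} = S) ->
    Ref = make_ref(),
    %% Sending to an unregistered local name raises badarg, hence the catch.
    catch Target ! {hello, self(), Ref},
    erlang:send_after(?RETRY_MS, self(), {retry, Ref}),
    S#st{ref = Ref}.
```

The peer side only has to echo the ref back, something like
receive {hello, From, Ref} -> From ! {hello_ack, Ref} end, from whichever
process starts last.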

> All of this gets complicated in the face of restarts.  It may not really be
> workable.  It would help to know more about what you're doing.
>
> I think the "correct" way (i.e. overkill in the style of Ericsson) is to do
> this at an application level.  You can set up a distributed Erlang
> application to start in "phases".  Depending on your deployment, that might
> be the ultimate answer, since it also provides hooks for takeover /
> failover, which may be necessary depending on your application.
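
For reference, a fragment showing what start phases look like. The
application name myapp, the phase names and the myapp_peer:await/0 helper
are hypothetical; the start_phases key and the start_phase/3 callback are
the standard OTP application mechanism. This is a two-file fragment, not a
complete application.

```erlang
%% myapp.app: declare the phases (other keys omitted from this sketch)
{application, myapp,
 [{mod, {myapp_app, []}},
  {start_phases, [{local_init, []}, {wait_peer, []}]}]}.

%% myapp_app.erl: called once per phase, in order, after start/2 returns.
%% StartType is normal, {takeover, Node} or {failover, Node}.
start_phase(local_init, _StartType, _PhaseArgs) ->
    ok;
start_phase(wait_peer, _StartType, _PhaseArgs) ->
    %% hypothetical helper; must return ok | {error, Reason}
    myapp_peer:await().
```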

You can do it at the supervisor level rather than application level.
Since children are started in order, you can have one of the children
not return from init() until it confirms the peer is up and running.
This will block the supervisor from starting any subsequent children
until the prerequisites are in place.
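
A sketch of such a gate child, assuming the peer check is a plain
net_adm:ping/1. The module name peer_gate and the retry parameters are made
up. List it first in the supervisor's child specs and every later child
waits on it.

```erlang
-module(peer_gate).
-behaviour(gen_server).

-export([start_link/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% RetryMs and MaxTries are illustrative knobs, not part of any OTP API.
start_link(PeerNode, RetryMs, MaxTries) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE,
                          {PeerNode, RetryMs, MaxTries}, []).

%% Blocks here. The supervisor does not start the next child until
%% this returns {ok, ...}; {stop, ...} makes the child start fail.
init({PeerNode, RetryMs, MaxTries}) ->
    case wait_for(PeerNode, RetryMs, MaxTries) of
        ok      -> {ok, PeerNode};
        timeout -> {stop, {peer_unreachable, PeerNode}}
    end.

%% Ping up to N times, sleeping RetryMs between attempts.
wait_for(_Node, _RetryMs, 0) ->
    timeout;
wait_for(Node, RetryMs, N) ->
    case net_adm:ping(Node) of
        pong -> ok;
        pang -> timer:sleep(RetryMs),
                wait_for(Node, RetryMs, N - 1)
    end.

handle_call(_Req, _From, S) -> {reply, ok, S}.
handle_cast(_Msg, S) -> {noreply, S}.
```

The retry cap is there so a peer that never appears turns into a failed
start rather than a supervisor that hangs forever.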