[erlang-questions] initializing process

Jayson Vantuyl kagato@REDACTED
Fri Sep 18 04:11:43 CEST 2009


I think that synchronizing the supervisor this way is a bad thing, as
it doesn't allow you to specify a limited "start time".  In other
words, I'd treat an uninitialized server as running; otherwise the
supervisory machinery hangs.  I'm also not sure that this behavior is
guaranteed to stay the same in future releases.

Also, since this is a cross-node thing, it's going to make "restart" a  
painful process that requires manually coordinating multiple  
machines.  Even when we've been scripting these things on EC2, it's  
just not feasible to serialize all of this for large numbers of hosts
(especially given EC2's understandably long startup delay).

I really think this is better done with a series of gen_fsms that can
bounce into and out of a "waiting" state, rather than requiring all
manner of coordination when badness occurs.  In other words, work with
Erlang's supervision framework and synchronization primitives, not
against them.
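
A minimal sketch of the "waiting" state idea, assuming a plain
gen_server with the timeout feature stands in for a full gen_fsm.  The
module name peer_waiter, the {hello, Pid, Ref} and {hello_ack, Ref}
messages, and the 500 ms retry interval are all made up for
illustration; nothing here comes from a real application.  init/1
returns right away, so the supervisor never blocks, and the process
keeps re-sending hello until the peer answers.

```erlang
-module(peer_waiter).
-behaviour(gen_server).

-export([start_link/1, status/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(RETRY_MS, 500).  % hypothetical retry interval

-record(state, {peer, status = waiting, ref}).

%% Peer is a pid or a {Name, Node} tuple.  (Sending to an unregistered
%% local name with ! would raise badarg, so a bare atom is avoided.)
start_link(Peer) ->
    gen_server:start_link(?MODULE, Peer, []).

%% Returns waiting or running.
status(Pid) ->
    gen_server:call(Pid, status).

init(Peer) ->
    %% Do not block here.  A timeout of 0 makes the first hello go out
    %% as soon as the mailbox is empty.
    {ok, #state{peer = Peer}, 0}.

%% While waiting, every return re-arms the timeout, because a
%% gen_server timeout is cancelled by any incoming message.
handle_call(status, _From, S = #state{status = waiting}) ->
    {reply, waiting, S, ?RETRY_MS};
handle_call(status, _From, S) ->
    {reply, running, S}.

handle_cast(_Msg, S = #state{status = waiting}) ->
    {noreply, S, ?RETRY_MS};
handle_cast(_Msg, S) ->
    {noreply, S}.

%% Timeout while waiting: send (or re-send) hello with a fresh ref.
handle_info(timeout, S = #state{status = waiting, peer = Peer}) ->
    Ref = make_ref(),
    Peer ! {hello, self(), Ref},
    {noreply, S#state{ref = Ref}, ?RETRY_MS};
%% Only an ack carrying the most recent ref moves us to running.
handle_info({hello_ack, Ref}, S = #state{status = waiting, ref = Ref}) ->
    {noreply, S#state{status = running}};
handle_info(_Other, S = #state{status = waiting}) ->
    {noreply, S, ?RETRY_MS};
handle_info(_Other, S) ->
    {noreply, S}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

To try it from the shell, start it with peer_waiter:start_link(self()),
receive the {hello, Pid, Ref} message, and reply with
Pid ! {hello_ack, Ref}; after that, peer_waiter:status(Pid) should
report running.  Dropping back to waiting when the peer goes away
(say, on a monitor 'DOWN' message) is left out to keep the sketch
short.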

On Sep 17, 2009, at 7:04 PM, Richard Andrews wrote:

> 2009/9/18 Jayson Vantuyl <kagato@REDACTED>:
>> What exactly are you trying to do?  It's really easy to create  
>> complex race
>> conditions and strange startup ordering issues doing something like  
>> this.
>>
>> If you can, have the node signal you (much less work).  Just start
>> out in an "uninitialized" state, set a short timeout, and go idle.
>> When you receive the "other_node_started" event, go into "running"
>> mode.  I'd recommend using sync_send_event from the server you're
>> waiting on.  That guarantees you close the loop in case they start
>> up in a weird order.
>>
>> Timeouts are handy and highly underutilized.  Other great uses  
>> include
>> "forcibly garbage collecting during inactivity" and "delaying before
>> hibernating".
>>
>> If you must ping the node from the waiting side, you can simulate a  
>> delaying
>> loop with a gen_server (and a gen_fsm) by using the timeout  
>> feature.  Just
>> send the ping and set a timeout.  Have the timeout function send  
>> the ping
>> again and set a timeout again.  Eventually, you'll get the response  
>> and can
>> go about your merry way.
>>
>> Both of these methods prevent you from sitting in init for an  
>> undefined
>> amount of time.
>
> Yep. I use a hello message for this kind of thing. Start in an
> unconnected state and start sending hello to a known target. When
> responses start getting returned (with a valid ref() of course) then
> go into fully running mode. If the target process is the last thing to
> start in the peer system then you know the whole peer is up and
> running.
>
>> All of this gets complicated in the face of restarts.  It may not  
>> really be
>> workable.  It would help to know more about what you're doing.
>>
>> I think the "correct" way (i.e. overkill in the style of Ericsson)  
>> is to do
>> this at an application level.  You can set up a distributed Erlang
>> application to start in "phases".  Depending on your deployment,  
>> that might
>> be the ultimate answer, since it also provides hooks for takeover /
>> failover, which may be necessary depending on your application.
>
> You can do it at the supervisor level rather than application level.
> Since children are started in order, you can have one of the children
> not return from init() until it confirms the peer is up and running.
> This will block the supervisor from starting any subsequent children
> until the prerequisites are in place.
>
