[erlang-questions] non-trivial supervisor idioms?

Ulf Wiger ulf.wiger@REDACTED
Tue Sep 28 09:07:16 CEST 2010


On 27 Sep 2010, at 20:05, Edmond Begumisa wrote:

> Hi again Ulf,
> 
> It's great to get 'the guy' on this subject online so I'm going to take full advantage and ask two more questions that have been dogging me...
> 
> Firstly, is there an open-source project you know of that uses included-applications and/or start phases properly that I could take a peek at? Maybe in OTP source itself?

Off the top of my head, I really can't think of any. :)

The area where start phases really come in handy is when your application needs 
to support failover/takeover behaviour. This is also when the StartType argument
is needed. One can implement takeover by writing a special start phase that instructs
the processes to take over processing from the other side. In general, it is best to do
this at a point where all processes have been started and are ready to process 
incoming requests.
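
A minimal sketch of a takeover start phase as described above. The module
name demo_app, the phase name go and the helper take_over_from/1 are
hypothetical; the callback shape start_phase(Phase, StartType, PhaseArgs)
and the StartType values are the documented ones. It assumes the .app file
lists {start_phases, [{go, []}]}.

```erlang
-module(demo_app).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1, start_phase/3]).
-export([init/1]).

%% Start only the supervision tree here; no takeover logic yet.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, demo_sup}, ?MODULE, []).

stop(_State) ->
    ok.

%% Supervisor callback: an empty tree, to keep the sketch short.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.

%% Called once per phase listed under start_phases in the .app file,
%% after start/2 has returned, so all processes are already running.
start_phase(go, {takeover, FromNode}, _PhaseArgs) ->
    take_over_from(FromNode);
start_phase(go, {failover, _FromNode}, _PhaseArgs) ->
    ok;
start_phase(go, normal, _PhaseArgs) ->
    ok;
start_phase(_OtherPhase, _StartType, _PhaseArgs) ->
    ok.

%% Hypothetical helper: a real system would tell its processes to
%% take over processing from FromNode here.
take_over_from(FromNode) ->
    error_logger:info_msg("taking over from ~p~n", [FromNode]),
    ok.
```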

The initial reason for start phases was that the complex call-handling applications
at Ericsson had some pretty horrendous dependencies to sort out before they
could start accepting calls, and doing this work in the init function of the processes
simply wasn't feasible. Also, when a process dies in the init function, this is 
interpreted as a start error, and the application start will fail, whereas individual 
processes have proper supervision while they are responding to requests from
the start phase code (which runs in the application_starter process).
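
To illustrate the split between init and start phase work, here is a
sketch of a worker that does almost nothing in init/1 and performs its
expensive preparation only when asked. The module demo_worker and its
prepare/0 and ready/0 functions are made up for this example; a start
phase hook would call demo_worker:prepare() once the supervision tree is up.

```erlang
-module(demo_worker).
-behaviour(gen_server).

-export([start_link/0, prepare/0, ready/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Intended to be called from a start phase hook, not from init/1.
prepare() ->
    gen_server:call(?MODULE, prepare).

ready() ->
    gen_server:call(?MODULE, ready).

%% Minimal init: a crash here would count as a start error, so it
%% only sets up trivial state.
init([]) ->
    {ok, #{ready => false}}.

%% The heavy work (placeholder) runs while the process is already
%% under normal supervision.
handle_call(prepare, _From, State) ->
    {reply, ok, State#{ready := true}};
handle_call(ready, _From, #{ready := Ready} = State) ->
    {reply, Ready, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```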

Included applications were mainly introduced since the same call-handling 
applications needed to move as one during failover and takeover, and starting
a dozen or so top applications made that much more difficult. It was just too much
code and too many modules to integrate into one single application without one
more structuring layer.

Initially, I wrote some code that read .appSrc files in each sub-application and 
integrated them into one larger application, using a top-level resource file - I
think it had the extension .appLm (as in load module - never mind; it made sense
at Ericsson, and it was so long ago that I may be remembering wrong). This was 
later generalised by OTP into included_applications.
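
For reference, a sketch of what a top-level resource file using
included_applications can look like today. The application names top,
sub_a and sub_b, the module top_app and the phase names are all
hypothetical. When the included applications have start phases of their
own, the OTP documentation requires the top application's mod key to go
through application_starter, as shown.

```erlang
%% top.app (hypothetical names throughout)
{application, top,
 [{description, "Top application with two included applications"},
  {vsn, "1"},
  {modules, [top_app]},
  {registered, []},
  {applications, [kernel, stdlib]},
  %% Loaded together with top, but not started as separate applications.
  {included_applications, [sub_a, sub_b]},
  %% Phases are run in this order, for top and for the included apps
  %% that declare the same phase in their own .app files.
  {start_phases, [{init, []}, {go, []}]},
  {mod, {application_starter, [top_app, []]}}]}.
```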

The O&M applications also had a problem during takeover: The snmp code 
assumed that the snmp agent was locally registered on the same node, which 
wasn't necessarily the case during the transition - either on the node taking over
or on the node where it ran before. We then created a wrapper application that 
included all the O&M applications, and called the individual start functions for 
each included app.

Later, we moved away from that, as we had to also support applications that 
were written according to a different timeline, and therefore couldn't be integrated
the same way as our other apps. I came up with a solution for starting and stopping
included apps and plugging in their start phase hooks in the right places in the 
startup flow, but for some reason people found it complicated... :)

The better solution was to make use of the fact that the application controller now
had a message passing interface for controlling the starting and stopping of apps.
We were already using this in our cluster controller, so we could extend it by
specifying distributed start dependencies and which applications needed to do
takeover in parallel. This way, the cluster controller knew in which order to move 
applications during takeover, and in which order to terminate them, once migrated.
Unfortunately, all this code is proprietary. It's on my long list of things I'd like to do,
but that list just keeps growing, without much ever being removed from it...

A long time ago, I made a prototype (and sent it to OTP) that introduced start phase
dependencies. This would IMHO make it much easier to specify dependencies 
between applications. As an example, mnesia loads tables in the background, so 
when the application:start() function returns, one cannot assume that tables are 
loaded, and has to call mnesia:wait_for_tables() (which can time out, and has some
corner cases where tables will never be loaded without intervention - not that the
function itself will tell you when they occur). It might be better if mnesia had a 
load_tables start phase, which other applications could depend on.
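
Until something like that exists, an application can approximate it with
a start phase of its own. A sketch, assuming a hypothetical module
demo_tables, a phase named load_tables, and PhaseArgs given as a property
list with tables and timeout keys (the 30 second default is arbitrary).
Note that it only reports a timeout; it cannot detect the corner cases
where tables never load.

```erlang
-module(demo_tables).
-export([start_phase/3, wait/2]).

%% Intended to be called from (or used as) the application's
%% start_phase/3 callback for a phase named load_tables.
start_phase(load_tables, _StartType, PhaseArgs) ->
    Tabs = proplists:get_value(tables, PhaseArgs, []),
    Timeout = proplists:get_value(timeout, PhaseArgs, 30000),
    wait(Tabs, Timeout);
start_phase(_OtherPhase, _StartType, _PhaseArgs) ->
    ok.

%% Turn a timeout into an error so that the application start fails
%% visibly instead of continuing with unloaded tables.
wait(Tabs, Timeout) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Missing} ->
            {error, {tables_not_loaded, Missing}};
        {error, Reason} ->
            {error, Reason}
    end.
```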

BR,
Ulf W

> 
> Secondly, I've always liked the idea of using included applications not necessarily for start phases but as a delayed/start-on-demand mechanism (taking advantage of the fact that included apps are automatically loaded but not started.) That is, manually calling application:start(foo) only if a particular feature of my app is used. But I have one query that made attempts for such use short-lived... the fact that an application can only be included by one other application. I think this limitation makes it harder to use included apps and start phases especially if you're using apps that are not in-house.
> 
> For example, let's say CouchDB starts using mnesia (ok that's dumb but...) and decides to start it up using start phases (and therefore adds it as an included application in couch.app). Then I have my FunkyApp that's been using mnesia too as an included application. I then decide to use CouchDB for a new funky feature of FunkyApp. Now things break because mnesia is being included by both FunkyApp and CouchDB. To fix this, I not only have to modify my in-house app, I have to modify the out-house CouchDB too.
> 
> Is there an obvious fix to this I've been missing?
> 
> - Edmond -
> 
> On Tue, 28 Sep 2010 03:19:29 +1000, Ulf Wiger <ulf.wiger@REDACTED> wrote:
> 
> 
> On 27 Sep 2010, at 18:14, Edmond Begumisa wrote:
> 
> Ulf,
> 
> I've been doing such initialisation in the init function of a worker manager process. Using Daniel's example, I might have a gen_server child of the main supervisor called db_mgr and set up the mnesia schema in db_mgr:init
> 
> Have I been doing the 'wrong' thing OTP-wise?
> 
> Not necessarily, but my personal preference is to cleanly separate setup code
> from application startup. This is in part because I used to work on a very complex
> product, where the setup was decidedly non-trivial, and the startup process had
> to be optimised in several steps.
> 
> Still, even there, I believe that the setup logic was bootstrapped into the startup
> phase, but the code was still kept cleanly separated. The only thing that was
> part of the startup was a simple check to see if the setup code had been run.
> 
> BR,
> Ulf W
> 
> 
> 
> - Edmond  -
> 
> On Tue, 28 Sep 2010 00:31:47 +1000, Ulf Wiger <ulf.wiger@REDACTED> wrote:
> 
> On 27/09/2010 16:15, Daniel Goertzen wrote:
> I've read the documentation on supervision and have seen a few tutorials,
> but they don't seem to move beyond the core concepts.  For example, what
> happens if you want to check and optionally set up an mnesia schema during
> startup...where should this code go?  In the supervisor init() or
> start_link() function?  Should I have my supervisor create a worker process
> whose sole job is to do this kind of setup and then dynamically add other
> workers (or supervisors) to the supervisor with start_child()?
> 
> I strongly recommend doing that sort of thing in a separate procedure,
> rather than in the startup phase.
> 
> If you want your application to be able to bootstrap itself, I would
> suggest that you either:
> 
> - create a special application that runs before your other apps,
>  and verifies that the installation is ok. To this end, it might be
>  useful to know that you can pre-sort the .rel file. The systools lib
>  will only change the sort order if needed to respect start
>  dependencies.
> - introduce start_phases, then do minimal work in the init function,
>  and push the rest to functions that are called from start phase
>  hooks. This also has the advantage that you know that your processes
>  are all started and ready to respond during the init phase.
> 
> Start phases are documented in
> http://www.erlang.org/doc/apps/kernel/application.html#Module:start_phase-3
> 
> BR,
> Ulf W
> 
> 
> 
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> 
> Ulf Wiger, CTO, Erlang Solutions, Ltd.
> http://erlang-solutions.com
> 
> 
> 
> 

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com




