Dynamic Node Additions
Ulf Wiger
etxuwig@REDACTED
Wed Dec 11 11:26:45 CET 2002
On Wed, 11 Dec 2002, Per Bergqvist wrote:
>My personal view is: use non distributed applications and
>roll your own interlocking and failover mechanism.
Whoa! Heavy advice. I would suggest that you can get pretty
far with dist_ac, though. I'm not at all sure that people in
general will be more successful rolling their own than using
dist_ac as it is.
Having said that, we (AXD 301) don't use it. We have rolled
our own distributed application controller (based on a
prototype by Martin Björklund). It is not for the faint of
heart, but ours works well and is extremely well tested.
I will look into making it available. The agreement with OTP
was that it will eventually become part of OTP, but I will
not promise that whatever I may publish will be compatible
with whatever they may include in OTP in the future. (:
Daniesc, what you can do to get started is to use dist_ac,
and replicate state data in mnesia. This way, your
application can get started quickly on the other node.
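As a rough starting point, dist_ac is driven by the kernel application's configuration. The fragment below is only a sketch: the application name myapp and the node names n1@host1 and n2@host2 are placeholders, and the timeout values are arbitrary.

```erlang
%% sys.config on n1@host1 (sketch; all names and timeouts are placeholders).
[{kernel,
  [%% myapp runs on n1@host1 when it is up, otherwise on n2@host2.
   %% 5000 is the time in ms to wait for a restarting node before failing over.
   {distributed, [{myapp, 5000, [n1@host1, n2@host2]}]},
   %% Wait for the other node at boot, but continue without it after
   %% the timeout so that one node can be down during an upgrade.
   {sync_nodes_optional, [n2@host2]},
   {sync_nodes_timeout, 30000}]}].
```

The file on n2@host2 would be the same except that sync_nodes_optional lists n1@host1. The application itself must be started on both nodes; dist_ac then decides where it actually runs.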
Things to consider when you upgrade one node at a time like
that:
- The mnesia schema cannot be upgraded one node at a time.
You could work around this by using one "registry" table
(using only key+value attributes) for starters. This
doesn't "solve" the problem, but gives you a chance to
address it manually during your upgrade.
- Make sure that you handle all interaction across node
boundaries with extra care. If a protocol between
processes on different Erlang nodes changes, you will
have a harder time (it can be handled, but complicates
things). Adding a version field in messages going between
nodes could help you a little down the road.
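To make the two points above a bit more concrete, here is a minimal sketch. It is not AXD 301 code: the module name, the table name registry, the message tag myproto, the version numbers and the payload shapes are all made up for illustration.

```erlang
-module(upgrade_sketch).
-export([create_registry/1, store/2, lookup/1, send/3, handle/1]).

%% One generic key+value table, so new kinds of state can be added
%% without changing the table definition.
-record(registry, {key, value}).

%% Hypothetical version of the inter-node protocol spoken by this release.
-define(PROTO_VSN, 2).

%% Assumes mnesia is running with a disc-based schema on Nodes.
create_registry(Nodes) ->
    mnesia:create_table(registry,
                        [{attributes, record_info(fields, registry)},
                         {disc_copies, Nodes}]).

store(Key, Value) ->
    mnesia:transaction(
      fun() -> mnesia:write(#registry{key = Key, value = Value}) end).

lookup(Key) ->
    mnesia:transaction(fun() -> mnesia:read(registry, Key) end).

%% Every message crossing a node boundary carries a version field.
send(Node, RegName, Payload) ->
    {RegName, Node} ! {myproto, ?PROTO_VSN, Payload},
    ok.

%% The receiver accepts the current version and the previous one, so
%% an upgraded node can still talk to a node that is not yet upgraded.
handle({myproto, 2, Payload}) ->
    {ok, Payload};
handle({myproto, 1, OldPayload}) ->
    {ok, convert_v1(OldPayload)};
handle({myproto, Vsn, _Payload}) ->
    {error, {unsupported_vsn, Vsn}}.

%% Placeholder conversion: pretend version 1 sent a bare Id and
%% version 2 sends {Id, Options}.
convert_v1(Id) ->
    {Id, []}.
```

The version clause costs almost nothing now, and it gives the receiving side somewhere to put the conversion code once the message format does change.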
I'm sure you will eventually also learn to perform smooth
upgrades synchronized across multiple nodes. OTP supports
this, but a good tutorial is needed...
/Uffe
On Wed, 11 Dec 2002, Per Bergqvist wrote:
>Hi,
>
>... [snip] ...
>
>>
>> (We were looking at downing 1 node, loading the relevant boot
>> scripts etc and then bringing it up again, then downing node 2 doing
>> the same, the caveat however is that every application must be run on
>> at least two nodes, and both those nodes must not go down
>> simultaneously).
>>
>
>(If all nodes providing a service are down, there is not much to do, is
>there?)
>
>Is this a SASL distributed application?
>I experienced severe problems with the distributed application at a
>customer site earlier this spring.
>My analysis was that the distributed application controller and its
>underlying protocol are broken.
>It is really easy to get the distributed application controller into
>deadlock states when two nodes start at the same time (e.g. reboot
>after a power failure on two identical hosts).
>
>Another bizarro side effect is that dist_ac always stops the active
>running instance of the application in a distributed cluster of nodes
>and starts it on the last started node.
>
>My personal view is: use non distributed applications and roll your
>own interlocking and failover mechanism.
>
>/Per
>
>=========================================================
>Per Bergqvist
>Synapse Systems AB
>Phone: +46 709 686 685
>Email: per@REDACTED
>
--
Ulf Wiger, Senior Specialist,
/ / / Architecture & Design of Carrier-Class Software
/ / / Strategic Product & System Management
/ / / Ericsson Telecom AB, ATM Multiservice Networks