distributed application - replication of state

Thu Aug 25 14:54:51 CEST 2005

I was hoping to hear that with dist_ac it would be possible to run a
distributed application on two nodes, yet have only one instance
registered with global, so that all calls to dist_server:get_value()
would be directed to the primary node, while the second one runs in
stand-by mode and accepts replicated state in the manner you described
below. Would there be any drawbacks?
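A minimal sketch of the primary/stand-by arrangement described above follows. It is written in Python purely for illustration; in Erlang this would more naturally be a gen_server. A single primary answers get_value() calls and forwards every update, tagged with a sequence number, to a stand-by replica. After takeover, the replica continues from the last state it received. The class and method names (Replica, Primary, update, apply, takeover) are hypothetical and are not part of the dist_ac or global API. Replication is also done by direct method calls here, whereas a real system would send updates asynchronously over the network.

```python
class Replica:
    """Stand-by copy: accepts replicated updates until it takes over."""

    def __init__(self):
        self.state = {}
        self.last_seq = 0  # sequence number of the newest applied update

    def apply(self, seq, key, value):
        # Ignore duplicate or out-of-order updates; seq must increase.
        if seq <= self.last_seq:
            return False
        self.state[key] = value
        self.last_seq = seq
        return True

    def takeover(self):
        # Become the new primary, continuing from the replicated state.
        # No stand-by is attached to the new primary in this sketch.
        return Primary(standby=None, state=dict(self.state), seq=self.last_seq)


class Primary:
    """The single registered instance that serves all get_value() calls."""

    def __init__(self, standby, state=None, seq=0):
        self.standby = standby
        self.state = state if state is not None else {}
        self.seq = seq

    def update(self, key, value):
        self.seq += 1
        self.state[key] = value
        # Forward the update to the stand-by (synchronously in this sketch).
        if self.standby is not None:
            self.standby.apply(self.seq, key, value)

    def get_value(self, key):
        return self.state.get(key)
```

For instance, after the primary updates "calls" twice and then crashes, standby.takeover() returns a new primary whose get_value("calls") yields the second value. Because every update is forwarded while both nodes are up, the stand-by never needs to fetch state from the crashed instance. This is the property the question is after.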

I believe it would be quite beneficial for such functionality to be
part of OTP. As it stands, dist_ac focuses only on smooth, controlled
takeover, not on preserving state across the various kinds of process
crashes.

Thoughts?

Serge

BTW, is the "SCO vs IBM" deal you mentioned an internal Ericsson thing?

Ulf Wiger (AL/EAB) wrote:
> If you have lots of state, I guess you have to replicate
> continuously while the Erlang connection is up. After nodedown,
> you obviously don't replicate. Upon detecting a
> partitioned network, you switch from thinking of the
> situation as 'one processor down' to 'partitioned
> network', and try to sort things out as well as you
> can...
> 
> One thing I'm currently working on is a version-controlled
> channel which uses plain TCP -- not distributed erlang.
> 
> The idea is (1) to be able to handle small differences in
> the protocol between two nodes, with the ends of the channel
> automatically choosing a conversion path between the two
> endpoint versions, if one exists; and (2) to be able to
> communicate even if the Erlang distribution is severed (we
> may do this deliberately for some types of upgrade).
> 
> My prototypes so far allow programs to send messages and 
> monitor processes on the other side. You can also monitor
> the channel. The overhead compared to distributed Erlang is 
> quite low, and the channel has its own heartbeat.
> 
> I haven't even attempted yet to get permission to release
> it as Open Source. It was easier before the whole SCO vs
> IBM deal started...
> 
> /Uffe
> 
> 
>>-----Original Message-----
>>From: Serge Aleynikov [mailto:serge@REDACTED]
>>Sent: den 25 augusti 2005 14:22
>>To: Ulf Wiger (AL/EAB)
>>Subject: Re: distributed application - replication of state
>>
>>
>>What about when the state is too large (such as the contents
>>of an ets table) to fit in a reasonably small UDP packet?
>>
>>Also, how does this help when there is no network partition and
>>the primary instance of the application crashes on node X? The
>>secondary instance on node Y then has no access to the state of
>>the crashed instance, and there are no UDP heartbeats to fetch
>>the state from.
>>
>>Serge
>>
>>Ulf Wiger (AL/EAB) wrote:
>>
>>>One way to do it is to simply pass some payload 
>>>in your UDP heartbeats. It doesn't matter much if
>>>some of the packets are lost, since you keep 
>>>sending them, and it's the reception of such a 
>>>packet that alerts you that you have a partitioned
>>>network.
>>>
>>>/Uffe
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: owner-erlang-questions@REDACTED
>>>>[mailto:owner-erlang-questions@REDACTED] On Behalf Of Serge Aleynikov
>>>>Sent: den 25 augusti 2005 14:00
>>>>To: Erlang Questions
>>>>Subject: distributed application - replication of state
>>>>
>>>>
>>>>Hello,
>>>>
>>>>I was experimenting with the 'dist' distributed application
>>>>(from Ulf's "OTP Release Handling Tutorial") to implement
>>>>controlled handling of network partitioning using a UDP
>>>>heartbeat, and came up with the following question.
>>>>
>>>>When one of the two distributed nodes crashes, or the network
>>>>is partitioned, I would like to ensure that the second node
>>>>takes over the state of the other node. In the current
>>>>implementation, however, the state is taken over smoothly only
>>>>when the application is running on the secondary node and is
>>>>then started on the primary node.
>>>>
>>>>Is there a common approach to replicating that state between
>>>>two nodes? Do we need a separate application, running on all
>>>>nodes and responsible for replication, that our distributed
>>>>application would consult on startup to initialize its state?
>>>>I suppose the same problem had to be solved in mnesia.
>>>>
>>>>Regards,
>>>>
>>>>Serge
>>>>
>>>>-- 
>>>>Serge Aleynikov
>>>>R&D Telecom, IDT Corp.
>>>>Tel: (973) 438-3436
>>>>Fax: (973) 438-1464
>>>>serge@REDACTED
>>
> 
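The heartbeat-with-payload idea from the quoted replies can be sketched as follows, again in Python for illustration only. Each heartbeat carries a sequence number and a small state snapshot. The receiver keeps only the newest snapshot, so lost or reordered packets do no harm as long as the sender keeps sending. The receiver also combines the Erlang distribution's view with the UDP heartbeat to tell "one processor down" apart from "partitioned network": nodedown plus recent heartbeats means the peer is alive and the network is partitioned. The JSON encoding, the timeout value and all names here are assumptions, not anything from OTP. As noted in the thread, this only works while the state fits in a small packet.

```python
import json
import time


def encode_heartbeat(seq, state):
    # One heartbeat datagram: a sequence number plus a small state snapshot.
    return json.dumps({"seq": seq, "state": state}).encode("utf-8")


class HeartbeatMonitor:
    def __init__(self, timeout=3.0):
        self.timeout = timeout    # seconds of silence before the peer counts as gone
        self.last_seq = -1
        self.last_seen = None     # arrival time of the newest accepted heartbeat
        self.latest_state = None  # newest snapshot, usable to initialize on takeover

    def receive(self, datagram, now=None):
        now = time.monotonic() if now is None else now
        msg = json.loads(datagram.decode("utf-8"))
        # Drop stale or duplicate packets. A restarted sender would need a
        # new epoch so its reset sequence numbers are not dropped (not handled).
        if msg["seq"] <= self.last_seq:
            return False
        self.last_seq = msg["seq"]
        self.last_seen = now
        self.latest_state = msg["state"]
        return True

    def heartbeat_alive(self, now=None):
        now = time.monotonic() if now is None else now
        return self.last_seen is not None and now - self.last_seen < self.timeout

    def classify(self, erlang_node_up, now=None):
        # erlang_node_up: whether distributed Erlang still sees the peer.
        if erlang_node_up:
            return "connected"
        if self.heartbeat_alive(now):
            return "partitioned"  # peer alive, but the distribution is severed
        return "peer_down"        # no heartbeats either: assume the peer crashed
```

In the crash case raised above, latest_state holds whatever the primary managed to send before it went silent. The stand-by can start from that snapshot, although any updates made after the last delivered heartbeat are lost.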

-- 
Serge Aleynikov
R&D Telecom, IDT Corp.
Tel: (973) 438-3436
Fax: (973) 438-1464
serge@REDACTED