[erlang-questions] Processes & Fault Tolerance

Mon Jan 3 03:06:47 CET 2011

Slight correction...

On Mon, 03 Jan 2011 12:38:38 +1100, Edmond Begumisa  
<ebegumisa@REDACTED> wrote:

> Hello all,
>
> I've been trying to wrap my Erlang's fault tolerant features  
> particularly in relation to processes.
>

Should be: I've been trying to wrap my head around Erlang's fault tolerant  
features particularly in relation to processes.

Sorry.

> I've heard/read repeatedly that the primary reason why Erlang's
> designers opted for a share-nothing policy is rooted not in concurrency
> but in fault tolerance. When nothing is shared, everything is
> copied. When everything is copied, processes can take over from one
> another when things fail. I follow this reasoning but I don't follow how
> to apply it.
>
> I fully understand and appreciate how supervision trees are used to
> restart processes if they fail. What I don't get is what to do when you
> don't want to restart but want to take over, say on another node. I know
> that at a higher level, OTP has some take-over/fail-over semantics (at
> the application level). I'm trying to understand things at the process
> level - why Erlang is the way it is - so I can better use it to make my
> currently fault-intolerant program fault-tolerant.
>
> Specifically, how can one process take over from another if it fails? It
> appears to me that the only way to do this would be to somehow retrieve
> not only the state of the process (say, gen_server's state) but also the
> messages in its mailbox. Where does the design decision to share nothing
> for the sake of fault tolerance come into play for processes? Please
> help me "get" this!
>
> Thanks in advance.
>
> - Edmond -
>
>
>
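
To make the question concrete, here is a minimal sketch of what I imagine
process-level take-over could look like. It is only an illustration under
my own assumptions, not an OTP mechanism: the module name takeover_sketch
and the functions start/0, demo/0, primary/2, standby/1 and active/1 are
all hypothetical. The primary sends a copy of each new state to a standby,
the standby monitors the primary, and on 'DOWN' it carries on from the
last copy it received.

```erlang
-module(takeover_sketch).
-export([start/0, demo/0]).

%% All names here are hypothetical; this is a sketch, not an OTP behaviour.

%% Start the standby first, then a primary that replicates to it.
start() ->
    Standby = spawn(fun() -> standby(0) end),
    Primary = spawn(fun() -> primary(0, Standby) end),
    Standby ! {watch, Primary},
    {Primary, Standby}.

%% The primary owns a counter. Since nothing is shared, it sends a
%% copy of each new state to the standby before replying to the client.
primary(Count, Standby) ->
    receive
        {incr, From} ->
            New = Count + 1,
            Standby ! {replica, New},
            From ! {ok, New},
            primary(New, Standby);
        crash ->
            exit(simulated_failure)
    end.

%% The standby only records copies of the state until the primary dies.
%% It has no {incr, _} clause, so such requests wait in its mailbox.
standby(Count) ->
    receive
        {watch, Pid} ->
            erlang:monitor(process, Pid),
            standby(Count);
        {replica, New} ->
            standby(New);
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            %% Take over, continuing from the last copied state.
            active(Count)
    end.

%% After take-over the former standby serves requests itself.
active(Count) ->
    receive
        {incr, From} ->
            New = Count + 1,
            From ! {ok, New},
            active(New)
    end.

%% Increment twice on the primary, crash it, then ask the standby.
demo() ->
    {Primary, Standby} = start(),
    Primary ! {incr, self()},
    receive {ok, 1} -> ok end,
    Primary ! {incr, self()},
    receive {ok, 2} -> ok end,
    Primary ! crash,
    Standby ! {incr, self()},
    receive {ok, N} -> N end.
```

On a single node I would expect takeover_sketch:demo() to return 3, since
the standby resumes from the copied value 2. Note that this sketch does
nothing about messages still sitting in the dead primary's mailbox: those
are lost with the process, so clients would have to notice the failure and
resend, which is the part I am unsure how to handle.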
