Processes & Fault Tolerance
Mon Jan 3 02:38:38 CET 2011
I've been trying to wrap my head around Erlang's fault-tolerance features,
particularly in relation to processes.
I've heard/read repeatedly that the primary reason why Erlang's designers
opted for a share-nothing policy is not rooted in concurrency but rather
in fault-tolerance. When nothing is shared, everything is copied. When
everything is copied, processes can take over from one another when things
fail. I follow this reasoning but I don't follow how to apply it.
I fully understand and appreciate how supervision trees are used to
restart processes if they fail. What I don't get is what to do when you
don't want to restart but want to take over, say on another node. I know
that at a higher level, OTP has some takeover/failover mechanisms (at
the application level). I'm trying to understand things at the process
level: why Erlang is the way it is, so I can better use it to make my
currently fault-intolerant program fault-tolerant.
Specifically, how can one process take over from another if it fails? It
appears to me that the only way to do this would be to somehow retrieve
not only the state of the process (say, gen_server's state) but also the
messages in its mailbox. Where does the design decision to share nothing
for the sake of fault tolerance come into play for processes? Please help
me "get" this!
Thanks in advance.
- Edmond -