[erlang-questions] question how to recover a 'stateful' app when Erlang node crashes?

Jesper Louis Andersen <>
Tue Jan 17 22:19:38 CET 2012


On 1/16/12 10:52 PM, Roman Shestakov wrote:
>
> What is the correct way to recover a "stateful" Erlang application? In 
> my case, the app that is crashing is a complex hierarchy of 
> fsm_processes, each containing certain state. I understand how to 
> recover stateless processes with supervisors, but what is the correct 
> way to recover stateful apps? Clearly in my case I probably need some 
> kind of supervisor 'node', but what would be the steps to correctly 
> recover killed processes with their states? Do I need to use a db and 
> replay the processes from disk on another node, or can I have a node 
> with an identical process hierarchy?
>
The problem with a crashing process is that its internal state is no 
longer sound: there was a reason it went wrong. The problem with a 
crashing node is largely the same: there is a reason you ended up with 
resource exhaustion in the first place.

The trick is that there is no trick. You need another node to hold your 
state, or you need to write your state to stable storage once in a while 
so you can restart from it. The point is that you can then be sure that 
restarting from this stable state causes no trouble. Essentially, you 
want to store to disk only when you are sure that the relevant part of 
the system is consistent with your invariants. Or move your state to 
another node.
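
As a rough illustration of the "store to disk only when consistent" idea, 
here is a minimal sketch. The module name checkpoint, the functions 
save/3 and load/1, and the caller-supplied invariant fun are all 
hypothetical names made up for this example; only the file module calls 
and term_to_binary/binary_to_term are standard. It assumes Path is a 
plain string and that the state is an ordinary Erlang term.

```erlang
-module(checkpoint).
-export([save/3, load/1]).

%% Persist State to Path, but only if the caller's invariant check
%% says the state is consistent. InvariantFun is a hypothetical
%% fun(State) -> boolean() supplied by the application.
save(Path, State, InvariantFun) when is_function(InvariantFun, 1) ->
    case InvariantFun(State) of
        true ->
            %% Write to a temporary file first and then rename it, so a
            %% crash in the middle of the write does not replace the
            %% previous good checkpoint with a truncated one.
            %% Note: this does not by itself force the data to disk.
            Tmp = Path ++ ".tmp",
            case file:write_file(Tmp, term_to_binary(State)) of
                ok -> file:rename(Tmp, Path);
                {error, _} = Err -> Err
            end;
        false ->
            %% Refuse to persist a state we do not trust.
            {error, invariant_violated}
    end.

%% Read the last checkpoint back, or return none if there is no file yet.
load(Path) ->
    case file:read_file(Path) of
        {ok, Bin} -> {ok, binary_to_term(Bin)};
        {error, enoent} -> none;
        {error, _} = Err -> Err
    end.
```

A restarted process could call checkpoint:load/1 from its init callback 
and fall back to a fresh initial state when it gets none. How often to 
call save/3, and what the invariant fun actually checks, depends 
entirely on the application.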

-- 
Jesper Louis Andersen
   Erlang Solutions Ltd., Copenhagen, DK
