[erlang-questions] question how to recover a 'stateful' app when Erlang node crashes?

Wed Jan 18 09:31:12 CET 2012

A simply brilliant idea: save the state only when memory usage rises above a certain threshold. Thanks CGS!

Regards,
Zabrane

On Jan 18, 2012, at 4:03 AM, CGS wrote:

> To recover the state, I am also using one of the options given in this thread. But to prevent memory starvation (prevention is more efficient than recovery, in my opinion), one can use erlang:memory() to build a monitor for the main application. One solution is to define a few "flags" for different levels of memory consumption and have the application return a "busy" flag (data not accepted) to the user when memory usage exceeds a certain level. For more complex monitors, one can use os:cmd() to query system resource consumption. Another solution is to save the state only when memory consumption is above a threshold, so that after a crash the application can restart from the last known state. This is less time-consuming (saving the state on every change introduces lag in your application), but you need to take many factors into account to reduce the risk of losing state.
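
A minimal sketch of the memory "flags" described above, assuming two made-up thresholds. The module name mem_guard, the limits, and the atoms normal/high/critical are illustrative assumptions, not something from this thread; only erlang:memory(total) is a real call.

```erlang
-module(mem_guard).
-export([level/0, level/3, admit/1]).

%% Hypothetical thresholds in bytes; tune them for the real node.
-define(SOFT_LIMIT, (512 * 1024 * 1024)).
-define(HARD_LIMIT, (768 * 1024 * 1024)).

%% Classify the VM's current total memory usage.
level() ->
    level(erlang:memory(total), ?SOFT_LIMIT, ?HARD_LIMIT).

%% Pure classification, kept separate so it can be checked with fixed numbers.
level(Used, _Soft, Hard) when Used >= Hard -> critical;
level(Used, Soft, _Hard) when Used >= Soft -> high;
level(_Used, _Soft, _Hard) -> normal.

%% Decide what to do with incoming data at a given level.
admit(normal)   -> accept;
admit(high)     -> {accept, checkpoint};  % still serve, but save state now
admit(critical) -> {error, busy}.         % the "busy" flag: data not accepted
```

A request handler could call mem_guard:admit(mem_guard:level()) before accepting data, reply "busy" on {error, busy}, and trigger a state save on {accept, checkpoint}.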
> 
> I hope this gives you at least a starting point for (re)designing your application.
> 
> 
> CGS
> 
> 
> On Tue, Jan 17, 2012 at 10:19 PM, Jesper Louis Andersen <jesper.louis.andersen@REDACTED> wrote:
> On 1/16/12 10:52 PM, Roman Shestakov wrote:
>> 
>> 
>> What is the correct way to recover a "stateful" Erlang application? In my case, the app that is crashing is a complex hierarchy of FSM processes, each holding some state. I understand how to recover stateless processes with supervisors, but what is the correct way to recover stateful apps? Clearly, in my case I probably need some kind of supervisor 'node', but what would be the steps to correctly recover killed processes along with their state? Do I need to use a DB and replay the processes from disk on another node, or can I have a node with an identical process hierarchy?
>> 
> The problem with a crashing process is that its internal state is no longer sound. There was a reason it went wrong. The problem with a crashing node is largely the same: there is a reason you ended up with resource exhaustion in the first place.
> 
> The trick is that there is no trick. You need another node to hold your state, or you need to put your state on stable storage once in a while so you can restart from it. The point is that you can then be sure that restarting from this stable state will cause no trouble. Essentially, you want to store to disk only when you are sure that some part of the system is consistent with your invariants. Or move your state to another node.
> -- 
> Jesper Louis Andersen
>   Erlang Solutions Ltd., Copenhagen, DK
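
The "store to disk only when the state is consistent" advice above can be sketched roughly as follows. The module name checkpoint and the IsConsistent callback are assumptions made for illustration; the file and term_to_binary/binary_to_term calls are standard OTP. Path is assumed to be a plain string.

```erlang
-module(checkpoint).
-export([save/3, restore/1]).

%% Save State to Path only if the caller-supplied invariant check passes.
%% IsConsistent is a hypothetical fun(State) -> boolean().
save(Path, State, IsConsistent) when is_function(IsConsistent, 1) ->
    case IsConsistent(State) of
        true ->
            %% Write to a temporary file first, then rename it over the old
            %% checkpoint, so a crash mid-write leaves the previous one intact.
            Tmp = Path ++ ".tmp",
            case file:write_file(Tmp, term_to_binary(State)) of
                ok -> file:rename(Tmp, Path);
                {error, _} = Err -> Err
            end;
        false ->
            {error, inconsistent_state}
    end.

%% Load the last saved state, if any.
restore(Path) ->
    case file:read_file(Path) of
        {ok, Bin} ->
            try {ok, binary_to_term(Bin)}
            catch error:badarg -> {error, corrupt_checkpoint}
            end;
        {error, _} = Err -> Err
    end.
```

A restarted process could call checkpoint:restore/1 in its init and fall back to a fresh state on {error, enoent}. Note that this sketch does not force the data to disk with file:sync/1, and it covers only the stable-storage option, not moving state to another node.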
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
