[erlang-questions] question how to recover a 'stateful' app when Erlang node crashes?

Wed Jan 18 04:03:33 CET 2012

To recover the state, I am also using one of the options given in this
thread. But to prevent memory starvation (prevention is more efficient
than recovery, in my opinion), one can use erlang:memory() to build a
monitor for the main application. One solution is to define a few
"flags" for different levels of memory consumption and have your
application return a "busy" flag (data not accepted) to the user when
memory usage is above a certain level. For more complex monitors, one
can use os:cmd() to query system resource consumption. Another solution
is to save the state only when memory consumption is above a threshold,
so that after a crash you can restart from the last known state. This
costs less time (saving the state on every change introduces lag in
your application), but you need to take many factors into account to
reduce the risk of losing the state.
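
To make the two ideas above concrete, here is a minimal sketch, not a
finished design. The module name mem_guard, the two thresholds and the
function names are all made up for illustration; only erlang:memory/1,
term_to_binary/1, binary_to_term/1 and the file module calls are real
OTP functions.

```erlang
-module(mem_guard).
-export([level/0, level/3, accept/1, maybe_checkpoint/2, restore/1]).

%% Hypothetical thresholds in bytes; tune them for your own node.
-define(WARN_BYTES, (512 * 1024 * 1024)).
-define(BUSY_BYTES, (1024 * 1024 * 1024)).

%% Classify the current VM memory usage into one of three "flags".
level() ->
    level(erlang:memory(total), ?WARN_BYTES, ?BUSY_BYTES).

%% Pure classification, kept separate so it can be tested without
%% actually filling the memory.
level(Used, _Warn, Busy) when Used >= Busy -> busy;
level(Used, Warn, _Busy) when Used >= Warn -> warn;
level(_Used, _Warn, _Busy) -> ok.

%% Gate for incoming data: refuse it while the node is at the busy level.
accept(Data) ->
    case level() of
        busy -> {error, busy};
        _Other -> {ok, Data}
    end.

%% Save the state only when memory usage has reached the warn level.
%% Returns skipped, ok, or {error, Reason} from file:write_file/2.
maybe_checkpoint(State, Path) ->
    case level() of
        ok -> skipped;
        _Other -> file:write_file(Path, term_to_binary(State))
    end.

%% Read the last saved state back after a restart.
restore(Path) ->
    case file:read_file(Path) of
        {ok, Bin} -> {ok, binary_to_term(Bin)};
        {error, Reason} -> {error, Reason}
    end.
```

A process would call mem_guard:accept(Data) before taking new work and
mem_guard:maybe_checkpoint(State, Path) after a state change. Note that
erlang:memory(total) only reports what the emulator itself has
allocated, so it says nothing about the rest of the machine.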

I hope this will help you to have at least a starting point in
(re)designing your application.

CGS

On Tue, Jan 17, 2012 at 10:19 PM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

>  On 1/16/12 10:52 PM, Roman Shestakov wrote:
>
>
>   What is the correct way to recover a "stateful" Erlang application? In my
> case, the app that is crashing is a complex hierarchy of fsm_processes,
> each containing certain state. I understand how to recover stateless
> processes with supervisors, but what is the correct way to recover stateful
> apps? Clearly in my case I probably need some kind of supervisor 'node', but
> what would be the steps to correctly recover killed processes with their
> states? Do I need to use a db and replay the processes from disk on another
> node, or can I have a node with an identical process hierarchy?
>
>     The problem with a crashing process is that its internal state is not
> sound anymore. There was a reason as to why it went wrong. The problem with
> a crashing node is largely the same. There is a reason you ended up with
> resource exhaustion in the first place.
>
> The trick is that there is no trick. You need another node to have your
> state or you need your state on stable storage once in a while so you can
> restart from it. The point is that you can then make sure that from this
> stable state there will be no trouble. Essentially you want to only store
> to disk when you are sure that some part of the system is consistent with
> your invariants. Or move your state to another node.
>
> --
> Jesper Louis Andersen
>   Erlang Solutions Ltd., Copenhagen, DK