[erlang-questions] Maintaining state between application failover

Wed Aug 14 14:53:24 CEST 2013

Hi,

I think you should keep the state of the server updated on all nodes, so that when a node goes down, the application is restarted on another node with the current state.

If the entire application goes down on all nodes, I don't think it makes sense to keep the last state. You can start over from the initial state (0 in this case), since the application is starting over entirely. But if it does make sense for other purposes, then when the failover process starts you should store the last state in a distributed storage manager such as mnesia (dets on its own is local to a single node). You can then restart the entire application with the latest state stored on one of the distributed nodes. You don't need to know which node, because the distributed storage manager will give you the most up-to-date information.
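A minimal sketch of this approach, assuming mnesia already has a schema on every node that may run the application (mnesia:create_schema/1) and is started on all of them. The module name unique_counter, the table layout and the global registration are illustrative choices, not anything from the original question. The number lives in a table with disc_copies on all nodes, so whichever node takes over reads the latest value instead of starting at 0.

```erlang
-module(unique_counter).
-behaviour(gen_server).

-export([start_link/0, next/0, create_table/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(TAB, unique_counter).   % hypothetical table name

%% Run once at install time: keep a disc copy of the counter on every
%% node in Nodes. Returns {atomic, ok} or {aborted, Reason}.
create_table(Nodes) ->
    mnesia:create_table(?TAB, [{disc_copies, Nodes},
                               {attributes, [key, value]}]).

%% Registered globally so clients on any node can reach the server,
%% wherever the application currently runs.
start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).

next() ->
    gen_server:call({global, ?MODULE}, next).

%% The process holds no counter of its own; it only waits until the
%% replicated table is available on this node.
init([]) ->
    ok = mnesia:wait_for_tables([?TAB], 5000),
    {ok, no_state}.

%% Read and advance the stored number in one transaction, so the write
%% reaches all replicas before the number is handed out.
handle_call(next, _From, State) ->
    {atomic, N} = mnesia:transaction(
        fun() ->
            Current = case mnesia:read(?TAB, counter, write) of
                          [{?TAB, counter, V}] -> V;
                          [] -> 0          % first request ever
                      end,
            ok = mnesia:write({?TAB, counter, Current + 1}),
            Current
        end),
    {reply, N, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Every request costs a replicated transaction. That is the price of surviving the loss of the node that was running the application.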

Best Regards,

Ivan.

From: erlang-questions-bounces@REDACTED [mailto:erlang-questions-bounces@REDACTED] On behalf of reaperman123456789@REDACTED
Sent: Wednesday, August 14, 2013 7:21
To: erlang-programming@REDACTED
Subject: [erlang-questions] Maintaining state between application failover

Assuming a distributed application, how can state be maintained across application restarts caused by failover?

For illustration purposes, consider the following problem:
We want a kind of server that delivers unique numbers. Starting at 0, on each request this number is delivered and incremented.
For the implementation, we use a gen_server process that keeps the current number in its state. We put that process under a one_for_one supervisor, which serves as the top supervisor of the application.
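A minimal sketch of the gen_server described above. The module name number_server is hypothetical and the one_for_one supervisor is left out. The counter is the entire process state, so any restart of the process begins again at 0.

```erlang
-module(number_server).
-behaviour(gen_server).

-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Register locally so clients can call the server by name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

%% Deliver the current number; the server then increments it.
next() ->
    gen_server:call(?MODULE, next).

%% The counter is the whole process state, starting at 0.
init(Start) ->
    {ok, Start}.

handle_call(next, _From, N) ->
    {reply, N, N + 1}.

handle_cast(_Msg, N) ->
    {noreply, N}.
```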

Now, even in a non-distributed setup, the gen_server cannot keep its state across restarts performed by its supervisor. We could store the current number in the environment of the application itself (which doesn't feel right, but for illustration purposes let's keep it in mind), where it would survive restarts of the gen_server process.
In a distributed setup, even the state stored in the application environment would not survive an application failover. When the node on which the application is running dies and the application is restarted on another node, the counter starts at 0 again.
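A minimal sketch of the application-environment variant, assuming a hypothetical application named number_app and a hypothetical key next_number. The server saves the successor with application:set_env/3 before replying and reloads it in init/1, so a supervisor restart resumes where it left off. The environment is held by the node's application controller, so it still disappears together with the node.

```erlang
-module(number_server_env).
-behaviour(gen_server).

-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(APP, number_app).   % hypothetical application name

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

next() ->
    gen_server:call(?MODULE, next).

%% On (re)start, resume from the value saved in the environment,
%% or from 0 if nothing has been saved yet.
init([]) ->
    {ok, application:get_env(?APP, next_number, 0)}.

%% Save the successor before replying, so a crash right after the
%% reply cannot hand out the same number twice.
handle_call(next, _From, N) ->
    ok = application:set_env(?APP, next_number, N + 1),
    {reply, N, N + 1}.

handle_cast(_Msg, N) ->
    {noreply, N}.
```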

Performance considerations aside, the current number could be kept permanently up to date in a state file that is read at startup. But since the nodes would usually run on different machines, on failover the application would be restarted on a different machine from the one holding the file. And since the Erlang node presumably died because its hardware died, the file would not be accessible to the application started in failover mode. To keep the state file accessible everywhere, we would need to put it on an NFS mount or similar, but then the NFS server would become a critical component of our setup, not to mention the overkill of running an extra machine solely to share a single file no more than a few bytes in size. Using a database of whatever flavor amounts to the same thing.

So, how can state be maintained efficiently, the Erlang way?

To clarify, I am not asking for a solution to the problem of generating unique numbers; there are probably a thousand better ways to do that, UUIDs and whatnot. I am asking for ways to maintain state across restarts of a distributed application in a failover scenario. The example problem above is a special case of the general problem I am asking about, made up purely to have a simple illustration. A solution to the general problem would automatically solve the special case anyway ;)
