Erlang and high availability

Loic Domaigne loic-dev@REDACTED
Tue Jun 6 20:35:06 CEST 2006


Dear Erlangers,

For several years now I have been dealing with scalability, concurrency,
high availability and fault tolerance problems using mainstream languages
like C, C++, Java, etc. I think I have a reasonable understanding of
what can be achieved with those languages, and what their limitations are.

Although it is difficult to teach an old dog new tricks, I really
like to look beyond the end of my own nose. I am interested in solving
these problems with truly different approaches. Erlang looks like one of
the most promising languages in that regard.

For my first case study, I would like to consider the standard heartbeat
problem for a two-node cluster. The cluster is composed of two physically
distinct nodes A and B. The nodes may have different hardware
architectures and operating systems (to emphasize portability aspects).
The two nodes are connected via two physically separate networks N and N'.

I'd like to implement a simple heartbeat mechanism that achieves the 
following:

(*) detect a network failure: no heartbeat received over network N
     (resp. N') within a (pre-defined) period of time, but a heartbeat
     received over N' (resp. N) within the same period.

(*) detect a failure of node A or B: no heartbeat from the
     corresponding node over both networks N and N' within a (pre-defined)
     period of time.
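
To make the two rules concrete, here is a minimal sketch of the decision
logic only (no networking). It is written in Python purely for
illustration; the names classify, last_seen and timeout are hypothetical
and do not come from any existing library.

```python
def classify(last_seen, now, timeout):
    """Classify the peer's state from the last heartbeat seen per network.

    last_seen: dict mapping a network name (e.g. "N", "N'") to the time
               the last heartbeat arrived on it, or None if none arrived.
    now, timeout: current time and allowed silence, in the same unit.
    (All names here are hypothetical, chosen for this sketch.)
    """
    # A network is "silent" if no heartbeat arrived within the timeout.
    silent = sorted(
        net for net, t in last_seen.items()
        if t is None or now - t > timeout
    )
    if not silent:
        return ("ok", [])
    if len(silent) == len(last_seen):
        # Silence on every network: treated as a failure of the peer node.
        return ("node_failure", silent)
    # Silence on some networks, heartbeats on the others: network failure.
    return ("network_failure", silent)


print(classify({"N": 9.5, "N'": 9.8}, now=10.0, timeout=1.0))
print(classify({"N": 7.0, "N'": 9.8}, now=10.0, timeout=1.0))
print(classify({"N": 7.0, "N'": None}, now=10.0, timeout=1.0))
```

Note that with these observations alone, a simultaneous failure of both
networks cannot be distinguished from a failure of the peer node; the
sketch follows the second rule and reports it as a node failure.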

Ideally, the heartbeat mechanism should use a lightweight protocol (like 
UDP).


The first idea that comes to my mind would be to use gen_udp and
implement the protocol from the ground up. But that's something I'd like
to avoid, since I would eventually have to handle the architecture
differences between the nodes.

Furthermore, I am wondering whether there are perhaps neater solutions to
this problem. First, Erlang has a built-in mechanism for exchanging
messages between processes. Second, Erlang already performs heartbeats
between connected nodes.


I would be thankful for any advice, links to documents or code that
would help me take the first step in the right direction.

Thanks in advance,
Loic.

N.B. My apologies if you have answered a similar question already.
Unfortunately the search function for the erlang.org mailing list
archives doesn't work.


