[erlang-questions] Fault-Tolerant TCP/IP Servers

Thu Jul 17 12:23:52 CEST 2008

"David Mercer" <dmercer@REDACTED> writes:

> No, TCP/IP is used for the message stream I am thinking of.  Basically, the
> client opens a TCP/IP connection to the server, and then sends HL7 messages
> (a message format used in the healthcare industry) over the connection,
> delimited by beginning- and end-of-message control codes.  If the connection
> is closed (e.g., server fails), the client will attempt to reconnect.  My
> thought is that if the secondary can remap the network so that connections
> to that IP address are now routed to it, the reconnect would automatically
> go to the secondary.

Yup - this is relatively straightforward on Linux or OpenBSD. You want
one virtual IP address that stays with the master until it fails, at
which point the slave takes over the address. The clients will see the
server go down, but as far as they can tell it'll be back up very
quickly (master/slave failover).
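
From the client's side, a failover like this is just a dropped
connection followed by a reconnect to the same address. Here is a rough
Python sketch of that client behaviour. It assumes MLLP-style framing
(0x0B before each message, 0x1C 0x0D after it), which the original post
does not spell out, and the function names, retry count and delay are
made up for illustration.

```python
import socket
import time

# Assumption: MLLP-style delimiters, <VT> message <FS><CR>.
START = b"\x0b"
END = b"\x1c\x0d"


def frame(message: bytes) -> bytes:
    """Wrap one message in the start and end control codes."""
    return START + message + END


def split_frames(buffer: bytes):
    """Return (complete_messages, unconsumed_tail) for a received buffer."""
    messages = []
    while True:
        start = buffer.find(START)
        if start == -1:
            # No start code left: nothing worth keeping.
            return messages, b""
        end = buffer.find(END, start + 1)
        if end == -1:
            # Partial message: keep it until more bytes arrive.
            return messages, buffer[start:]
        messages.append(buffer[start + 1:end])
        buffer = buffer[end + len(END):]


def send_with_reconnect(address, messages, retry_delay=1.0, max_attempts=10):
    """Send every message to address, reconnecting when the connection drops.

    A successful sendall() only means the bytes were handed to the kernel;
    a real client would also wait for an application-level acknowledgement
    before treating a message as delivered.
    """
    pending = list(messages)
    attempts = 0
    while pending:
        try:
            with socket.create_connection(address, timeout=5) as conn:
                while pending:
                    conn.sendall(frame(pending[0]))
                    pending.pop(0)
                    attempts = 0  # progress was made, reset the counter
        except OSError:
            attempts += 1
            if attempts >= max_attempts:
                raise
            # The virtual IP may be moving to the standby; wait and retry.
            time.sleep(retry_delay)
```

Because the client always reconnects to the same (virtual) address, it
needs no knowledge of which machine currently holds that address.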

As other posters mentioned, the technology to use is VRRP (Virtual
Router Redundancy Protocol - an IETF standard, RFC 3768, similar to
Cisco's proprietary HSRP and supported under Linux by keepalived) or
CARP (Common Address Redundancy Protocol) on *BSD.

I believe there was even an Erlang-only solution:
 * http://www1.erlang.org/ml-archive/erlang-questions/200212/msg00049.html
 * http://www.nabble.com/-erlang-questions--Setting-virtual-IP-via-Erlang-td7732623.html#a7750568

I don't know if anyone generalised this into a distributed OTP app that
takes over the IP address when the application is started on a node.
Given that OTP has all the machinery to work out which nodes to start
things on, monitor them for failure and restart them on new nodes, the
only thing this app would have to do is take over the address.
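
To give an idea of how little "take over the address" involves, here is
a small Python sketch of the commands such an app might run when it
starts and stops on a node. It assumes a Linux host with iproute2 and
the iputils arping, and root privileges. The address, prefix length and
interface name are placeholders, and the function names are hypothetical
rather than taken from any existing application.

```python
import subprocess


def takeover_commands(vip, prefix_len, interface, arp_count=3):
    """Commands to claim the virtual IP and announce it on the LAN."""
    return [
        # Add the virtual address to the local interface.
        ["ip", "addr", "add", f"{vip}/{prefix_len}", "dev", interface],
        # Send unsolicited ARP so neighbours refresh their ARP caches.
        ["arping", "-U", "-c", str(arp_count), "-I", interface, vip],
    ]


def release_commands(vip, prefix_len, interface):
    """Commands to give the virtual IP back when the app stops."""
    return [["ip", "addr", "del", f"{vip}/{prefix_len}", "dev", interface]]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, raising if one of them fails."""
    for cmd in commands:
        runner(cmd, check=True)


# Example (needs root on a real host):
#   run_all(takeover_commands("192.0.2.10", 24, "eth0"))
```

An OTP application could do the equivalent from its start and stop
callbacks, leaving the decision about which node runs it to OTP.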

Hope some of those pointers help.

Cheers,
--Geoff Cant