Standby system handover

Serge Aleynikov
Fri Jul 7 13:50:19 CEST 2006


Chaitanya,

Wouldn't your packet forwarder then be a single point of failure?  You 
could avoid that with some balancing capability at the client, with the 
added intelligence of removing unavailable packet forwarders from its 
list of known servers, though this increases the complexity of the 
client's implementation.
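
A minimal sketch of what that client-side intelligence could look like 
in Erlang.  The module name, function names, and the 2000 ms default 
timeout are made up for illustration; the idea is simply to try each 
known forwarder in turn and push the unreachable ones to the back of 
the list:

```erlang
-module(client_failover).
-export([connect_first/1, connect_first/2]).

%% Try each {Host, Port} in order and return the first socket that
%% connects, together with a reordered server list in which the
%% forwarders that failed are moved to the back.
connect_first(Servers) ->
    connect_first(Servers, 2000).

connect_first(Servers, Timeout) ->
    connect_first(Servers, [], Timeout).

connect_first([], _Failed, _Timeout) ->
    {error, no_server_available};
connect_first([{Host, Port} = S | Rest], Failed, Timeout) ->
    case gen_tcp:connect(Host, Port, [binary, {active, false}], Timeout) of
        {ok, Sock} ->
            %% Live server first, previously failed ones last.
            {ok, Sock, [S | Rest] ++ lists:reverse(Failed)};
        {error, _Reason} ->
            connect_first(Rest, [S | Failed], Timeout)
    end.
```

The caller would keep the returned list and pass it to the next call, 
so a dead forwarder is only retried after all the others.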

The way we deal with a similar problem domain is as follows.  Two hosts 
and two routers are selected to form a redundant cluster.  Each host has 
two NICs and a serial crossover link to its mate server.  The Ethernet 
bonding kernel module is configured to form a bonding interface with a 
*single* MAC address assigned to both NICs in an active/standby link 
configuration.  Each NIC is connected to a different router (Cisco 3750 
or similar) running HSRP.  The server-side connections are placed in 
the same VLAN.  This gives Layer 2 redundancy and resilience to router 
or NIC failures:

           |     L A N      |
           |                |
      +----+----+ HSRP +----+----+
      |  CISCO  +------+  CISCO  |
      |         +------+         |
      +----+----+      +----+----+
           |     \    /     |
           |      \  /      |
           |       \/       |
           | VIP1  /\  VIP2 |
           |      /  \      |
         +-+-----+    +-----+-+
         |Server1|    |Server2|
         +----+--+    +---+---+
              |           |
              +-----------+
           Serial Heartbeat Link

Secondly, we use the http://linux-ha.org project for virtual IP 
management.  Two VIPs are configured, one owned by each active server.  
Clients talk to the servers through these VIPs.  Additionally, a serial 
heartbeat link provides a separate hardware path for health checks 
between the VIP management software on the two servers.  If a server 
goes down, or a service is taken down for maintenance, the VIP owned by 
that server is migrated to the other server until the former owner 
becomes available again.  This gives us Layer 3 redundancy.

As far as the server application is concerned, do you need the servers 
running in active-standby mode or sharing the load?  If the servers need 
to share some state, you could store that state in Mnesia and have it 
replicated between both nodes, so that both nodes remain available and 
share the workload.
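
As a rough sketch of the Mnesia idea, the module below creates a RAM 
table replicated on a given list of nodes and wraps reads and writes in 
transactions.  The module, table, and function names are hypothetical, 
and the code assumes Mnesia is already started on every node in the 
list and that the nodes are already connected to each other:

```erlang
-module(shared_state).
-export([init/1, store/2, lookup/1]).

-record(session, {key, value}).

%% Create a RAM-only table with a replica on every node in Nodes.
%% Assumes mnesia is running on all of them and they share a schema.
init(Nodes) ->
    case mnesia:create_table(session,
                             [{ram_copies, Nodes},
                              {attributes, record_info(fields, session)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, session}} -> ok;
        {aborted, Reason} -> {error, Reason}
    end.

%% Write one key/value pair; the transaction updates all replicas.
store(Key, Value) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(#session{key = Key, value = Value}) end).

%% Read a value from the local replica inside a transaction.
lookup(Key) ->
    case mnesia:activity(transaction,
                         fun() -> mnesia:read(session, Key) end) of
        [#session{value = Value}] -> {ok, Value};
        [] -> not_found
    end.
```

With two servers this would be called as shared_state:init([Node1, 
Node2]); either node can then serve requests from its own copy.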

In this configuration there may not even be a need for a separate packet 
forwarder.  Each TCP server would simply listen for incoming connections 
on the "0.0.0.0" address (which covers all VIPs currently managed by the 
server) and do its job, or act as a protocol converter between some TCP 
protocol and Erlang terms, forwarding those terms to another Erlang 
gen_server process running in the cluster that is chosen by some 
balancing method such as the pg2 module.
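
A small sketch of those two pieces, with hypothetical module and 
function names.  It uses the pg module rather than pg2, since pg2 was 
removed in OTP 24 and pg (OTP 23 and later) is its replacement; this is 
an assumption about your OTP version, and the default pg scope has to be 
started first with pg:start_link().  Picking a random group member is 
just one possible balancing method:

```erlang
-module(vip_listener).
-export([start/1, pick_worker/1]).

%% Listen on all local addresses, which includes any VIP currently
%% assigned to this host.  Passing port 0 lets the OS pick a free port.
start(Port) ->
    gen_tcp:listen(Port, [binary,
                          {ip, {0,0,0,0}},
                          {active, false},
                          {reuseaddr, true}]).

%% Pick one member of a process group at random.  Workers on any node
%% in the cluster are expected to have joined with pg:join(Group, Pid).
pick_worker(Group) ->
    case pg:get_members(Group) of
        [] ->
            {error, no_workers};
        Members ->
            {ok, lists:nth(rand:uniform(length(Members)), Members)}
    end.
```

A connection handler could decode each request into an Erlang term and 
pass it on with gen_server:call(Pid, Term) to the worker returned by 
pick_worker/1.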

This approach works well for us for making highly available server 
processes.

Regards,

Serge


Chaitanya Chalasani wrote:
> Hi,
> 
> We currently have a non-realtime standby server for our mission critical 
> application. In our struggle to make a realtime standby system my job is to 
> develop an application that does TCP/IP packet forwarding to the active 
> server and, in the event of active server unavailability, sends the 
> packet to any of the configured standby servers. The application is not 
> written in Erlang, but I am planning to write the soft handover application 
> in Erlang. I am attaching a test module which serves the purpose, 
> but I wanted to know of a better design for it, or of any feature in 
> Erlang/OTP that can help me build a more robust application.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> -module(routerApp).
> -compile(export_all).
> 
> 
> listener(PortOwn,ClusterList) ->
>     case gen_tcp:listen(PortOwn,[binary,{packet,0},{keepalive,true},{reuseaddr,true}]) of
>         {ok,LSock} ->
>             acceptConnections(LSock,ClusterList),
>             gen_tcp:close(LSock),
>             listener(PortOwn,ClusterList);
>         Other ->
>             io:format("Received ~p~n",[Other])
>     end.
> 
> acceptConnections(LSock,ClusterList) ->
>     case gen_tcp:accept(LSock) of
>         {ok,Sock} ->
>             Pid = spawn(routerApp,clientConnectionThread,[Sock,ClusterList]),
>             gen_tcp:controlling_process(Sock,Pid ),
>             acceptConnections(LSock,ClusterList);
>         {error,Reason} ->
>             io:format("Unknown error ~p~n",[Reason]);
>         Other ->
>             io:format("Unknown response ~p~n",[Other])
>     end.
> 
> clientConnectionThread(Sock,[{Ipaddress,Port}|ClusterList]) ->
>     case gen_tcp:connect(Ipaddress,Port ,[binary,{active,true}] ) of
>         {ok,ClustSock} ->
>             case listenForMessages(Sock,ClustSock) of
>                 {error,clusterNodeClosed} ->
>                     clientConnectionThread(Sock,ClusterList++[{Ipaddress,Port}]);
>                 {error,clientClosed} ->
>                     io:format("Client closed and so parallel thread dying~n");
>                 Other ->
>                     io:format("Unknown error ~p and so parallel thread dying~n",[Other])
>             end;
>         Other1 ->
>             io:format("Received ~p while connecting to ~p ~n",[Other1,{Ipaddress,Port}] ),
>             clientConnectionThread(Sock,ClusterList++[{Ipaddress,Port}])
>     end.
>                 
> listenForMessages(Sock,ClustSock) ->
>     receive
>         {tcp_closed,ClustSock} ->
>             %% Cluster node went away: keep the client socket open so the
>             %% caller can fail over to the next node in the list.
>             gen_tcp:close(ClustSock),
>             {error,clusterNodeClosed};
>         {tcp_closed,Sock} ->
>             gen_tcp:close(Sock),
>             gen_tcp:close(ClustSock),
>             {error,clientClosed};
>         {tcp,ClustSock,Data} ->
>             io:format("Received ~w from server socket~n",[Data]),
>             gen_tcp:send(Sock,Data),
>             listenForMessages(Sock,ClustSock);
>         {tcp,Sock,Data} ->
>             io:format("Received ~w from client socket~n",[Data]),
>             gen_tcp:send(ClustSock,Data),
>             listenForMessages(Sock,ClustSock);
>         Other ->
>             io:format("Received unknown info ~p~n",[Other]),
>             {error,unknown}
>     end.



