[erlang-patches] Running mnesia across a firewall

Wed Mar 5 04:58:04 CET 2008

Most of the time I have used mnesia in a fairly straightforward 
network setup.  Recently, however, I needed to use mnesia's remote 
interface in the presence of a firewall, and I would like to share my 
experience along with a net_kernel patch that adds a useful feature.

The setup was as follows: two nodes (NodeA & NodeB) are inside the 
firewall and NodeC is outside the firewall.  NodeA & NodeB can connect 
to NodeC, but all inbound access from NodeC is blocked.  NodeA & NodeB 
are running mnesia with a few tables with disc_copies.  NodeC uses 
mnesia's remote interface to access data tables on nodes inside the 
firewall.

The diagram below shows the network topology with the corresponding 
Erlang kernel options.

NodeA
   ^ {dist_auto_connect, once}
   |
   |
   |  INSIDE  | OUTSIDE
   +----------|firewall------> NodeC
   |          |                {dist_auto_connect, never},
   |                           {inet_dist_listen_min, X},
   v                           {inet_dist_listen_max, Y},
NodeB                         {global_groups, [{outside, [NodeC]}]}
{dist_auto_connect, once}

     Notes about NodeC kernel options:
     - {dist_auto_connect, never} prevents the node from
       attempting connections to nodes inside the firewall.
     - The min/max listen options restrict the distribution
       listen port to a small reserved range, so the firewall
       rules can be kept minimal.
     - global_groups prevents global from connecting to and
       synching with the other nodes inside the firewall after
       a connection is made to one node inside the firewall.

All three nodes subscribe to node status events with 
net_kernel:monitor_nodes/1.  When NodeA disconnects from NodeB (or vice 
versa), they enable a UDP heartbeat and, upon detecting that the peer 
node is responding again, restart one of the nodes, which re-synchs 
mnesia.
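
To illustrate the heartbeat step, here is a minimal sketch of a UDP 
probe and responder.  The heartbeat code used in the setup described 
above is not part of this post, so the module name udp_heartbeat, the 
function names and the <<"ping">>/<<"pong">> payloads are all 
assumptions made for the example.

```erlang
-module(udp_heartbeat).
-export([start_responder/1, probe/3]).

%% Hypothetical responder: opens a UDP socket on Port (0 lets the OS
%% pick a free port) and answers each <<"ping">> with <<"pong">>.
%% Returns {ok, ActualPort} once the socket is bound.
start_responder(Port) ->
    Parent = self(),
    Pid = spawn(fun() ->
        {ok, Socket} = gen_udp:open(Port, [binary, {active, false}]),
        {ok, BoundPort} = inet:port(Socket),
        Parent ! {self(), BoundPort},
        responder_loop(Socket)
    end),
    receive
        {Pid, ActualPort} -> {ok, ActualPort}
    after 5000 ->
        {error, timeout}
    end.

responder_loop(Socket) ->
    case gen_udp:recv(Socket, 0) of
        {ok, {Addr, Port, <<"ping">>}} ->
            _ = gen_udp:send(Socket, Addr, Port, <<"pong">>),
            responder_loop(Socket);
        {ok, _Other} ->
            %% Ignore datagrams that are not heartbeat requests.
            responder_loop(Socket);
        {error, _Reason} ->
            gen_udp:close(Socket)
    end.

%% Hypothetical probe: sends one <<"ping">> to Host:Port and waits up
%% to Timeout milliseconds for a <<"pong">>.
%% Returns alive | unreachable.
probe(Host, Port, Timeout) ->
    {ok, Socket} = gen_udp:open(0, [binary, {active, false}]),
    Result =
        case gen_udp:send(Socket, Host, Port, <<"ping">>) of
            ok ->
                case gen_udp:recv(Socket, 0, Timeout) of
                    {ok, {_Addr, _Port, <<"pong">>}} -> alive;
                    _ -> unreachable
                end;
            {error, _Reason} ->
                unreachable
        end,
    gen_udp:close(Socket),
    Result.
```

In such a scheme each inside node would run the responder and, after a 
nodedown event for its peer, call probe/3 periodically until it 
returns alive.  Deciding which of the two nodes to restart is left out 
of the sketch.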

If network access between NodeA (or NodeB) and NodeC is lost, NodeC 
shuts down the mnesia application and waits for a {nodeup, NodeA} event, 
after which it starts mnesia again.  Without this approach (i.e. if 
mnesia keeps running on NodeC during the network outage), once the 
network heals and NodeC reconnects to NodeA/B, mnesia detects a 
partitioned network condition and stops replicating data between NodeA 
and NodeB.  This is quite odd, because NodeC doesn't have any local 
tables, and it is undesirable for its visibility to affect the nodes 
inside the firewall.
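
The NodeC behaviour described above can be sketched as a small process 
that reacts to node status messages.  This is only an illustration, 
not the code used in the original setup: the module name nodec_watcher, 
the function names and the idea of passing the list of inside nodes as 
an argument are assumptions.

```erlang
-module(nodec_watcher).
-export([start/1, action/2]).

%% Pure decision function: what to do with mnesia for a given node
%% status event, given the list of nodes inside the firewall.
action({nodedown, Node}, Inside) ->
    case lists:member(Node, Inside) of
        true  -> stop_mnesia;
        false -> ignore
    end;
action({nodeup, Node}, Inside) ->
    case lists:member(Node, Inside) of
        true  -> start_mnesia;
        false -> ignore
    end;
action(_Other, _Inside) ->
    ignore.

%% Spawn a process that subscribes to nodeup/nodedown messages and
%% stops or starts the mnesia application accordingly.
start(Inside) ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true),
        loop(Inside)
    end).

loop(Inside) ->
    receive
        Event ->
            case action(Event, Inside) of
                %% Results are ignored so that a repeated stop or
                %% start (already stopped/started) is harmless.
                stop_mnesia  -> _ = application:stop(mnesia);
                start_mnesia -> _ = application:start(mnesia);
                ignore       -> ok
            end,
            loop(Inside)
    end.
```

On NodeC this would be started with something like 
nodec_watcher:start(['a@hostA', 'b@hostB']), where the node names are 
placeholders.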

The main problem with this setup is that when NodeC loses its 
connection to NodeA/NodeB, one of these two nodes needs to periodically 
attempt to reconnect to NodeC.  However, because the 
{dist_auto_connect, once} option is used on NodeA/NodeB, net_kernel 
won't allow the connection to NodeC to be re-established unless *both* 
NodeA and NodeB are bounced!
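
For reference, the periodic reconnect attempt from an inside node 
could be sketched as below.  It only helps if connection attempts to 
NodeC are not barred by the dist_auto_connect setting, which is 
exactly the limitation discussed here.  The module name reconnector, 
the function names and the interval parameter are assumptions.

```erlang
-module(reconnector).
-export([start/2, needs_connect/2]).

%% True when Node is not in the list of currently connected nodes.
needs_connect(Node, Connected) ->
    not lists:member(Node, Connected).

%% Spawn a process that tries to reach Node every IntervalMs
%% milliseconds for as long as it is not connected.
start(Node, IntervalMs) ->
    spawn(fun() -> loop(Node, IntervalMs) end).

loop(Node, IntervalMs) ->
    case needs_connect(Node, nodes()) of
        %% net_adm:ping/1 returns pong | pang; the result is ignored
        %% because the next iteration checks nodes() again.
        true  -> _ = net_adm:ping(Node);
        false -> ok
    end,
    timer:sleep(IntervalMs),
    loop(Node, IntervalMs).
```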

The main culprit is net_kernel's dist_auto_connect option, which is an 
all-or-nothing setting that cannot vary depending on which node a 
connection attempt targets.  The attached patch (for R12B-1) solves 
this issue by introducing an additional form of this kernel option:

     {dist_auto_connect, {callback, M, F}}

This option registers a callback function M:F/2 with the signature:

     (Action, Node) -> Mode
         Action = connect | disconnect
         Node   = node()
         Mode   = once | never | true

that will be called when a Node tries to connect (or loses its 
connection).  The modes once and never are documented in kernel(6), 
and 'true' means to proceed with the connection action.

With this patch, the connection behavior between a@REDACTED and 
b@REDACTED can differ from the connection behavior between a@REDACTED 
(or b@REDACTED) and c@REDACTED.

If others find this option as useful as I do, perhaps we can persuade 
the OTP team to merge this patch into the distribution.

Regards,

Serge.

P.S. Here's a sample implementation of this custom function:

nodeA&B.config:
===============
	[
	 {kernel,
	  [
	     {dist_auto_connect,
	         {callback, net_kernel_connector, dist_auto_connect}}
	  ]},

	 {mnesia,
	  [
	     {extra_db_nodes, [a@REDACTED, b@REDACTED]}
	  ]}
	].

nodeC.config:
=============
	[
	 {kernel,
	  [
	     {dist_auto_connect, never},
	     {global_groups, [{outside, [c@REDACTED]}]},
	     {inet_dist_listen_min, 8111},
	     {inet_dist_listen_max, 8119}
	  ]},

	 {mnesia,
	  [
	     {extra_db_nodes, [a@REDACTED, b@REDACTED]}
	  ]}
	].

-module(net_kernel_connector).
-export([dist_auto_connect/2]).

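%% Called by the patched net_kernel with Action = connect | disconnect.
%% Returns once | never | true for the given Node.
dist_auto_connect(_Action, Node) ->
     case application:get_env(mnesia, extra_db_nodes) of
     {ok, Masters} ->
         IamMaster    = lists:member(node(), Masters),
         NodeIsMaster = lists:member(Node,   Masters),

         case {IamMaster, NodeIsMaster} of
         {true, true} ->
             %% Both nodes are listed in extra_db_nodes (a <-> b).
             once;
         {_, _} ->
             has_access(node(), Node)
         end;
     _ ->
         has_access(node(), Node)
     end.

%% A node running on hostC never initiates connections;
%% any other host is allowed to proceed.
has_access(From, To)    -> has_access2(host(From), host(To)).
has_access2('hostC', _) -> never;
has_access2(_, _)       -> true.

%% Extract the host part of a node name (name@host) as an atom.
host(Node) ->
     L = atom_to_list(Node),
     [_, H] = string:tokens(L, "@"),
     list_to_atom(H).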

-------------- next part --------------
A non-text attachment was scrubbed...
Name: R12-1.net_kernel.patch
Type: application/octet-stream
Size: 2722 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20080304/fd702785/attachment.obj>