From saleyn@REDACTED Wed Mar 5 04:58:04 2008
From: saleyn@REDACTED (Serge Aleynikov)
Date: Tue, 04 Mar 2008 22:58:04 -0500
Subject: [erlang-patches] Running mnesia across a firewall
Message-ID: <47CE1A4C.3020707@gmail.com>

While most of the time I have used mnesia with a fairly straightforward
network setup, I recently needed to use mnesia's remote interface in the
presence of a firewall, and would like to share my experience along with
a net_kernel patch that adds a useful feature.

The setup was as follows: two nodes (NodeA & NodeB) are inside the
firewall and NodeC is outside the firewall. NodeA & NodeB can connect
to NodeC, but all inbound access from NodeC is blocked. NodeA & NodeB
are running mnesia with a few tables with disc_copies. NodeC uses
mnesia's remote interface to access data tables on the nodes inside the
firewall. The diagram below shows the network topology with the
corresponding Erlang kernel options.

 NodeA
   ^   {dist_auto_connect, once}
   |
   |          |
   |  INSIDE  |  OUTSIDE
   +----------|firewall------> NodeC
   |          |                {dist_auto_connect, never},
   |                           {inet_dist_listen_min, X},
   v                           {inet_dist_listen_max, Y},
 NodeB                         {global_groups, [{outside, [NodeC]}]}
       {dist_auto_connect, once}

Notes about NodeC kernel options:

- {dist_auto_connect, never} prevents the node from attempting
  connections inside the firewall.
- The min/max listen options confine the distribution listen port to a
  small reserved range, which keeps the firewall rules minimal.
- global_groups prevents global from connecting/synching with other
  nodes inside the firewall after a connection is made to some node
  inside the firewall.

All three nodes subscribe to net_kernel:monitor_nodes/1. When NodeA
disconnects from NodeB (or vice versa) they enable a UDP heartbeat and,
upon detecting that the peer node is responding, restart one of the
nodes, which re-synchs mnesia.

If network access between NodeA (or NodeB) and NodeC is lost, NodeC
shuts down the mnesia application and waits for a {nodeup, NodeA} event,
after which it starts mnesia up again. Without using this approach (i.e.
not shutting down mnesia on NodeC during a network outage), once the
network heals and NodeC reconnects to NodeA/B, mnesia would detect a
partitioned network condition and stop replicating data between NodeA
and NodeB (which is quite odd, because NodeC doesn't have any local
tables, and it's undesirable for its visibility to impact nodes inside
the firewall).

The main problem with this setup is that when NodeC loses its connection
to NodeA/NodeB, one of these two nodes needs to periodically attempt to
reconnect to NodeC. However, because the {dist_auto_connect, once}
option is used on NodeA/NodeB, net_kernel wouldn't allow re-establishing
the connection to NodeC unless *both* NodeA and NodeB are bounced!

The main culprit is net_kernel's dist_auto_connect option, which is an
all-or-none setting that cannot vary depending on which node a
connection attempt targets. The attached patch (for R12B-1) solves this
issue by introducing an additional kernel option:

    {dist_auto_connect, {callback, M, F}}

This option registers a callback function M:F/2 with the signature:

    (Action, Node) -> Mode
        Action = connect | disconnect
        Node   = node()
        Mode   = once | never | true

that will be called when a Node tries to connect (or loses its
connection). Modes once and never are documented in kernel(3), and
'true' means to continue the connection action.

This patch makes it possible to define connection behavior between
a@REDACTED and b@REDACTED that differs from the connection behavior
between a@REDACTED (or b@REDACTED) and c@REDACTED.

If others find this option as useful as I do, perhaps we can persuade
the OTP team to merge this patch into the distribution.

Regards,

Serge.

P.S. Here's a sample implementation of this custom function:

nodeA&B.config:
===============
[
 {kernel, [
   {dist_auto_connect, {callback, net_kernel_connector, dist_auto_connect}}
 ]},
 {mnesia, [
   {extra_db_nodes, [a@REDACTED, b@REDACTED]}
 ]}
].

nodeC.config:
=============
[
 {kernel, [
   {dist_auto_connect, never},
   {global_groups, [{outside, [c@REDACTED]}]},
   {inet_dist_listen_min, 8111},
   {inet_dist_listen_max, 8119}
 ]},
 {mnesia, [
   {extra_db_nodes, [a@REDACTED, b@REDACTED]}
 ]}
].

-module(net_kernel_connector).
-export([dist_auto_connect/2]).

%% Callback for {dist_auto_connect, {callback, M, F}}.
%% The Action argument (connect | disconnect) is not used here.
dist_auto_connect(_Action, Node) ->
    case application:get_env(mnesia, extra_db_nodes) of
    {ok, Masters} ->
        IamMaster    = lists:member(node(), Masters),
        NodeIsMaster = lists:member(Node, Masters),
        case {IamMaster, NodeIsMaster} of
        %% Both ends hold the tables: keep the 'once' behavior.
        {true, true} -> once;
        {_, _}       -> has_access(node(), Node)
        end;
    _ ->
        has_access(node(), Node)
    end.

has_access(From, To) ->
    has_access2(host(From), host(To)).

%% The host outside the firewall never initiates connections.
has_access2('hostC', _) -> never;
has_access2(_, _)       -> true.

%% Extract the host part of a node name as an atom.
host(Node) ->
    L = atom_to_list(Node),
    [_, H] = string:tokens(L, "@"),
    list_to_atom(H).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: R12-1.net_kernel.patch
Type: application/octet-stream
Size: 2722 bytes
Desc: not available
URL:

From ulf.wiger@REDACTED Wed Mar 5 09:05:32 2008
From: ulf.wiger@REDACTED (Ulf Wiger (TN/EAB))
Date: Wed, 05 Mar 2008 09:05:32 +0100
Subject: [erlang-patches] Running mnesia across a firewall
In-Reply-To: <47CE1A4C.3020707@gmail.com>
References: <47CE1A4C.3020707@gmail.com>
Message-ID: <47CE544C.7010700@ericsson.com>

Serge Aleynikov wrote:
>
> The main culprit is the net_kernel's dist_auto_connect option that is an
> all or none setting that cannot vary depending on connecting attempt to
> a given node. The attached patch (for R12B-1) solves this issue by
> introducing an additional kernel option:
>
> {dist_auto_connect, {callback, M, F}}

Have you thought about solving it with an application that
periodically tries calling net_kernel:connect(Node)?

BR,
Ulf W

From saleyn@REDACTED Wed Mar 5 13:09:21 2008
From: saleyn@REDACTED (Serge Aleynikov)
Date: Wed, 05 Mar 2008 07:09:21 -0500
Subject: [erlang-patches] Running mnesia across a firewall
In-Reply-To: <47CE544C.7010700@ericsson.com>
References: <47CE1A4C.3020707@gmail.com> <47CE544C.7010700@ericsson.com>
Message-ID: <47CE8D71.3090704@gmail.com>

Ulf Wiger (TN/EAB) wrote:
> Serge Aleynikov wrote:
>>
>> The main culprit is the net_kernel's dist_auto_connect option that is
>> an all or none setting that cannot vary depending on connecting
>> attempt to a given node. The attached patch (for R12B-1) solves this
>> issue by introducing an additional kernel option:
>>
>> {dist_auto_connect, {callback, M, F}}
>
> Have you thought about solving it with an application that
> periodically tries calling net_kernel:connect(Node)?

On which node, though? If this is one of the "master" nodes A holding a
disk copy of a table X, then the node must have {dist_auto_connect,
once} set. If the firewall prohibits inbound access to this node A from
some other node C that uses the remote mnesia interface to access table
X, then the only way to establish a connection to node C is to call
net_kernel:connect(C) on node A. However, if that connection drops,
there's no way to reestablish it without restarting node A. Remember
that in the case of {dist_auto_connect, once}, net_kernel checks whether
a connection is barred, and if it is, it won't allow connecting to a
node that was previously connected.

Serge

From ulf.wiger@REDACTED Wed Mar 5 13:26:16 2008
From: ulf.wiger@REDACTED (Ulf Wiger (TN/EAB))
Date: Wed, 05 Mar 2008 13:26:16 +0100
Subject: [erlang-patches] Running mnesia across a firewall
In-Reply-To: <47CE8D71.3090704@gmail.com>
References: <47CE1A4C.3020707@gmail.com> <47CE544C.7010700@ericsson.com> <47CE8D71.3090704@gmail.com>
Message-ID: <47CE9168.5090504@ericsson.com>

Serge Aleynikov wrote:
> Ulf Wiger (TN/EAB) wrote:
>> Serge Aleynikov wrote:
>>>
>>> The main culprit is the net_kernel's dist_auto_connect option that is
>>> an all or none setting that cannot vary depending on connecting
>>> attempt to a given node. The attached patch (for R12B-1) solves this
>>> issue by introducing an additional kernel option:
>>>
>>> {dist_auto_connect, {callback, M, F}}
>>
>> Have you thought about solving it with an application that
>> periodically tries calling net_kernel:connect(Node)?
>
> On which node, though? If this is one of the "master" nodes A holding a
> disk copy of a table X, then the node must have {dist_auto_connect,
> once} set. If the firewall prohibits inbound access to this node A from
> some other node C that uses remote mnesia interface to access table X,
> then the only way to establish connection to node C is to do on node A
> net_kernel:connect(C). However if that connection drops, there's no way
> to reestablish that connection without restarting node A. Remember that
> in case of {dist_auto_connect, once} net_kernel checks if a connection
> is barred and if it is it won't allow to connect to a node that
> previously was connected.

Correction 1: It's net_kernel:connect_node(Node). My bad.

Correction 2: net_kernel:connect_node/1 ignores the value of
dist_auto_connect.

What we've done is to keep a "maintenance channel" (not distributed
Erlang), over which we can negotiate which node should restart.

BR,
Ulf W

From saleyn@REDACTED Thu Mar 6 13:44:02 2008
From: saleyn@REDACTED (Serge Aleynikov)
Date: Thu, 06 Mar 2008 07:44:02 -0500
Subject: [erlang-patches] Running mnesia across a firewall
In-Reply-To: <47CE9168.5090504@ericsson.com>
References: <47CE1A4C.3020707@gmail.com> <47CE544C.7010700@ericsson.com> <47CE8D71.3090704@gmail.com> <47CE9168.5090504@ericsson.com>
Message-ID: <47CFE712.20706@gmail.com>

Ulf Wiger (TN/EAB) wrote:
> Serge Aleynikov wrote:
>> On which node, though? If this is one of the "master" nodes A holding
>> a disk copy of a table X, then the node must have {dist_auto_connect,
>> once} set. If the firewall prohibits inbound access to this node A
>> from some other node C that uses remote mnesia interface to access
>> table X, then the only way to establish connection to node C is to do
>> on node A net_kernel:connect(C). However if that connection drops,
>> there's no way to reestablish that connection without restarting node
>> A. Remember that in case of {dist_auto_connect, once} net_kernel
>> checks if a connection is barred and if it is it won't allow to
>> connect to a node that previously was connected.
>
> Correction 1: It's net_kernel:connect_node(Node). My bad.
>
> Correction 2: net_kernel:connect_node/1 ignores the value of
> dist_auto_connect
>
> What we've done is to keep a "maintenance channel" (not distr Erlang),
> over which we can negotiate which node should restart.

It turned out that I was making the same mistake by using
net_kernel:connect(Node) rather than net_kernel:connect_node(Node).
It's quite easy to get confused, as the two functions have the same
signature. :-(

Thanks for pointing this out! So for making mnesia work across a
firewall, a combination of kernel options (including global_groups) and
user-level pinging/starting/stopping of remote mnesia is sufficient.

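A minimal sketch of what that user-level approach might look like,
under stated assumptions: the module name fw_mnesia_sketch, both
function names, and the retry interval are hypothetical and do not come
from the patch or from this thread. The only library calls used are
net_kernel:connect_node/1, net_kernel:monitor_nodes/1,
application:start/1 and application:stop/1. The reconnector is meant
for the nodes inside the firewall; the guard is meant for the outside
node, which stops mnesia when a master goes down and starts it again on
nodeup.

```erlang
-module(fw_mnesia_sketch).
-export([start_reconnector/2, start_mnesia_guard/1]).

%% Inside the firewall (NodeA/NodeB): periodically retry the outbound
%% connection to the outside node. Hypothetical helper, not part of OTP.
start_reconnector(Outside, IntervalMs) ->
    spawn(fun() -> reconnect_loop(Outside, IntervalMs) end).

reconnect_loop(Outside, IntervalMs) ->
    case lists:member(Outside, nodes(connected)) of
        true  -> ok;
        %% Per the thread, connect_node/1 is not blocked by the
        %% dist_auto_connect setting. The result is ignored on purpose.
        false -> _ = net_kernel:connect_node(Outside)
    end,
    timer:sleep(IntervalMs),
    reconnect_loop(Outside, IntervalMs).

%% Outside the firewall (NodeC): stop mnesia when a master node is lost
%% and start it again when a master node comes back.
start_mnesia_guard(Masters) ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true),
        guard_loop(Masters)
    end).

guard_loop(Masters) ->
    receive
        {nodeup, Node} ->
            case lists:member(Node, Masters) of
                true  -> _ = application:start(mnesia);
                false -> ok
            end,
            guard_loop(Masters);
        {nodedown, Node} ->
            case lists:member(Node, Masters) of
                true  -> _ = application:stop(mnesia);
                false -> ok
            end,
            guard_loop(Masters)
    end.
```

On NodeA/NodeB this could be started as
fw_mnesia_sketch:start_reconnector('c@hostC', 5000), and on NodeC as
fw_mnesia_sketch:start_mnesia_guard(['a@hostA', 'b@hostB']); the node
names here are placeholders. A production version would more likely be
a supervised gen_server than a bare spawned process.
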
Serge

From Raymond.Xiong@REDACTED Mon Mar 17 07:21:10 2008
From: Raymond.Xiong@REDACTED (Raymond Xiong)
Date: Mon, 17 Mar 2008 14:21:10 +0800
Subject: [erlang-patches] crypto library makefile patch
Message-ID: <20080317062109.GA7378@Sun.Com>

When I configured erlang in the following way on Solaris:

$ ./configure --with-ssl=/usr/sfw/lib --enable-dynamic-ssl-lib

crypto_drv.so doesn't include the full pathname of libcrypto:

$ ldd ./lib/crypto/priv/lib/sparc-sun-solaris2.11/crypto_drv.so
        libcrypto.so.0.9.8 =>    (file not found)
        ...

lib/crypto/c_src/Makefile.in does try to support this, but fails
due to a bug. The patch below fixes it.

Thanks,
rayx

--- otp_src_R12B-1/lib/crypto/c_src/Makefile.in.orig Sat Mar 15 22:03:03 2008
+++ otp_src_R12B-1/lib/crypto/c_src/Makefile.in Sat Mar 15 22:04:28 2008
@@ -79,6 +79,7 @@
 ifeq ($(HOST_OS),)
 HOST_OS := $(shell $(ERL_TOP)/erts/autoconf/config.guess)
 endif
+DYNAMIC_CRYPTO_LIB=@SSL_DYNAMIC_ONLY@
 LD_R_FLAG=@DED_LD_FLAG_RUNTIME_LIBRARY_PATH@
 ifeq ($(strip $(LD_R_FLAG)),)
 LD_R_OPT =
@@ -89,7 +90,6 @@
 LD_R_OPT =
 endif
 endif
-DYNAMIC_CRYPTO_LIB=@SSL_DYNAMIC_ONLY@
 ifeq ($(DYNAMIC_CRYPTO_LIB),yes)
 CRYPTO_LINK_LIB=-L$(SSL_LIBDIR) -lcrypto