[erlang-questions] SSL distribution issues

Paul Guyot <>
Sat Jan 14 11:04:04 CET 2012


Is anyone successfully using SSL distribution on production servers?

While running a couple of nodes works properly on a development machine, we have serious issues on a real production cluster.
Our nodes ping other nodes very early, before our applications are started.

We observed two serious issues:
- pinging another node sometimes blocks indefinitely, whether or not the other node can actually be pinged (e.g. when it is not using SSL or has a different cookie);
- after a while (once the pings have timed out), ssl_tls_dist_proxy simply crashes.
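
For illustration, this is the kind of guard one could put around such early pings so that a blocked ping does not hang the caller forever. It is only a sketch: the module name ping_sketch and the function ping/2 are hypothetical, not part of OTP, and killing the helper process only frees the caller; it does not necessarily abort the connection attempt underneath.

```erlang
-module(ping_sketch).
-export([ping/2]).

%% Hypothetical helper: run net_adm:ping/1 in a throwaway process so that
%% the caller waits at most Timeout milliseconds for an answer.
ping(Node, Timeout) ->
    {Pid, Ref} = spawn_monitor(
                   fun() -> exit({ping_result, net_adm:ping(Node)}) end),
    receive
        {'DOWN', Ref, process, Pid, {ping_result, Result}} ->
            Result;                          % pong | pang
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}                  % the helper died unexpectedly
    after Timeout ->
        erlang:demonitor(Ref, [flush]),      % drop any late 'DOWN' message
        exit(Pid, kill),
        timeout
    end.
```

A call such as ping_sketch:ping(Node, 5000) would then return pong, pang, timeout or {error, Reason} instead of blocking.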

=ERROR REPORT==== 2012-01-13 16:48:58 ===
** Generic server ssl_tls_dist_proxy terminating 
** Last message in was {connect,IP,25669}				<-- this is another SSL node with the same cookie
** When Server state == {state,{#Port<0.284>,#Port<0.285>},
** Reason for termination == 
** {{badmatch,{error,badarg}},

The relevant code is the following:

handle_call({connect, Ip, Port}, {From, _}, State) ->
    Me = self(),
    Pid = spawn_link(fun() -> setup_proxy(Ip, Port, Me) end),
    receive
	{Pid, go_ahead, LPort} -> 
	    Res = {ok, Socket} = try_connect(LPort),
	    ok = gen_tcp:controlling_process(Socket, From),		% <---- line 90
	    flush_old_controller(From, Socket),
	    {reply, Res, State};
	{Pid, Error} ->
	    {reply, Error, State}
    end;

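The crash happens because From, the process that issued the connect call, is no longer alive by the time the proxy tries to hand the socket over: gen_tcp:controlling_process/2 then returns {error, badarg} and the "ok =" match on line 90 fails.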
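
To make the failure mode concrete, here is a minimal sketch of a hand-over that tolerates a dead recipient instead of crashing on a badmatch. The module handoff_sketch and the function hand_over/2 are hypothetical names used for illustration only; this is not the code in ssl_tls_dist_proxy, and whether closing the socket is the right reaction there is an assumption.

```erlang
-module(handoff_sketch).
-export([hand_over/2]).

%% Hypothetical helper: give Socket to NewOwner, but return an error tuple
%% (and close the socket) if the transfer fails, e.g. because NewOwner has
%% already exited, rather than asserting "ok =" and crashing.
hand_over(Socket, NewOwner) ->
    case gen_tcp:controlling_process(Socket, NewOwner) of
        ok ->
            {ok, Socket};
        {error, Reason} ->
            gen_tcp:close(Socket),
            {error, Reason}
    end.
```

With something along these lines, the connect clause could reply with the error tuple rather than taking the whole gen_server down.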

For the record, this is on the master branch, and the SSL parameters are the following:

	-proto_dist inet_tls
		server_certfile /otp_root/ssl/${NODE_NAME}.pem
		client_certfile /otp_root/ssl/${NODE_NAME}.pem
		server_secure_renegotiate true
		client_secure_renegotiate true
		server_verify verify_peer
		client_verify verify_peer
		server_fail_if_no_peer_cert true
		server_cacertfile /otp_root/ssl/ca.pem
		client_cacertfile /otp_root/ssl/ca.pem
		server_depth 2
		client_depth 2

Did we miss something obvious?

Semiocast            http://semiocast.com/
+33.183627948 - 20 rue Lacaze, 75014 Paris
