Request for next net_kernel release: new switch "net_setuptime_millis"
Patrik Nyblom
pan@REDACTED
Fri Jan 9 16:03:14 CET 2004
Hi,
Well, the reason for the 7 s delay is simply that a suspended Erlang node
is still in some sense "alive". The listen socket is still listening, and
the node is still registered with its local epmd. If, on the other hand, the
node is completely halted, epmd has it unregistered and connection
attempts fail instantly. Here is what happens for a suspended node
(node b@REDACTED suspended, connection attempted from a@REDACTED):
a@REDACTED asks epmd on host2 for the port of b@REDACTED. The request is
answered immediately; the answer is a port number.
a@REDACTED connects to the given port, which succeeds (that is simply how
TCP/IP and the socket interface work).
a@REDACTED sends the initial handshake string.
a@REDACTED waits the stipulated timeout for the answer.
-> pang after 7 s
If instead the network cable to host2 is cut, we would get:
a@REDACTED tries to connect to epmd on host2. The connection attempt fails
after the stipulated timeout.
-> pang after 7 s
On the other hand, if the node has never been alive:
a@REDACTED issues a request to epmd, which immediately answers that no such
node is registered.
-> pang after a few milliseconds.
So, it's not that the distribution layer in node a keeps any stale data; we
simply hit a timeout when a node is suspended, regardless of whether we
have known the node before or not.
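The difference is easy to observe by timing the ping itself. A minimal
sketch (the function name is mine, and it assumes a running distributed
node; the suspended case should show roughly 7000 ms, the unknown-node
case only a few ms):

```erlang
%% Sketch: time net_adm:ping/1 to compare a suspended node (~7 s to
%% pang) with a never-registered node (pang within a few ms).
time_ping(Node) ->
    {Micros, Result} = timer:tc(net_adm, ping, [Node]),
    io:format("ping ~p -> ~p after ~p ms~n",
              [Node, Result, Micros div 1000]),
    Result.
```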
TCP/IP is unfortunately not a protocol made for realtime applications...
That's where all this long-timeout hassle comes from. A better network
protocol would make things much more fun for Erlang: no ticking, no
timeouts etc... A reliable protocol that monitors the links... a
protocol designed not for web browsing but for telecom... Sigh...
Timeouts shorter than 1 s could be useful though, I agree. I'll add the
possibility of specifying fractions of a second in the configuration.
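A millisecond-capable connecttime/0 along the lines Reto suggests might
look like this. This is only a sketch of the proposed (and at this point
hypothetical) net_setuptime_millis switch, not actual OTP code; the
existing net_setuptime dominates if present:

```erlang
%% Hypothetical: honour a new net_setuptime_millis switch, with the
%% existing net_setuptime (whole seconds) taking precedence.
-define(SETUPTIME, 7000).

connecttime() ->
    case application:get_env(kernel, net_setuptime) of
        {ok, Time} when is_integer(Time), Time > 0, Time < 120 ->
            Time * 1000;
        undefined ->
            case application:get_env(kernel, net_setuptime_millis) of
                {ok, Ms} when is_integer(Ms), Ms > 0, Ms < 120000 ->
                    Ms;
                _ ->
                    ?SETUPTIME
            end;
        _ ->
            ?SETUPTIME
    end.
```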
Cheers,
/Patrik
Reto Kramer wrote:
> I found where the 7 s ping/net_connect delay comes from that I was
> seeing when pinging (or multi_call'ing) a node whose connection had
> timed out (see the net_kernel excerpt below).
>
> I'm glad it's configurable (thanks for the foresight, whoever did
> it!), however I need it smaller than 1s, which is the current minimum
> I can set.
>
> Is there any chance that in the next release of the net_kernel we
> could add a new (equally undocumented, i.e. unsupported) switch:
> net_setuptime_millis? I suggest that the net_setuptime switch dominate
> if present, to maintain backward compatibility.
>
> Funnily enough, when the node in question (say c) never started in the
> first place, a multi_call to [a,b,c] does not suffer from the
> connection setup delay; it's only when a connection had initially been
> set up (and then timed out) that I see the kernel setuptime
> timeout/delay.
>
> Perhaps someone could educate me here - it seems that perhaps not all
> the data structures associated with the timed-out connection are
> cleaned up (i.e. the behavior is different from when the connection was
> never created in the first place).
>
> Thanks,
> - Reto
>
> PS: in the meantime, I'll resort to removing the failed node from the
> nodeset I multi_call to, and then re-adding it using a multicast-based
> discovery protocol (the latter I have anyway). It would be nice if my
> app could be naive about all that and just blindly multi_call to a
> node set.
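The workaround described in the PS above can be sketched by subscribing
to node status messages and maintaining the call set from them. A hedged
sketch (the gen_server callbacks and the state shape, a plain node list,
are hypothetical; the discovery protocol that re-adds nodes is not
shown):

```erlang
%% Sketch: keep the multi_call node set current by dropping nodes on
%% nodedown and re-adding them on nodeup. The process receiving these
%% messages must have called net_kernel:monitor_nodes(true).
init(Nodes) ->
    ok = net_kernel:monitor_nodes(true),
    {ok, Nodes}.

handle_info({nodedown, Node}, Nodes) ->
    {noreply, lists:delete(Node, Nodes)};
handle_info({nodeup, Node}, Nodes) ->
    {noreply, [Node | Nodes]}.
```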
>
> --------- net_kernel: ---------
> [...]
> %% Default connection setup timeout in milliseconds.
> %% This timeout is set for every distributed action during
> %% the connection setup.
> -define(SETUPTIME, 7000).
> [...]
> connecttime() ->
>     case application:get_env(kernel, net_setuptime) of
>         {ok, Time} when integer(Time), Time > 0, Time < 120 ->
>             Time * 1000;
>         _ ->
>             ?SETUPTIME
>     end.
> [...]
> --------- net_kernel: ---------
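For reference, the existing (undocumented) switch is given in whole
seconds and, since connecttime/0 reads the application environment on
each connection setup, can also be changed at runtime:

```erlang
%% On the command line:  erl -sname a -kernel net_setuptime 2
%% Or at runtime (values outside 1..119 fall back to the 7000 ms
%% default, per the excerpt above):
ok = application:set_env(kernel, net_setuptime, 2).
```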
>
> ______________________
> An engineer can do for a dime what any fool can do for a dollar.
> -- unknown