[erlang-questions] strange node communication startup delay
Anthony Shipman
als@REDACTED
Tue Mar 4 15:11:51 CET 2008
I have several nodes running on the same computer, so inter-node
communication is normally very fast. Shortly after starting the first
node starts a third node which has a gen_server that immediately registers
with a gen_server in the first node. There is a long round-trip delay
of up to 3 seconds at this point. (The second node is idle at
this time).
The third node performs these steps at startup:
case net_kernel:connect_node(Node) of
true ->
log:format("muConnector: connected to ~p", [Node]),
global:sync(),
monitor_node(Node, true),
onUp(State);
...
log:format("muConnector: MU is up"),
...
log:format("fleetMgrImpl: MU connected"),
regIface:rtt(),
regIface:register(fleet, ID, self()),
log:format("fleetMgrImpl: Fleet process registered"),
regIface:rtt(),
which results in these log messages on the third node:
5-Mar-2008 00:35:14.517| muConnector: connected to 'SMU_mySite@REDACTED'
5-Mar-2008 00:35:14.723| muConnector: MU is up
5-Mar-2008 00:35:14.723| fleetMgrImpl: MU connected
5-Mar-2008 00:35:16.174| regIface:rtt delay 1449075 us
5-Mar-2008 00:35:16.215| fleetMgrImpl: Fleet process registered
5-Mar-2008 00:35:16.215| regIface:rtt delay 842 us
The first call to rtt has measured a 1.45s delay for a simple echo.
I have process tracing on the first node with options: send, 'receive',
procs, timestamp, running, set_on_spawn. The trace on the first node
shows lots of global name registration activity around 00:35:14.721
to 00:35:14.722:
5-Mar-2008 00:35:14.721| [<0.11.0>,
'receive',
{'$gen_call',{<8232.12.0>,{#Ref<8232.0.0.182>,'SMU_mySite@REDACTED'}},
{set_lock,{global,[<0.12.0>,<8232.12.0>]}}}]
................
5-Mar-2008 00:35:14.722| [<0.11.0>,
send,
{'$gen_cast',
{exchange,
'SMU_mySite@REDACTED',
[{'mySite/registry',<0.84.0>,{global,random_exit_name}},
{'mySite/sigMgr',<0.108.0>,{global,random_exit_name}},
{'mySite/evtMgr',<0.90.0>,{global,random_exit_name}},
{'mySite\001muMgr',<0.106.0>,{global,random_exit_name}}],
[],
{1204,637714,517547}}},
{global_name_server,fleet_1_mySite@REDACTED}]
................
5-Mar-2008 00:35:14.722| [<0.11.0>,
send,
{'$gen_cast',{resolved,'SMU_mySite@REDACTED',
[],
['SPU1_mySite@REDACTED'],
['SPU1_mySite@REDACTED'],
[],
{1204,637714,517547}}},
{global_name_server,fleet_1_mySite@REDACTED}]
5-Mar-2008 00:35:14.722| [<0.11.0>,
send,
{cancel,fleet_1_mySite@REDACTED,
{1204,637714,517912},#Fun<global.7.70730269>},
<0.12.0>]
................
5-Mar-2008 00:35:14.723| [<0.12.0>,
send,
{'$gen_call',{<0.12.0>,{#Ref<0.0.0.388>,'SPU1_mySite@REDACTED'}},
{del_lock,{global,[<0.12.0>,<8232.12.0>]}}},
{global_name_server,'SPU1_mySite@REDACTED'}]
................
5-Mar-2008 00:35:14.761| [<0.11.0>,
'receive',
{'$gen_call',{<8232.12.0>,{#Ref<8232.0.0.192>,'SMU_mySite@REDACTED'}},
{del_lock,{global,[<0.12.0>,<8232.12.0>]}}}]
5-Mar-2008 00:35:14.761| [<0.12.0>,out,{global,loop_the_locker,1}]
Next the trace shows the arrival of the echo message with its
time-stamp. The message left the third node at 00:35:14.725 but it
did not arrive at the first node until 00:35:16.174. The reply was
essentially immediate.
5-Mar-2008 00:35:16.174| [<0.84.0>,
'receive',
{'$gen_call',{<8232.87.0>,#Ref<8232.0.0.204>},
{regEcho,{1204,637714,725844}}}]
5-Mar-2008 00:35:16.174| [<0.84.0>,in,{gen_server,loop,6}]
5-Mar-2008 00:35:16.174| [<0.84.0>,send,{#Ref<8232.0.0.204>,{ok,
{1204,637714,725844}}},<8232.87.0>]
According to the process trace nothing else happened on the first node
between 00:35:14.761 and 00:35:16.174. So why the 1.4 second delay
in communication?
Can I get traces of the inter-node communication?
More information about the erlang-questions
mailing list