enotsock again

Marthin Laubscher marthin@REDACTED
Fri Apr 21 17:30:14 CEST 2006


Hey all,

 

I'm being snookered by the same distributed system startup/socket error
problem that seems to have done the rounds before.

 

(http://www.erlang.org/ml-archive/erlang-questions/200507/msg00081.html and
http://www.erlang.org/ml-archive/erlang-questions/200411/msg00120.html
appear to be two earlier reports of it.)

 

Further clues I can offer those in the know are:

I have several machines running XP SP2 with all the latest updates, only one
of which has the problem.

The machine with the problem (a notebook) has multiple interfaces (a dialup
connection shared on the LAN).

In response to previous comments: yes, the network interfaces work just
fine (several applications use them without trouble).

Testing indicates that it is neither the windows firewall nor Symantec's
that is causing the problem.

The timing of the occurrences of the problem does not seem to support the
notion that it's some M$ security update that breaks the socket library
(unless they keep making the same booboo and fixing it every couple of
months). Even more unlikely with only one bum machine out of several running
at exactly the same level of updates.

One current hypothesis is that the disturbance is caused by choosing
incorrectly amongst multiple interfaces. An attempt to
gen_tcp:listen(...,[...,{ip,{LAN ip address}}]) did not, however, yield any
different result - still {error,enotsock}.
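
For anyone who wants to repeat that experiment on every address rather than just the LAN one, here is a minimal sketch. It assumes inet:getif/0 (an undocumented helper in the inet module) is available and returns {ok, [{Addr, Broadcast, Mask}]}; the module name listen_probe and its functions are made up for illustration and are not part of OTP.

```erlang
-module(listen_probe).
-export([run/0, try_listen/1]).

%% Try gen_tcp:listen/2 on the wildcard address and on every IPv4
%% address reported by inet:getif/0, returning {Addr, Result} pairs.
%% Assumption: inet:getif/0 exists (it is undocumented). If it fails,
%% which may itself be a symptom, only the wildcard address is probed.
run() ->
    IfAddrs = case inet:getif() of
                  {ok, Ifs} -> [Addr || {Addr, _Bcast, _Mask} <- Ifs];
                  {error, _} -> []
              end,
    [{Addr, try_listen(Addr)} || Addr <- [{0,0,0,0} | IfAddrs]].

%% Port 0 asks the OS for any free port, so the probe cannot collide
%% with a service that is already listening.
try_listen(Addr) ->
    case gen_tcp:listen(0, [{ip, Addr}]) of
        {ok, Sock} ->
            {ok, Port} = inet:port(Sock),
            ok = gen_tcp:close(Sock),
            {ok, Port};
        {error, Reason} ->
            {error, Reason}
    end.
```

Calling listen_probe:run() from the shell gives one result per address. If every entry, including the wildcard, comes back as {error,enotsock}, that would argue against the "wrong interface" hypothesis and point at the socket layer as a whole.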

However, somebody might be able to make something of the following:

Starting (w)erl -sname abc yields the unformatted trace mentioned in the
previous posts. That trace can be reduced to the following error:

 

Erlang (BEAM) emulator version 5.4.13 [threads:0]

Eshell V5.4.13  (abort with ^G)
1> net_kernel:start([aname]).

=INFO REPORT==== 21-Apr-2006::16:32:57 ===
Protocol: "inet_tcp": register/listen error: enotsock
{error,{shutdown,{child,undefined,
                        net_sup_dynamic,
                        {erl_distribution,start_link,[[aname]]},
                        permanent,
                        1000,
                        supervisor,
                        [erl_distribution]}}}
2>

 

A further clue: R10B-8 (which I'm still using for my current project) does
not even start in non-distributed mode when the dialup connection is
disconnected. The error story follows:

{error_logger,{{2006,4,21},{17,9,3}},supervisor_report,
 [{supervisor,{local,kernel_sup}},
  {errorContext,start_error},
  {reason,{'DOWN',#Ref<0.0.0.14>,process,<0.16.0>,normal}},
  {offender,[{pid,undefined},
             {name,code_server},
             {mfa,{code,start_link,[]}},
             {restart_type,permanent},
             {shutdown,2000},
             {child_type,worker}]}]}

{error_logger,{{2006,4,21},{17,9,3}},crash_report,
 [[{pid,<0.7.0>},
   {registered_name,[]},
   {error_info,{shutdown,{kernel,start,[normal,[]]}}},
   {initial_call,
    {application_master,init,
     [<0.5.0>,<0.6.0>,
      {appl_data,kernel,
       [application_controller,erl_reply,auth,boot_server,code_server,
        disk_log_server,disk_log_sup,erl_prim_loader,error_logger,
        file_server,file_server_2,fixtable_server,global_group,
        global_name_server,heart,init,kernel_config,kernel_sup,net_kernel,
        net_sup,rex,user,os_server,ddll_server,erl_epmd,inet_db,pg2],
       undefined,
       {kernel,[]},
       [application,application_controller,application_master,
        application_starter,auth,code,code_aux,packages,code_server,
        dist_util,erl_boot_server,erl_distribution,erl_prim_loader,
        erl_reply,erlang,error_handler,error_logger,file,file_server,
        old_file_server,file_io_server,prim_file,global,global_group,
        global_search,group,heart,inet6_tcp,inet6_tcp_dist,inet6_udp,
        inet_config,inet_hosts,inet_gethost_native,inet_tcp_dist,init,
        kernel,kernel_config,net,net_adm,net_kernel,os,ram_file,rpc,user,
        user_drv,user_sup,disk_log,disk_log_1,disk_log_server,
        disk_log_sup,dist_ac,erl_ddll,erl_epmd,erts_debug,gen_tcp,gen_udp,
        prim_inet,inet,inet_db,inet_dns,inet_parse,inet_res,inet_tcp,
        inet_udp,pg2,seq_trace,wrap_log_reader,zlib,otp_ring0],
       [],infinity,infinity},
      normal]}},
   {ancestors,[<0.6.0>]},
   {messages,[{'EXIT',<0.8.0>,normal}]},
   {links,[<0.6.0>,<0.5.0>]},
   {dictionary,[]},
   {trap_exit,true},
   {status,running},
   {heap_size,987},
   {stack_size,21},
   {reductions,1062}],
  []]}

{error_logger,{{2006,4,21},{17,9,3}},std_info,
 [{application,kernel},
  {exited,{shutdown,{kernel,start,[normal,[]]}}},
  {type,permanent}]}

{"Kernel pid terminated",application_controller,shutdown}

 

The 'DOWN' reason in the first report (the supervisor_report) suggests to
some extent that it is trying to access an interface that is down, i.e. the
dialup connection. I have run the same version on a machine with only a
dialup connection (GPRS) and it worked fine.

 

A final "clue" is that on the troubled machine, in distributed mode or not,
successful or not, R10B-8 or R10B-10, there seems to be a much longer pause
while (w)erl is starting up during which there is no impact on CPU load - it
may be some os-level timeout that's involved.
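
One way to narrow down such a pause is to time individual calls that depend on the OS. The sketch below times a lookup of the local host name with timer:tc/3. That name resolution is the culprit is purely an assumption on my part, and the module name resolve_timing is made up for illustration.

```erlang
-module(resolve_timing).
-export([run/0]).

%% Time how long the OS takes to resolve this machine's own host name.
%% Returns {Host, Milliseconds, LookupResult}. A result in the range of
%% seconds rather than milliseconds would hint at a resolver timeout.
run() ->
    {ok, Host} = inet:gethostname(),
    {Micros, Result} = timer:tc(inet, gethostbyname, [Host]),
    {Host, Micros div 1000, Result}.
```

Running resolve_timing:run() on both a healthy machine and the troubled one, with the dialup connection up and then down, would show whether the lookup accounts for the delay.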

 

This problem has hit me, as always, at the most inopportune time - on
"holiday" 1000 miles away from my normal and stable development environment,
with urgent modifications to make and test. Any advice or help (short of
reinstalling XP, which isn't an option at the moment) will be greatly
appreciated.

 

Thanks,

 

Marthin

 
