[erlang-questions] R13B stop_select error

Marc Sugiyama marcsugiyama@REDACTED
Fri May 8 23:59:21 CEST 2009


We've also encountered some problems with TCP connection handling in
R13B (Erlang R13B (erts-5.7.1) [source] [64-bit] [smp:8:8] [rq:8]
[async-threads:0] [hipe] [kernel-poll:true]).  We're using a mochiweb
server to listen on three ports, two of which are actively used.  One
port is HTTP; the other two carry non-HTTP protocols.  Our mochiweb HTTP
server is modified to use active mode.

In one test run, the server stopped processing new connections.  As
far as I could tell, the VM was accepting connections but not passing
them on to the Erlang code (that is, telnet to the port showed the
connection as established, but the acceptor processes were stuck in
prim_inet:accept0/2).

I haven't been successful in reproducing the problem, but I did get
these messages during one test run.  The server continued to function
correctly (as far as I can tell):

=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca43, 1289, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313923> failed: fd=1289 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca46, 1294, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313926> failed: fd=1294 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca48, 1296, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313928> failed: fd=1296 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca6d, 3450, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313965> failed: fd=3450 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca6f, 3452, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313967> failed: fd=3452 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca70, 3453, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313968> failed: fd=3453 (re)selected before
stop_select was called for driver tcp_inet


=ERROR REPORT==== 8-May-2009::12:58:56 ===
driver_select(0x000000000000ca73, 3454, ERL_DRV_READ ERL_DRV_USE, 1)
by tcp_inet driver #Port<0.313971> failed: fd=3454 (re)selected before
stop_select was called for driver tcp_inet

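A rough way to picture what these reports complain about, going by
Sverker's explanation quoted below: once a driver calls driver_select()
with ERL_DRV_USE and on=0, the emulator owes that driver a stop_select
callback, and any further driver_select() on the same fd number before
that callback runs is flagged.  The C sketch below is only a toy model
of that rule.  The names model_select_use, model_stop_select_called,
MODEL_MAX_FDS and the three states are made up for illustration and are
not taken from erl_check_io.c or inet_drv.c; returning -1 is a modelling
choice, not a statement about what the emulator does besides logging.

```c
#include <assert.h>

/* Toy model only: names and states are illustrative assumptions,
 * not code from erts/emulator/sys/common/erl_check_io.c. */
enum fd_state { FD_IDLE, FD_IN_USE, FD_STOP_PENDING };

#define MODEL_MAX_FDS 4096

/* Static storage is zero-initialised, so every fd starts as FD_IDLE. */
static enum fd_state model_fds[MODEL_MAX_FDS];

/* Models driver_select(port, fd, <events> | ERL_DRV_USE, on).
 * Returns 0 on success, or -1 for the "(re)selected before
 * stop_select was called" situation. */
int model_select_use(int fd, int on)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    if (model_fds[fd] == FD_STOP_PENDING)
        return -1;                      /* stop_select still outstanding */
    model_fds[fd] = on ? FD_IN_USE : FD_STOP_PENDING;
    return 0;
}

/* Models the emulator invoking the driver's stop_select callback,
 * after which the descriptor can be closed and its number reused. */
void model_stop_select_called(int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    if (model_fds[fd] == FD_STOP_PENDING)
        model_fds[fd] = FD_IDLE;
}
```

In this model, model_select_use(1289, 1), then model_select_use(1289, 0),
then model_select_use(1289, 1) again before model_stop_select_called(1289)
makes the third call return -1, which is the situation the reports name.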

On Tue, May 5, 2009 at 3:15 AM, Sverker Eriksson
<sverker@REDACTED> wrote:
> Eugene Letuchy wrote:
>> Hey folks,
>>
>> I just discovered the following error in the logs of a production
>> system (running -smp +K true):
>>
>> [May  4 12:42:30 2009] [error] driver_select(0x0000000000099dd0, 158,
>> ERL_DRV_READ ERL_DRV_WRITE ERL_DRV_USE, 0) by tcp_inet driver
>> #Port<0.119119312> failed: fd=158 (re)selected before stop_select was
>> called for driver udp_inet
>>
>> The process that generated this error is a tcp thrift server
>> (http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/erl/src/thrift_socket_server.erl?view=markup),
>> and the same code on R12B-5 did not generate these errors. After
>> this error, the thrift server process is no longer responsive (doesn't
>> do accept()s) ... can anyone on the core erl team give me a clue as to
>> what could cause this behavior, or what other info might be helpful?
>>
>> Thanks
>>
> Interesting, never seen this before.
>
> This error has to do with the safe closing of file descriptors for
> drivers that was introduced in R13A. It cannot happen in earlier
> versions.
>
> The emulator is complaining about misbehaving driver(s). In
> this case tcp_inet and udp_inet, both implemented in
> erts/emulator/drivers/common/inet_drv.c. That is the layer between the
> Erlang port interface and the C socket interface for TCP, UDP and SCTP
> communication.
>
> This error log says that the udp_inet driver has told the emulator that a
> socket should be closed.
> But the asynchronous driver callback (stop_select, new in R13A) to
> actually close the socket has not been called yet. THEN, the tcp_inet
> driver wants to close the same socket descriptor (158)???
>
>
> Looks like an R13 bug in inet_drv.c or maybe in the code that prints
> this error (erts/emulator/sys/common/erl_check_io.c).
>
> Most helpful would be an easy way to reproduce this...
>
>
> /Sverker, Erlang/OTP Ericsson


