inet_drv.c and send blocking

Alexey Shchepin
Thu Apr 28 02:17:15 CEST 2005


Hi!

It seems there is a bug in inet_drv.c which causes gen_tcp:send to block.

How to reproduce it:

1. Compile the following program (based on Luke Gorrie's test case for
inet_driver leaking file descriptors):

-module(inetblock).
-compile(export_all).

go(Port) ->
    spawn_link(fun() ->
		       {ok, L} = gen_tcp:listen(Port, [{active, false},
						       binary,
						       {reuseaddr, true},
						       {packet, 0}]),
		       accept_loop(L)
	       end).

accept_loop(L) ->
    case gen_tcp:accept(L) of
	{ok, S} ->
	    spawn(fun() -> worker(S) end),
	    flood(S);
	Err ->
	    exit({accept, Err})
    end.

worker(S) ->
    io:format("~p trying a read with timeout..~n", [self()]),
    case gen_tcp:recv(S, 0, infinity) of
	{ok, Data} ->
	    io:format("~p got ~p~n", [self(), Data]),
	    worker(S);
	{error, Rsn} ->
	    io:format("~p error: ~p~n", [self(), Rsn])
    end.

flood(S) ->
    io:format("~p trying to send...~n", [self()]),
    case gen_tcp:send(S, "abcdefghijklmnopqrstuvwxyz\n") of
	ok ->
	    io:format("~p got ok~n", [self()]),
	    flood(S);
	{error, Rsn} ->
	    io:format("~p error: ~p~n", [self(), Rsn])
    end.

2. Run inetblock:go(1234) in the Erlang shell.

3. Run telnet localhost 1234 in a Unix shell.

Now you should see many messages like the following in the Erlang shell (I
recompiled beam with "#define INET_DRV_DEBUG 1" in inet_drv.c and added some
more debug messages, so you will likely see only some of them):

<0.37.0> trying to send...
tcp_sendv(102): s=10, about to send 0,27 bytes
<0.37.0> got ok

and lines of the English alphabet in telnet.

4. Press Control-] in telnet.

Now you should see the "telnet>" prompt and

<0.37.0> trying to send...
tcp_sendv(132): s=10, about to send 0,27 bytes
tcp_sendv(132): s=10, Send failed, queuing
sock_select(132): flags=02, onoff=1, event_mask=03
<0.37.0> got ok
<0.37.0> trying to send...
<0.37.0> got ok
<0.37.0> trying to send...
<0.37.0> got ok
<0.37.0> trying to send...
<0.37.0> got ok
...
<0.37.0> trying to send...
tcp_sendv(132): s=10, queue exceeded: need 8196 bytes
<0.37.0> got ok
<0.37.0> trying to send...

in the Erlang shell.

5. Enter "q" at the telnet prompt and press Enter.  Output in the Erlang shell:

tcp_inet_input(132): entering
tcp_recv(132): request_len=0
tcp_remain(132): s=10, n=0, nfill=0 nsz=1024
 => more=1024 
tcp_recv(132): s=10 about to read 1024 bytes...
  => detected close
free_buffer: 1024
release_buffer: 1024
sock_select(132): flags=01, onoff=0, event_mask=02
deq(132): 1 627 32
deq(132): queue empty
<0.39.0> error: closed

That's all.  The receiving process has now got its "closed" error, but the
sending process has not, and it stays blocked inside the gen_tcp:send call.
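The pattern in step 4 (many "got ok" replies, then "queue exceeded", then one
last "got ok" and silence) can be replayed with a rough model of a send queue
with a high watermark.  This is only a sketch of my reading of the log, not
code from inet_drv.c: the names toy_sendq, toy_send and toy_sends_before_block
are made up, and both the watermark value passed in and the ">=" comparison
are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a driver-side send queue with a high watermark. */
struct toy_sendq {
    size_t queued;   /* bytes accepted from the caller but not yet written */
    size_t high;     /* assumed high watermark, in bytes */
    int    busy;     /* once set, the next caller is suspended */
};

/* Toy send on a socket that accepts no more data (telnet sits at its prompt).
 * Returns 1 if the caller gets "ok", 0 if the caller would be suspended. */
static int toy_send(struct toy_sendq *q, size_t len)
{
    if (q->busy)
        return 0;               /* caller blocks until the queue drains */
    q->queued += len;           /* "Send failed, queuing" */
    if (q->queued >= q->high)
        q->busy = 1;            /* "queue exceeded" */
    return 1;                   /* this call itself still returns ok */
}

/* Counts how many sends of msg_len bytes return ok before one blocks. */
size_t toy_sends_before_block(size_t msg_len, size_t high)
{
    struct toy_sendq q = { 0, high, 0 };
    size_t ok = 0;

    assert(msg_len > 0);        /* otherwise the queue would never fill */
    while (toy_send(&q, msg_len))
        ok++;
    return ok;
}
```

With 27-byte messages and an assumed watermark of 8192 bytes,
toy_sends_before_block(27, 8192) returns 304.  In the model the blocked caller
can only continue if something later drains the queue or reports an error.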


I've tried to dig into inet_drv.c, and I now think that the problem is inside
the tcp_recv_closed function:

/* The socket has closed, cleanup and send event */
static int tcp_recv_closed(tcp_descriptor* desc)
{
    if (!desc->inet.active) {
	/* We must cancel any timer here ! */
	driver_cancel_timer(desc->inet.port);
	/* passive mode do not terminate port ! */
	tcp_clear_input(desc);
	if (desc->inet.exitf) {
/*1*/	    desc_close(INETP(desc));
	} else {
	    desc_close_read(INETP(desc));
	}
/*2*/	async_error_am_all(INETP(desc), am_closed);
	/* next time EXBADSEQ will be delivered  */
    }
    else {...}
    return -1;
}

After step 4 there was a call to

sock_select(INETP(desc),(FD_WRITE|FD_CLOSE), 1);

from tcp_sendv.  But after step 5, desc_close at /*1*/ removes the socket from
the FD_READ, FD_WRITE, and FD_CLOSE sets, thus forgetting that the socket is
still waiting to become writable.  If desc->inet.exitf = 0, then
desc_close_read is executed instead and everything works fine: after the output
in step 5 you will see this:

tcp_inet_output(132): s=10, About to send 16 items
tcp_inet_output(132): sock_sendv(16) errno = 32
driver_failure_eof(132) in drivers/common/inet_drv.c, line 5854
deq(132): queue empty
sock_select(132): flags=03, onoff=0, event_mask=00
<0.37.0> error: einval
sock_select(131): flags=03, onoff=0, event_mask=00
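The difference between the two branches of tcp_recv_closed can be replayed with
a toy model of the event mask.  This is a sketch, not code from inet_drv.c: the
names toy_desc, toy_select, toy_close, toy_close_read, toy_recv_closed,
sender_can_be_woken and toy_scenario are made up for illustration, and the bit
values 0x01 (read) and 0x02 (write) are assumptions inferred from the flags=
fields in the debug output above.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed bit values, inferred from the flags= fields in the debug output. */
enum { TOY_READ = 0x01, TOY_WRITE = 0x02 };

/* Hypothetical stand-in for the descriptor: only what the argument needs. */
struct toy_desc {
    unsigned event_mask; /* events currently selected on the socket */
    int      queued;     /* send buffers still waiting in the queue */
    int      exitf;      /* mirrors desc->inet.exitf */
};

/* Toy sock_select: add (on=1) or remove (on=0) interest in some events. */
static void toy_select(struct toy_desc *d, unsigned flags, int on)
{
    if (on)
        d->event_mask |= flags;
    else
        d->event_mask &= ~flags;
    printf("toy_select: flags=%02x, onoff=%d, event_mask=%02x\n",
           flags, on, d->event_mask);
}

/* Toy desc_close_read: drops read interest only. */
static void toy_close_read(struct toy_desc *d)
{
    toy_select(d, TOY_READ, 0);
}

/* Toy desc_close: drops every interest, including a pending write. */
static void toy_close(struct toy_desc *d)
{
    toy_select(d, TOY_READ | TOY_WRITE, 0);
}

/* Toy tcp_recv_closed, passive-mode branch only. */
static void toy_recv_closed(struct toy_desc *d)
{
    if (d->exitf)
        toy_close(d);
    else
        toy_close_read(d);
}

/* A blocked sender is only woken if the output callback can still fire. */
static int sender_can_be_woken(const struct toy_desc *d)
{
    return d->queued == 0 || (d->event_mask & TOY_WRITE) != 0;
}

/* Replays steps 4 and 5; returns 1 if the sender can still be unblocked. */
int toy_scenario(int exitf)
{
    struct toy_desc d = { TOY_READ, 0, exitf }; /* worker already in recv */

    assert(exitf == 0 || exitf == 1);
    d.queued = 16;                 /* step 4: send failed, data is queued */
    toy_select(&d, TOY_WRITE, 1);  /* tcp_sendv asks for write readiness */
    toy_recv_closed(&d);           /* step 5: the peer closes */
    return sender_can_be_woken(&d);
}
```

In this model toy_scenario(0) returns 1 and toy_scenario(1) returns 0: with
exitf set, the write interest is gone while data is still queued, so nothing
is left to wake the sender.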

So probably either there should be some code to clean up the pending send state
at /*1*/, or exitf should be 0 (though that may break behaviour elsewhere), or
tcp_send(v) should call enq_async, so that async_error_am_all at /*2*/ sends an
"{error, closed}" message to the sending process...
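The third option can be illustrated with a toy queue of waiting callers.  Again
this is only a sketch with made-up names (toy_async_queue, toy_enq_async,
toy_async_error_all, toy_blocked_after_close), not the driver's real enq_async
or async_error_am_all.  It assumes the receiver is already registered in that
queue, which is what its "closed" reply in step 5 suggests, and it ignores
everything else the real functions do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical queue of callers that are waiting for a reply. */
struct toy_async_queue {
    int    caller[TOY_MAX_WAITERS];
    size_t n;
};

/* Toy enq_async: remember a waiting caller; returns -1 if the queue is full. */
static int toy_enq_async(struct toy_async_queue *q, int caller_id)
{
    if (q->n == TOY_MAX_WAITERS)
        return -1;
    q->caller[q->n++] = caller_id;
    return 0;
}

/* Toy async_error_am_all: answer every waiting caller and empty the queue.
 * Returns the number of callers that were answered. */
static size_t toy_async_error_all(struct toy_async_queue *q)
{
    size_t answered = q->n;   /* each would receive {error, closed} */
    q->n = 0;
    return answered;
}

/* Returns how many of the two callers (one receiver, one sender) are still
 * blocked after the peer closes the connection. */
int toy_blocked_after_close(int sender_enqueues)
{
    struct toy_async_queue q = { {0}, 0 };
    int callers = 2;
    int rc;

    rc = toy_enq_async(&q, 1);        /* receiver: assumed already queued */
    assert(rc == 0);
    if (sender_enqueues) {
        rc = toy_enq_async(&q, 2);    /* sender: the suggested change */
        assert(rc == 0);
    }
    (void)rc;
    return callers - (int)toy_async_error_all(&q);
}
```

Here toy_blocked_after_close(0) returns 1 (the sender is never answered) and
toy_blocked_after_close(1) returns 0.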

Hope that helps :)


