[erlang-questions] Building a Non-blocking TCP server using OTP principles
Serge Aleynikov
saleyn@REDACTED
Fri Aug 17 15:04:29 CEST 2007
Samuel,
Thanks for your input. I reviewed your modifications and have to say
that there are several problems with this approach.
1. Not doing an asynchronous accept, and relying on a separate process to
accept connections instead, may be *dangerous* if not handled properly,
as it introduces race conditions that could block the server
permanently.
Here's an important quote from the ACE book, "C++ Network Programming, Vol. 1":
"When an acceptor socket is passed to select(), it's marked as "active"
when a connection is received. Many servers use this event to indicate
that it's OK to call accept() without blocking. Unfortunately, there's a
race condition that stems from the asynchronous behavior of TCP/IP. In
particular, after select() indicates an acceptor socket is active (but
before accept() is called) a client can close its connection, whereupon
accept() can block and potentially hang the entire application process.
To avoid this problem, acceptor sockets should always be set into
non-blocking mode when used with select()."
This applies to your changes indirectly. Under the hood, the network
driver still performs an asynchronous accept, so the paragraph above
doesn't apply at the driver level. However, a failure can occur between
these two lines in init/1:
    {ok, Ref} = create_acceptor(Listen_socket),
    {ok, #state{listener = Listen_socket,
                acceptor = Ref,
                module   = Module}};
If the acceptor dies at that point, for whatever reason, the resulting
{'EXIT', Pid, Reason} message is currently not handled, even though the
acceptor is linked and trap_exit is turned on. The listener is then left
without an acceptor and stays stuck forever.
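A minimal sketch of handling that message, assuming a gen_server listener shaped like the one discussed here. The module name listener_sketch, the port request, and the {accept, Sock} hand-off are hypothetical illustrations, not code from the tutorial or from the attached files. Because spawn_link/1 always returns a pid, nothing can fail between creating the acceptor and returning from init/1; any crash inside the acceptor arrives later as an {'EXIT', Pid, Reason} message, and handle_info/2 respawns the acceptor when Pid is the current one.

```erlang
%% Hypothetical module: a sketch, not the tutorial's or Samuel's code.
-module(listener_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {listener, acceptor}).

start_link(Port) ->
    gen_server:start_link(?MODULE, Port, []).

init(Port) ->
    process_flag(trap_exit, true),
    {ok, LSock} = gen_tcp:listen(Port, [binary, {packet, 0},
                                        {active, false}, {reuseaddr, true}]),
    %% spawn_link/1 cannot fail here; acceptor crashes become 'EXIT' messages.
    {ok, #state{listener = LSock, acceptor = spawn_acceptor(LSock)}}.

%% Linked acceptor with the same shape as F() in the message.
spawn_acceptor(LSock) ->
    Owner = self(),
    spawn_link(fun() ->
                       {ok, Sock} = gen_tcp:accept(LSock),
                       ok = gen_tcp:controlling_process(Sock, Owner),
                       ok = gen_server:call(Owner, {accept, Sock})
               end).

handle_call({accept, Sock}, _From, #state{listener = LSock} = State) ->
    %% A real server would start a handler (e.g. an FSM) for Sock here;
    %% this sketch simply closes the connection, then starts a new acceptor.
    gen_tcp:close(Sock),
    {reply, ok, State#state{acceptor = spawn_acceptor(LSock)}};
handle_call(port, _From, #state{listener = LSock} = State) ->
    {reply, inet:port(LSock), State};
handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The current acceptor died before handing over a socket: respawn it.
handle_info({'EXIT', Pid, Reason},
            #state{acceptor = Pid, listener = LSock} = State) ->
    error_logger:error_msg("acceptor ~p exited: ~p~n", [Pid, Reason]),
    {noreply, State#state{acceptor = spawn_acceptor(LSock)}};
%% Anything else, including normal exits of acceptors that already
%% delivered their socket, is ignored.
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Note that this respawns unconditionally: if gen_tcp:accept/1 keeps failing (for example with {error, emfile}), the listener will spin respawning acceptors.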
The same can happen if the acceptor process dies anywhere in the middle
of the F() function:
    F = fun() ->
            {ok, Socket} = gen_tcp:accept(Listener),
            gen_tcp:controlling_process(Socket, Self),
            gen_server:call(Self, {accept, Socket})
        end,
As mentioned above, this can likely be fixed by properly handling the
{'EXIT', Pid, Reason} message and respawning the acceptor when it
arrives. This, however, presents another challenge: if the system runs
out of file descriptors, your listener process will end up in an unhappy
mode of constantly respawning acceptors that die on this line:
    {ok, Socket} = gen_tcp:accept(Listener)
So you would need to track how many accept failures occurred in the last
several seconds and implement some intelligent recovery. This would
complicate the code quite a bit.
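One possible shape for that bookkeeping, sketched under the assumption that the listener keeps a list of recent failure timestamps: a hypothetical accept_throttle helper that records each acceptor failure and reports when more than Max failures have occurred within the last WindowMs milliseconds. What to do on too_many (back off with a timer, stop the listener, raise an alarm) is deliberately left open.

```erlang
%% Hypothetical helper: counts acceptor failures in a sliding time window.
-module(accept_throttle).
-export([new/2, record_failure/2]).

-record(throttle, {max, window_ms, failures = []}).

%% Allow at most Max failures within any WindowMs-millisecond window.
new(Max, WindowMs) ->
    #throttle{max = Max, window_ms = WindowMs}.

%% Record a failure at time Now (milliseconds). Timestamps older than the
%% window are dropped. Returns {ok, T} if respawning is still acceptable,
%% or {too_many, T} once more than Max failures fall inside the window.
record_failure(Now, #throttle{max = Max, window_ms = W, failures = Fs} = T) ->
    Recent = [F || F <- Fs, Now - F < W],
    Fs1 = [Now | Recent],
    T1 = T#throttle{failures = Fs1},
    case length(Fs1) > Max of
        true  -> {too_many, T1};
        false -> {ok, T1}
    end.
```

In the listener's 'EXIT' clause one would call accept_throttle:record_failure(erlang:monotonic_time(millisecond), T) and respawn only on {ok, _}; keeping the helper pure makes the policy easy to test on its own.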
2. This new process is not OTP-compliant: no supervisor knows about it,
and it doesn't handle debug and system messages as described in section
6.2, "Special Processes", of the OTP Design Principles. This means you
may run into problems when you upgrade your system dynamically.
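For comparison, a rough sketch of an acceptor written as a special process in the sense of that section: it is started with proc_lib:start_link/3, so a supervisor can own it, and it answers system messages through sys:handle_system_msg/6. Because gen_tcp:accept/1 blocks and cannot service the mailbox, this sketch swaps in a polling loop around gen_tcp:accept/2 with a timeout. The module name acceptor_special, POLL_MS, and the {accepted, Pid, Sock} message are assumptions. This is not the tutorial's approach, which avoids blocking altogether with an asynchronous accept.

```erlang
%% Hypothetical module: an acceptor written as an OTP special process.
%% Assumes the listen socket was opened with {active, false}.
-module(acceptor_special).
-export([start_link/2]).
-export([init/3, system_continue/3, system_terminate/4,
         system_code_change/4]).

-define(POLL_MS, 1000).

start_link(LSock, Owner) ->
    proc_lib:start_link(?MODULE, init, [self(), LSock, Owner]).

init(Parent, LSock, Owner) ->
    Deb = sys:debug_options([]),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb, {LSock, Owner}).

loop(Parent, Deb, {LSock, Owner} = St) ->
    %% Service system messages between accept attempts.
    receive
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Deb, St);
        _Other ->
            %% Discard unexpected messages so the mailbox cannot grow.
            loop(Parent, Deb, St)
    after 0 ->
        case gen_tcp:accept(LSock, ?POLL_MS) of
            {ok, Sock} ->
                ok = gen_tcp:controlling_process(Sock, Owner),
                Owner ! {accepted, self(), Sock},
                loop(Parent, Deb, St);
            {error, timeout} ->
                loop(Parent, Deb, St);
            {error, Reason} ->
                %% Crash so the supervisor sees the failure and decides.
                exit({accept_failed, Reason})
        end
    end.

system_continue(Parent, Deb, St) ->
    loop(Parent, Deb, St).

system_terminate(Reason, _Parent, _Deb, _St) ->
    exit(Reason).

system_code_change(St, _Module, _OldVsn, _Extra) ->
    {ok, St}.
```

The polling interval bounds how long sys:suspend/1 or a code change waits for the acceptor; a shorter timeout answers faster at the cost of more wake-ups.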
These are partly the reasons I put together this tutorial: to show how
to avoid such problems altogether. :-)
I hope you will find this feedback useful.
Regards,
Serge
Samuel Tesla wrote:
> Serge,
>
> I really got a lot from your guide on building TCP servers. I really
> appreciate the work you put into it. I think I've got an improvement that
> you may want to consider putting up on the website.
>
> I wanted to read documentation for prim_inet:async_accept/2 so I could
> figure out what that -1 was for, and couldn't find any documentation. So, I
> Googled and discovered that there is no documentation on purpose (
> http://www.trapexit.org/forum/viewtopic.php?p=29157). Basically, it's not a
> guaranteed API between versions, whereas gen_tcp is. So, I set out to see
> if I could use gen_tcp:accept/1 instead of prim_inet:async_accept/2, and I
> was successful.
>
> I copied your source off the website and then made modifications. I only
> had to change the listener and the FSM modules, and I've attached the
> altered source files. The gist of what I did was spawn a linked process
> which does the accept, and then sends a call back to the main listener
> process. The whole sequence has to stay synchronous until the FSM gets
> into WAIT_FOR_DATA, or the socket will disconnect and you'll start
> getting POSIX errors.
>
> There were a few other things I cleaned up or changed:
> * You don't need to copy socket options, as accept/1 does that.
> * You don't need to call gen_tcp:close/1 in terminate/2 as the listening
> socket will close when its controlling process exits.
> * I set {packet, 0} as I was testing with a raw telnet session.
>
> I hope you find this helpful!
>
> -- Samuel
>
-------------- next part --------------
Attachments (scrubbed by the list archive):
* tcp_listener.erl (application/octet-stream, 5729 bytes):
  <http://erlang.org/pipermail/erlang-questions/attachments/20070817/f284b0fb/attachment.obj>
* tcp_echo_fsm.erl (application/octet-stream, 6086 bytes):
  <http://erlang.org/pipermail/erlang-questions/attachments/20070817/f284b0fb/attachment-0001.obj>