[erlang-questions] Building a Non-blocking TCP server using OTP principles

Serge Aleynikov saleyn@REDACTED
Sat Aug 18 04:27:20 CEST 2007

I can only comment that it would be really useful if the OTP team 
exposed the async_accept function through the gen_tcp module.


Samuel Tesla wrote:
> I suppose the question really becomes, then, how can you write a TCP server
> using the documented APIs of Erlang/OTP that you can reasonably expect to
> remain the same between releases?
> The APIs in prim_inet aren't intended to be used.  They can change between
> releases and that will cause problems with the original code you posted as
> well.  Whereas a solution that uses the functions exported from gen_tcp can
> be expected to survive upgrades to the OTP environment.
> I'm currently looking at ejabberd to see how they do things, as they manage
> a jabber server with only gen_tcp.
> -- Samuel
> On 8/17/07, Serge Aleynikov <saleyn@REDACTED> wrote:
>> Samuel,
>> Thanks for your input.  I reviewed your modifications and have to say
>> that there are several problems with this approach.
>> 1. Not doing asynchronous accept and relying on a separate process to
>> accept connections may be *dangerous* if not handled properly as it
>> introduces race conditions that could potentially block the server
>> permanently.
>> Here's an important quote from the ACE book "C++ Network Programming Vol.1":
>> "When an acceptor socket is passed to select(), it's marked as "active"
>> when a connection is received. Many servers use this event to indicate
>> that it's OK to call accept() without blocking. Unfortunately, there's a
>> race condition that stems from the asynchronous behavior of TCP/IP. In
>> particular, after select() indicates an acceptor socket is active (but
>> before accept() is called) a client can close its connection, whereupon
>> accept() can block and potentially hang the entire application process.
>> To avoid this problem, acceptor sockets should always be set into
>> non-blocking mode when used with select()."
>> This applies to your changes indirectly.  Under the hood of the network
>> driver, it still does the asynchronous accept, so the paragraph above
>> doesn't apply at the driver level.  However, there may be a failure
>> between these two lines in the init/1:
>>          {ok, Ref} = create_acceptor(Listen_socket),
>>          {ok, #state{listener = Listen_socket,
>>                      acceptor = Ref,
>>                      module   = Module}};
>> due to various reasons and despite the fact that it was linked, the
>> {'EXIT', Pid, Reason} message is presently not handled (trap_exit though
>> is turned on), so the process will be locked forever.
>> The same can happen if the acceptor process dies anywhere in the middle
>> of the F() function:
>>      F = fun() ->
>>                  {ok, Socket} = gen_tcp:accept(Listener),
>>                  gen_tcp:controlling_process(Socket, Self),
>>                  gen_server:call(Self, {accept, Socket})
>>          end,
>> As mentioned above, this can likely be fixed by proper handling of the
>> {'EXIT', Pid, Reason} and respawning the acceptor when it happens.  This,
>> however, presents another challenge: what if the system runs out of file
>> descriptors?  Your listener process will be in an unhappy mode of
>> constantly respawning acceptors that will die because of this line:
>>                  {ok, Socket} = gen_tcp:accept(Listener)
>> So you would need to monitor how many accept failures you got in the
>> last several seconds and do some intelligent recovery.  This would
>> complicate code by quite a bit.
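
A minimal sketch of the recovery described above, using only gen_tcp. The module name, the backoff/2 helper, the message shapes and the window/limit constants are all assumptions made for illustration; none of them come from the tutorial or from the attached files. The listener traps exits, respawns the acceptor right away after a normal exit, and pauses before respawning once too many abnormal exits fall inside a short time window.

```erlang
-module(tcp_listener_sketch).
-behaviour(gen_server).

-export([start_link/1, backoff/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Hypothetical tuning values, chosen only for the sketch.
-define(WINDOW_MS, 5000).     % how far back failures are counted
-define(MAX_FAILURES, 5).     % failures tolerated inside the window
-define(PAUSE_MS, 1000).      % pause before respawning when over the limit

-record(state, {listener, acceptor, failures = []}).

start_link(Port) ->
    gen_server:start_link(?MODULE, Port, []).

init(Port) ->
    process_flag(trap_exit, true),
    {ok, L} = gen_tcp:listen(Port, [binary, {active, false}, {reuseaddr, true}]),
    {ok, #state{listener = L, acceptor = spawn_acceptor(L)}}.

%% One linked process per accept; it exits normally after handing over.
spawn_acceptor(L) ->
    Self = self(),
    spawn_link(fun() ->
                       {ok, S} = gen_tcp:accept(L),
                       ok = gen_tcp:controlling_process(S, Self),
                       Self ! {accepted, self(), S}
               end).

%% Pure helper: keep only failures inside the window (including this one)
%% and return {DelayMs, RecentFailures}.
backoff(NowMs, Failures) ->
    Recent = [T || T <- [NowMs | Failures], NowMs - T < ?WINDOW_MS],
    Delay = case length(Recent) > ?MAX_FAILURES of
                true  -> ?PAUSE_MS;
                false -> 0
            end,
    {Delay, Recent}.

handle_call(_Request, _From, St) -> {reply, ok, St}.
handle_cast(_Msg, St) -> {noreply, St}.

handle_info({accepted, Pid, S}, #state{acceptor = Pid} = St) ->
    %% A real server would hand S to a per-connection process here.
    ok = gen_tcp:close(S),
    {noreply, St};
handle_info({'EXIT', Pid, normal}, #state{acceptor = Pid, listener = L} = St) ->
    {noreply, St#state{acceptor = spawn_acceptor(L)}};
handle_info({'EXIT', Pid, _Reason}, #state{acceptor = Pid, failures = Fs} = St) ->
    %% Abnormal exit, for example a badmatch on {error, emfile}.
    {Delay, Recent} = backoff(erlang:monotonic_time(millisecond), Fs),
    erlang:send_after(Delay, self(), respawn),
    {noreply, St#state{acceptor = undefined, failures = Recent}};
handle_info(respawn, #state{listener = L} = St) ->
    {noreply, St#state{acceptor = spawn_acceptor(L)}};
handle_info(_Other, St) ->
    {noreply, St}.

terminate(_Reason, _St) -> ok.
```

Even this small version needs a failure history and a timer in the listener state, which is the extra complexity the paragraph above warns about.
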
>> 2. This new process is not OTP compliant - no supervisors know about it
>> and it doesn't process debug and system messages as per "6.2 Special
>> Processes" of Design Principles.  This means that you may have problems
>> when you upgrade your system dynamically.
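
For reference, a rough sketch of what an acceptor written as a special process might look like, following the proc_lib/sys pattern from the Design Principles. The module name, the {accepted, Pid, Socket} message and the 200 ms poll interval are assumptions for illustration only. Because a process blocked in gen_tcp:accept/1 cannot read its mailbox, the sketch polls with gen_tcp:accept/2 and checks for system messages between polls.

```erlang
-module(acceptor_special).

-export([start_link/2, init/3]).
-export([system_continue/3, system_terminate/4, system_code_change/4]).

%% Owner is the process that should receive accepted sockets.
start_link(Listener, Owner) ->
    proc_lib:start_link(?MODULE, init, [self(), Listener, Owner]).

init(Parent, Listener, Owner) ->
    Deb = sys:debug_options([]),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb, {Listener, Owner}).

loop(Parent, Deb, {Listener, Owner} = State) ->
    %% Short timeout so system messages are serviced regularly.
    case gen_tcp:accept(Listener, 200) of
        {ok, Socket} ->
            ok = gen_tcp:controlling_process(Socket, Owner),
            Owner ! {accepted, self(), Socket};
        {error, timeout} ->
            ok;
        {error, Reason} ->
            exit({accept_failed, Reason})
    end,
    receive
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Deb, State)
    after 0 ->
        loop(Parent, Deb, State)
    end.

%% Callbacks required by sys for special processes.
system_continue(Parent, Deb, State) ->
    loop(Parent, Deb, State).

system_terminate(Reason, _Parent, _Deb, _State) ->
    exit(Reason).

system_code_change(State, _Module, _OldVsn, _Extra) ->
    {ok, State}.
```

With this in place, calls such as sys:get_status/1, sys:suspend/1 and sys:resume/1 work on the acceptor, and it can be started from a supervisor child specification through start_link/2.
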
>> Partly these are some of the reasons I put together this tutorial to
>> show how to avoid such problems altogether.  :-)
>> I hope you will find this feedback useful.
>> Regards,
>> Serge
>> Samuel Tesla wrote:
>>> Serge,
>>> I really got a lot from your guide on building TCP servers.  I really
>>> appreciate the work you put into it.  I think I've got an improvement that
>>> you may want to consider putting up on the website.
>>> I wanted to read documentation for prim_inet:async_accept/2 so I could
>>> figure out what that -1 was for, and couldn't find any documentation.  So, I
>>> Googled and discovered that there is no documentation on purpose (
>>> http://www.trapexit.org/forum/viewtopic.php?p=29157).  Basically, it's not a
>>> guaranteed API between versions, whereas gen_tcp is.  So, I set out to see
>>> if I could use gen_tcp:accept/1 instead of prim_inet:async_accept/2, and I
>>> was successful.
>>> I copied your source off the website and then made modifications.  I only
>>> had to change the listener and the FSM modules, and I've attached the
>>> altered source files.  The gist of what I did was spawn a linked process
>>> which does the accept, and then sends a call back to the main listener
>>> process.  The whole sequence has to be synchronous until the FSM gets
>>> into WAIT_FOR_DATA, or the socket will disconnect and you'll start
>>> getting posix errors.
>>> There were a few other things I cleaned up or changed:
>>>  * You don't need to copy socket options, as accept/1 does that.
>>>  * You don't need to call gen_tcp:close/1 in terminate/2 as the listening
>>> socket will close when its controlling process exits.
>>>  * I set {packet, 0} as I was testing with a raw telnet session.
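
The first point in the list above can be checked with a small experiment. This is only a sketch: the module name and the choice of {packet, 2} are assumptions made so that the inherited value differs from the default; the post itself used {packet, 0}. It listens on an ephemeral port, connects locally, accepts, and returns the options reported for the accepted socket.

```erlang
-module(inherit_opts_demo).

-export([run/0]).

run() ->
    %% Port 0 asks the OS for any free port.
    {ok, L} = gen_tcp:listen(0, [binary, {packet, 2}, {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127, 0, 0, 1}, Port,
                              [binary, {packet, 2}, {active, false}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    %% No setopts call is made on S; these values come from the listen socket.
    Result = inet:getopts(S, [packet, active]),
    _ = [gen_tcp:close(X) || X <- [C, S, L]],
    Result.
```

Calling inherit_opts_demo:run() from the shell shows whether the packet and active settings of the listening socket carried over to the accepted one.
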
>>> I hope you find this helpful!
>>> -- Samuel
