[erlang-questions] Building a Non-blocking TCP server using OTP principles

Samuel Tesla samuel.tesla@REDACTED
Fri Aug 17 22:04:36 CEST 2007


I suppose the question really becomes, then, how can you write a TCP server
using the documented APIs of Erlang/OTP that you can reasonably expect to
remain the same between releases?

The APIs in prim_inet aren't intended for use outside of OTP itself.  They
can change between releases, and that will cause problems with the original
code you posted as well.  A solution that uses only the functions exported
from gen_tcp, by contrast, can be expected to survive upgrades to the OTP
environment.
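
To make that concrete, here is a minimal sketch of an acceptor that uses only
documented gen_tcp calls.  It is not the code attached earlier in this
thread.  The module name, the {accepted, Socket} and {accept_failed, Reason}
messages, the 1000 ms accept timeout and the 200 ms pause are all assumptions
made for illustration.

```erlang
-module(tcp_acceptor_sketch).
-export([start/1, accept_loop/2]).

%% Hypothetical helper: open a listening socket and spawn a linked acceptor.
%% The calling process owns the listening socket and receives the messages.
start(Port) ->
    {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, 0},
                                         {active, false}, {reuseaddr, true}]),
    Pid = spawn_link(?MODULE, accept_loop, [Listen, self()]),
    {ok, Listen, Pid}.

%% Accept with a timeout so the loop never blocks indefinitely, and report
%% errors to the owner instead of crashing on a failed pattern match.
accept_loop(Listen, Owner) ->
    case gen_tcp:accept(Listen, 1000) of
        {ok, Socket} ->
            %% Hand the socket over before telling the owner about it.
            case gen_tcp:controlling_process(Socket, Owner) of
                ok         -> Owner ! {accepted, Socket};
                {error, _} -> gen_tcp:close(Socket)
            end,
            accept_loop(Listen, Owner);
        {error, timeout} ->
            accept_loop(Listen, Owner);
        {error, closed} ->
            %% The listening socket is gone, so stop quietly.
            ok;
        {error, Reason} ->
            %% For example emfile: report it, pause briefly, then retry.
            Owner ! {accept_failed, Reason},
            timer:sleep(200),
            accept_loop(Listen, Owner)
    end.
```

Calling tcp_acceptor_sketch:start(0) listens on an ephemeral port, and
inet:port/1 on the returned listening socket reports which port was chosen.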

I'm currently looking at ejabberd to see how they do things, as they manage
a jabber server with only gen_tcp.

-- Samuel

On 8/17/07, Serge Aleynikov <saleyn@REDACTED> wrote:
>
> Samuel,
>
> Thanks for your input.  I reviewed your modifications and have to say
> that there are several problems with this approach.
>
> 1. Not doing asynchronous accept and relying on a separate process to
> accept connections may be *dangerous* if not handled properly as it
> introduces race conditions that could potentially block the server
> permanently.
>
> Here's an important quote from ACE book "C++ Network Programming Vol.1":
>
> "When an acceptor socket is passed to select(), it's marked as "active"
> when a connection is received. Many servers use this event to indicate
> that it's OK to call accept() without blocking. Unfortunately, there's a
> race condition that stems from the asynchronous behavior of TCP/IP. In
> particular, after select() indicates an acceptor socket is active (but
> before accept() is called) a client can close its connection, whereupon
> accept() can block and potentially hang the entire application process.
> To avoid this problem, acceptor sockets should always be set into
> non-blocking mode when used with select()."
>
> This applies to your changes indirectly.  Under the hood of the network
> driver, it still does the asynchronous accept, so the paragraph above
> doesn't apply at the driver level.  However, there may be a failure
> between these two lines in the init/1:
>
>          {ok, Ref} = create_acceptor(Listen_socket),
>          {ok, #state{listener = Listen_socket,
>                      acceptor = Ref,
>                      module   = Module}};
>
> for various reasons.  Although the acceptor is linked and trap_exit is
> turned on, the {'EXIT', Pid, Reason} message is presently not handled, so
> the listener process will be locked forever.
>
> The same can happen if the acceptor process dies anywhere in the middle
> of the F() function:
>
>      F = fun() ->
>                  {ok, Socket} = gen_tcp:accept(Listener),
>                  gen_tcp:controlling_process(Socket, Self),
>                  gen_server:call(Self, {accept, Socket})
>          end,
>
> As mentioned above, this can likely be fixed by properly handling the
> {'EXIT', Pid, Reason} message and respawning the acceptor when it happens.
> This, however, presents another challenge: if the system runs out of file
> descriptors, your listener process will be stuck in an unhappy mode of
> constantly respawning acceptors that die because of this line:
>
>                  {ok, Socket} = gen_tcp:accept(Listener)
>
> So you would need to monitor how many accept failures you got in the
> last several seconds and do some intelligent recovery.  This would
> complicate code by quite a bit.
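
One way to count recent accept failures is a small sliding-window counter.
The following is only a sketch under stated assumptions: the module name, the
function names and the respawn/backoff atoms are hypothetical and do not come
from the tutorial or the attached code.

```erlang
-module(accept_backoff_sketch).
-export([new/2, note_failure/2]).

%% State is {MaxFailures, WindowMs, Timestamps}; timestamps are milliseconds.
new(MaxFailures, WindowMs) ->
    {MaxFailures, WindowMs, []}.

%% Record a failure at NowMs and decide what the listener should do next:
%% respawn the acceptor immediately, or back off because more than
%% MaxFailures failures fell inside the last WindowMs milliseconds.
note_failure(NowMs, {Max, Window, Stamps}) ->
    Recent = [T || T <- [NowMs | Stamps], NowMs - T < Window],
    Decision = case length(Recent) > Max of
                   true  -> backoff;
                   false -> respawn
               end,
    {Decision, {Max, Window, Recent}}.
```

A listener could call note_failure/2 from the clause that handles
{'EXIT', Pid, Reason}, passing erlang:monotonic_time(millisecond) as the
current time, and delay the respawn (for example with erlang:send_after/3)
whenever the decision is backoff.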
>
> 2. This new process is not OTP compliant - no supervisors know about it
> and it doesn't process debug and system messages as per "6.2 Special
> Processes" of Design Principles.  This means that you may have problems
> when you upgrade your system dynamically.
>
> These are some of the reasons I put together this tutorial: to show how
> to avoid such problems altogether.  :-)
>
> I hope you will find this feedback useful.
>
> Regards,
>
> Serge
>
>
> Samuel Tesla wrote:
> > Serge,
> >
> > I really got a lot from your guide on building TCP servers.  I really
> > appreciate the work you put into it.  I think I've got an improvement that
> > you may want to consider putting up on the website.
> >
> > I wanted to read documentation for prim_inet:async_accept/2 so I could
> > figure out what that -1 was for, and couldn't find any documentation.  So, I
> > Googled and discovered that there is no documentation on purpose (
> > http://www.trapexit.org/forum/viewtopic.php?p=29157).  Basically, it's not a
> > guaranteed API between versions, whereas gen_tcp is.  So, I set out to see
> > if I could use gen_tcp:accept/1 instead of prim_inet:async_accept/2, and I
> > was successful.
> >
> > I copied your source off the website and then made modifications.  I only
> > had to change the listener and the FSM modules, and I've attached the
> > altered source files.  The gist of what I did was spawn a linked process
> > which does the accept, and then sends a call back to the main listener
> > process.  The whole sequence of handing over control has to be synchronous
> > until the FSM gets into WAIT_FOR_DATA, or the socket will disconnect and
> > you'll start getting posix errors.
> >
> > There were a few other things I cleaned up or changed:
> >  * You don't need to copy socket options, as accept/1 does that.
> >  * You don't need to call gen_tcp:close/1 in terminate/2 as the listening
> > socket will close when its controlling process exits.
> >  * I set {packet, 0} as I was testing with a raw telnet session.
> >
> > I hope you find this helpful!
> >
> > -- Samuel
> >
>
>
>