[erlang-questions] Building a Non-blocking TCP server using OTP principles

Dave Rafkind dave.rafkind@REDACTED
Thu Sep 20 22:06:46 CEST 2007


Just to bring this back up again, I looked at ejabberd_listener.erl and as
far as I can tell, the strategy is to have a "master acceptor" supervisor
that babysits some processes doing gen_tcp:listen() and gen_tcp:accept(),
in a "normal" blocking manner.

When each worker's gen_tcp:accept() returns with a client socket, the socket
is then passed along to a separate "controller" process (supervised by
something else) and the worker goes back to accepting.

Does the fact that the worker processes are properly supervised and can be
administered through supervision make this an "officially safe" way to have
a non-blocking TCP server?
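
To make the question concrete, here is a minimal sketch of the pattern as I
read it. It is not ejabberd's actual code: the module name, function names,
restart limits and the echo "controller" are all hypothetical, and the
controller is a plain unsupervised process here, whereas the real one is
supervised by something else. Each worker does a blocking gen_tcp:listen()
and gen_tcp:accept(), hands the client socket to a fresh controller process,
and goes straight back to accepting. If accept fails, the worker exits and
the supervisor's restart intensity bounds how often it gets respawned.

```erlang
-module(listener_sup_sketch).
-behaviour(supervisor).

-export([start_link/1, init/1]).
-export([start_listener/1, listener_init/1]).

%% Start the "master acceptor" supervisor with one listener per port.
start_link(Ports) ->
    supervisor:start_link(?MODULE, Ports).

%% Supervisor callback: one permanent worker per port.  The restart
%% intensity (at most 5 restarts in 10 seconds) is an arbitrary choice.
init(Ports) ->
    Children = [{{listener, Port},
                 {?MODULE, start_listener, [Port]},
                 permanent, brutal_kill, worker, [?MODULE]}
                || Port <- Ports],
    {ok, {{one_for_one, 5, 10}, Children}}.

%% proc_lib:start_link/3 waits for init_ack, so the supervisor only
%% continues once the port is actually open.
start_listener(Port) ->
    proc_lib:start_link(?MODULE, listener_init, [Port]).

listener_init(Port) ->
    {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, 0},
                                         {active, false},
                                         {reuseaddr, true}]),
    proc_lib:init_ack({ok, self()}),
    accept_loop(Listen).

%% Plain blocking accept.  On success, hand the socket to a controller
%% process and loop; on failure, exit and let the supervisor decide.
accept_loop(Listen) ->
    case gen_tcp:accept(Listen) of
        {ok, Socket} ->
            Pid = spawn(fun controller/0),
            case gen_tcp:controlling_process(Socket, Pid) of
                ok         -> Pid ! {socket, Socket};
                {error, _} -> exit(Pid, kill)
            end,
            accept_loop(Listen);
        {error, Reason} ->
            exit({accept_failed, Reason})
    end.

%% Hypothetical controller: waits for its socket, then echoes data back.
controller() ->
    receive
        {socket, Socket} -> echo(Socket)
    end.

echo(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            case gen_tcp:send(Socket, Data) of
                ok         -> echo(Socket);
                {error, _} -> gen_tcp:close(Socket)
            end;
        {error, _} ->
            gen_tcp:close(Socket)
    end.
```

Calling listener_sup_sketch:start_link([5222]) would start one listener on
port 5222. Note that the worker is started through proc_lib but still does
not handle system messages, so it is not a full "special process" in the
Design Principles sense.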

On 8/17/07, Samuel Tesla <samuel.tesla@REDACTED> wrote:
>
> I suppose the question really becomes, then, how can you write a TCP
> server using the documented APIs of Erlang/OTP that you can reasonably
> expect to remain the same between releases?
>
> The APIs in prim_inet aren't intended to be used.  They can change between
> releases and that will cause problems with the original code you posted as
> well.  Whereas a solution that uses the functions exported from gen_tcp can
> be expected to survive upgrades to the OTP environment.
>
> I'm currently looking at ejabberd to see how they do things, as they
> manage a jabber server with only gen_tcp.
>
> -- Samuel
>
> On 8/17/07, Serge Aleynikov <saleyn@REDACTED> wrote:
> >
> > Samuel,
> >
> > Thanks for your input.  I reviewed your modifications and have to say
> > that there are several problems with this approach.
> >
> > 1. Not doing asynchronous accept and relying on a separate process to
> > accept connections may be *dangerous* if not handled properly as it
> > introduces race conditions that could potentially block the server
> > permanently.
> >
> > Here's an important quote from the ACE book "C++ Network Programming Vol.1":
> >
> > "When an acceptor socket is passed to select(), it's marked as "active"
> > when a connection is received. Many servers use this event to indicate
> > that it's OK to call accept() without blocking. Unfortunately, there's a
> > race condition that stems from the asynchronous behavior of TCP/IP. In
> > particular, after select() indicates an acceptor socket is active (but
> > before accept() is called) a client can close its connection, whereupon
> > accept() can block and potentially hang the entire application process.
> > To avoid this problem, acceptor sockets should always be set into
> > non-blocking mode when used with select()."
> >
> > This applies to your changes indirectly.  Under the hood of the network
> > driver, it still does the asynchronous accept, so the paragraph above
> > doesn't apply at the driver level.  However, there may be a failure
> > between these two lines in the init/1:
> >
> >          {ok, Ref} = create_acceptor(Listen_socket),
> >          {ok, #state{listener = Listen_socket,
> >                      acceptor = Ref,
> >                      module   = Module}};
> >
> > due to various reasons and, despite the fact that it was linked, the
> > {'EXIT', Pid, Reason} message is presently not handled (even though
> > trap_exit is turned on), so the process will be locked forever.
> >
> > The same can happen if the acceptor process dies anywhere in the middle
> > of the F() function:
> >
> >      F = fun() ->
> >                  {ok, Socket} = gen_tcp:accept(Listener),
> >                  gen_tcp:controlling_process(Socket, Self),
> >                  gen_server:call(Self, {accept, Socket})
> >          end,
> >
> > As mentioned above, this can likely be fixed by proper handling of the
> > {'EXIT', Pid, Reason} and respawning the acceptor when it happens.  This,
> > however, presents another challenge: what if the system runs out of file
> > descriptors?  Your listener process will be in an unhappy mode of
> > constantly respawning acceptors that will die because of this line:
> >
> >                  {ok, Socket} = gen_tcp:accept(Listener)
> >
> > So you would need to monitor how many accept failures you got in the
> > last several seconds and do some intelligent recovery.  This would
> > complicate code by quite a bit.
> >
> > 2. This new process is not OTP compliant - no supervisors know about it
> > and it doesn't process debug and system messages as per "6.2 Special
> > Processes" of Design Principles.  This means that you may have problems
> > when you upgrade your system dynamically.
> >
> > These are some of the reasons I put together this tutorial: to show
> > how to avoid such problems altogether.  :-)
> >
> > I hope you will find this feedback useful.
> >
> > Regards,
> >
> > Serge
> >
> >
> > Samuel Tesla wrote:
> > > Serge,
> > >
> > > I really got a lot from your guide on building TCP servers.  I really
> > > appreciate the work you put into it.  I think I've got an improvement
> > > that you may want to consider putting up on the website.
> > >
> > > I wanted to read documentation for prim_inet:async_accept/2 so I could
> > > figure out what that -1 was for, and couldn't find any documentation.
> > > So, I Googled and discovered that there is no documentation on purpose
> > > (http://www.trapexit.org/forum/viewtopic.php?p=29157).  Basically, it's
> > > not a guaranteed API between versions, whereas gen_tcp is.  So, I set
> > > out to see if I could use gen_tcp:accept/1 instead of
> > > prim_inet:async_accept/2, and I was successful.
> > >
> > > I copied your source off the website and then made modifications.  I
> > > only had to change the listener and the FSM modules, and I've attached
> > > the altered source files.  The gist of what I did was spawn a linked
> > > process which does the accept, and then sends a call back to the main
> > > listener process.  The whole sequence until the control has to be
> > > synchronous until the FSM gets into WAIT_FOR_DATA or the socket will
> > > disconnect and you'll start getting posix errors.
> > >
> > > There were a few other things I cleaned up or changed:
> > >  * You don't need to copy socket options, as accept/1 does that.
> > >  * You don't need to call gen_tcp:close/1 in terminate/2 as the
> > > listening socket will close when its controlling process exits.
> > >  * I set {packet, 0} as I was testing with a raw telnet session.
> > >
> > > I hope you find this helpful!
> > >
> > > -- Samuel
> > >
> >
> >
> >
>
>