I suppose the question really becomes, then: how can you write a TCP server using only the documented APIs of Erlang/OTP, which you can reasonably expect to remain the same between releases?<br><br>The APIs in prim_inet aren't intended to be used by application code. They can change between releases, and that would cause problems with the original code you posted as well, whereas a solution that uses only the functions exported from gen_tcp can be expected to survive upgrades to the OTP environment.
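<br><br>To make the question concrete, here is a minimal sketch of an acceptor that uses only gen_tcp. It is not a tested replacement for the tutorial code. The module name tcp_acceptor_sketch, the echo handler, and the backoff numbers are all hypothetical, made up for illustration. The idea is to poll gen_tcp:accept/2 with a timeout and, on errors such as running out of file descriptors, to sleep briefly and retry instead of crashing on a failed match:

```erlang
-module(tcp_acceptor_sketch).
-export([start/1, accept_loop/2, echo/1]).

%% Hypothetical entry point: the caller owns the listening socket,
%% and a linked acceptor process does the blocking accepts.
start(Port) ->
    {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, 0},
                                         {active, false},
                                         {reuseaddr, true}]),
    Pid = spawn_link(?MODULE, accept_loop, [Listen, 0]),
    {ok, Listen, Pid}.

%% Failures counts consecutive accept errors and drives the backoff.
accept_loop(Listen, Failures) ->
    case gen_tcp:accept(Listen, 2000) of
        {ok, Socket} ->
            Handler = spawn(?MODULE, echo, [Socket]),
            %% Hand the socket over before the handler touches it.
            case gen_tcp:controlling_process(Socket, Handler) of
                ok ->
                    Handler ! go;
                {error, _} ->
                    gen_tcp:close(Socket),
                    exit(Handler, kill)
            end,
            accept_loop(Listen, 0);
        {error, timeout} ->
            %% Nothing arrived within the timeout; keep waiting.
            accept_loop(Listen, Failures);
        {error, closed} ->
            %% The listening socket is gone, so stop quietly.
            ok;
        {error, _Reason} ->
            %% For example emfile: wait a little, then try again.
            timer:sleep(backoff_ms(Failures)),
            accept_loop(Listen, Failures + 1)
    end.

%% Assumed policy: linear backoff capped at one second.
backoff_ms(Failures) ->
    min(1000, 50 * (Failures + 1)).

%% Hypothetical handler: waits for ownership, then echoes data back.
echo(Socket) ->
    receive
        go -> echo_loop(Socket)
    after 5000 ->
        ok
    end.

echo_loop(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            _ = gen_tcp:send(Socket, Data),
            echo_loop(Socket);
        {error, _} ->
            gen_tcp:close(Socket)
    end.
```

Calling tcp_acceptor_sketch:start(5000) and connecting with telnet should echo input back. The process that calls start/1 owns the listening socket, so in a real system that caller would need to be a supervised listener process that also handles the acceptor's exit.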
<br><br>I'm currently looking at ejabberd to see how they do things, as they manage a jabber server with only gen_tcp.<br><br>-- Samuel<br><br><div><span class="gmail_quote">On 8/17/07, <b class="gmail_sendername">Serge Aleynikov
</b> <<a href="mailto:saleyn@gmail.com">saleyn@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Samuel,<br>
<br>Thanks for your input. I reviewed your modifications and have to say<br>that there are several problems with this approach.<br><br>1. Not doing asynchronous accept and relying on a separate process to<br>accept connections may be *dangerous* if not handled properly as it
<br>introduces race conditions that could potentially block the server<br>permanently.<br><br>Here's an important quote from ACE book "C++ Network Programming Vol.1":<br><br>"When an acceptor socket is passed to select(), it's marked as "active"
<br>when a connection is received. Many servers use this event to indicate<br>that it's OK to call accept() without blocking. Unfortunately, there's a<br>race condition that stems from the asynchronous behavior of TCP/IP. In
<br>particular, after select() indicates an acceptor socket is active (but<br>before accept() is called) a client can close its connection, whereupon<br>accept() can block and potentially hang the entire application process.
<br>To avoid this problem, acceptor sockets should always be set into<br>non-blocking mode when used with select()."<br><br>This applies to your changes indirectly. Under the hood of the network<br>driver, it still does the asynchronous accept, so the paragraph above
<br>doesn't apply at the driver level. However, there may be a failure<br>between these two lines in the init/1:<br><br> {ok, Ref} = create_acceptor(Listen_socket),<br> {ok, #state{listener = Listen_socket,
<br> acceptor = Ref,<br> module = Module}};<br><br>due to various reasons and despite the fact that it was linked, the<br>{'EXIT', Pid, Reason} message is presently not handled (trap_exit though
<br>is turned on), so the process will be locked forever.<br><br>The same can happen if the acceptor process dies anywhere in the middle<br>of the F() function:<br><br> F = fun() -><br> {ok, Socket} = gen_tcp:accept(Listener),
<br> gen_tcp:controlling_process(Socket, Self),<br> gen_server:call(Self, {accept, Socket})<br> end,<br><br>As mentioned above, this can likely be fixed by proper handling of the<br>
{'EXIT', Pid, Reason} and respawning the acceptor when it happens. This,<br>however, presents another challenge - what if the system runs out of file<br>descriptors - your listener process will be in an unhappy mode of
<br>constantly respawning acceptors that will die because of this line:<br><br> {ok, Socket} = gen_tcp:accept(Listener)<br><br>So you would need to monitor how many accept failures you got in the<br>last several seconds and do some intelligent recovery. This would
<br>complicate the code quite a bit.<br><br>2. This new process is not OTP-compliant - no supervisors know about it<br>and it doesn't process debug and system messages as per "6.2 Special<br>Processes" of Design Principles. This means that you may have problems
<br>when you upgrade your system dynamically.<br><br>Partly these are some of the reasons I put together this tutorial to<br>show how to avoid such problems altogether. :-)<br><br>I hope you will find this feedback useful.
<br><br>Regards,<br><br>Serge<br><br><br>Samuel Tesla wrote:<br>> Serge,<br>><br>> I really got a lot from your guide on building TCP servers. I really<br>> appreciate the work you put into it. I think I've got an improvement that
<br>> you may want to consider putting up on the website.<br>><br>> I wanted to read documentation for prim_inet:async_accept/2 so I could<br>> figure out what that -1 was for, and couldn't find any documentation. So, I
<br>> Googled and discovered that there is no documentation on purpose (<br>> <a href="http://www.trapexit.org/forum/viewtopic.php?p=29157">http://www.trapexit.org/forum/viewtopic.php?p=29157</a>). Basically, it's not a
<br>> guaranteed API between versions, whereas gen_tcp is. So, I set out to see<br>> if I could use gen_tcp:accept/1 instead of prim_inet:async_accept/2, and I<br>> was successful.<br>><br>> I copied your source off the website and then made modifications. I only
<br>> had to change the listener and the FSM modules, and I've attached the<br>> altered source files. The gist of what I did was spawn a linked process<br>> which does the accept, and then sends a call back to the main listener
<br>> process. The whole sequence has to be synchronous until<br>> the FSM gets into WAIT_FOR_DATA, or the socket will disconnect and you'll<br>> start getting posix errors.<br>><br>> There were a few other things I cleaned up or changed:
<br>> * You don't need to copy socket options, as accept/1 does that.<br>> * You don't need to call gen_tcp:close/1 in terminate/2 as the listening<br>> socket will close when its controlling process exits.
<br>> * I set {packet, 0} as I was testing with a raw telnet session.<br>><br>> I hope you find this helpful!<br>><br>> -- Samuel<br>><br><br><br></blockquote></div><br>