[erlang-bugs] Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Lukas Larsson lukas@REDACTED
Tue Jun 24 19:28:36 CEST 2014


Hello,

I was able to reproduce your testcase after removing all the "msg"
printouts and starting a couple of clients at the same time. It seems
that the sockets are hitting the high_msgq_watermark limit and then, as
data gets flushed, they are not set to run again. I'll see if I can dig
out what is causing this behavior.
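
For reference, the limit in question can be inspected and raised per
socket through the standard inet options. A minimal sketch, assuming
Socket is one of the connected gen_tcp sockets from the test and the
values are only examples:

    %% Read the current driver and message-queue watermarks
    {ok, Opts} = inet:getopts(Socket, [high_watermark, low_watermark,
                                       high_msgq_watermark, low_msgq_watermark]),
    io:format("watermarks: ~p~n", [Opts]),

    %% Raise the message-queue watermark to see whether the hang moves
    ok = inet:setopts(Socket, [{high_msgq_watermark, 1024 * 1024}]).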

Lukas
On 24/06/14 16:56, Vasily Demidenok wrote:
> Hello list, we ran into some gen_tcp-related problems after switching
> from Erlang R15B03 to R16B03-01.
>
> The problem is as follows: when the server produces data faster than
> the consumer can handle, and the server's outgoing buffers and the
> client's incoming buffers are both full, a gen_tcp:send/2 call on the
> server side blocks forever in erts_internal:port_command/3. After
> this, even when the client has consumed all the data and the buffers
> are empty, the server process remains suspended in that call.
>
> The problem does not occur every time, but it happens quite often.
>
> Some implementation details are below; I have also shrunk the example
> down to this small app so you can check the code:
> https://github.com/define-null/tcp_failing_ex
>
> The server is implemented as follows: it listens on port 8899, and
> when a client connects it spawns a main srv process plus plenty of
> workers, which start writing to this port after the client sends a
> special msg. The main process handles commands from the client and
> sends responses via gen_tcp:send/2, while the workers push stream
> data to the client using erlang:port_command/3 with nosuspend. So the
> workers send only up-to-date data, dropping anything if the client is
> slow.
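>
> (A minimal sketch of the two send paths, with Socket and Data as
> placeholders; the full code is in the repository linked above:)
>
>     %% Main srv process: control-channel replies; may block and get
>     %% suspended if the port is busy
>     reply(Socket, Data) ->
>         gen_tcp:send(Socket, Data).
>
>     %% Worker: best-effort stream data; never suspend, drop the chunk
>     %% if the port signals busy
>     push(Socket, Data) ->
>         case erlang:port_command(Socket, Data, [nosuspend]) of
>             true  -> ok;       %% queued in the driver
>             false -> dropped   %% port busy, data dropped
>         end.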
>
> The behaviour we see is as follows:
> In the first phase the producer fills the OS and Erlang driver
> buffers. The consumer reads data as it arrives and the server drops
> the data it cannot send. So we see the buffer sizes growing on both
> sides: the server's out queue and the client's in queue, respectively.
>
> At some point, I guess when the buffers are completely filled, the
> server tries to respond to a ping message from the client using a
> gen_tcp:send/2 call. After that, it blocks there forever, even after
> the client has consumed all the messages. The situation does not
> change and the srv process remains in the suspended state, while its
> incoming buffer begins to grow as the client sends more ping messages.
>
> Below is the output on a system with two slow clients, where the
> server process for the first client is already blocked in the
> gen_tcp:send/2 call, while the second client is served fine.
>
> Every 2.0s: netstat -al | grep 8899                 Tue Jun 24 16:34:51 2014
>
> tcp4      36       0  localhost.8899         localhost.63263        ESTABLISHED
> tcp4       0       0  localhost.63263        localhost.8899         ESTABLISHED
> tcp4       0  130990  localhost.8899         localhost.63257        ESTABLISHED
> tcp4  619190       0  localhost.63257        localhost.8899         ESTABLISHED
> tcp4       0       0  *.8899                 *.*                    LISTEN
>
> This is the output of the client process from the github example,
> where we see that after the send operation (the ping msg) no incoming
> msgs arrive any more.
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,48}} before send
> {{2014,6,24},{16,33,48}} after send ok
> {{2014,6,24},{16,34,9}} before send
> {{2014,6,24},{16,34,9}} after send ok
> {{2014,6,24},{16,34,30}} before send
> {{2014,6,24},{16,34,30}} after send ok
> {{2014,6,24},{16,34,51}} before send
> {{2014,6,24},{16,34,51}} after send ok
> {{2014,6,24},{16,35,12}} before send
> ....
>
> Output of the blocked server process:
>
> {{2014,6,24},{16,33,21}}: <0.95.0> ping
> {{2014,6,24},{16,33,21}} bsend: <0.95.0>
> {{2014,6,24},{16,33,21}} asend: <0.95.0> ok
> {{2014,6,24},{16,33,48}}: <0.95.0> ping
> {{2014,6,24},{16,33,48}} bsend: <0.95.0>
> %% (no asend message after it)
>
> (tcp_failing_node@REDACTED)1> erlang:process_info(pid(0,95,0)).
> [{current_function,{erts_internal,port_command,3}},
>  {initial_call,{proc_lib,init_p,5}},
>
> The bug is not always reproducible, but it occurs quite often. The
> problem is that even after the server's out buffers are empty, data
> does not arrive at the client, and the incoming buffer grows as the
> client sends more ping messages to the server. (So erlang:port_command/3
> with nosuspend always returns false while the other, main server
> process for this connection is suspended in gen_tcp:send/2.)
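>
> (A minimal diagnostic sketch, assuming Pid is the blocked server
> process and Socket is the corresponding gen_tcp socket; the values in
> the comments are only what we would expect to see, not captured
> output:)
>
>     %% Where is the process stuck, and is it suspended?
>     erlang:process_info(Pid, [current_function, status]).
>     %% e.g. [{current_function,{erts_internal,port_command,3}},
>     %%       {status,suspended}]
>
>     %% How much data is still queued in the port driver?
>     erlang:port_info(Socket, queue_size).
>     %% e.g. {queue_size,0}, yet the process above never resumes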
>
> And then it only gets worse, as already mentioned:
>
> Every 2.0s: netstat -al | grep 8899                 Tue Jun 24 16:56:59 2014
>
> tcp4     804       0  localhost.8899         localhost.63263        ESTABLISHED
> tcp4       0       0  localhost.63263        localhost.8899         ESTABLISHED
> tcp4       0       0  localhost.8899         localhost.63257        ESTABLISHED
> tcp4       0       0  localhost.63257        localhost.8899         ESTABLISHED
> tcp4       0       0  *.8899                 *.*                    LISTEN
>
> We ran into this after switching from R15B03 to R16B03. I know there
> were some changes in port_command handling; could that be the reason
> we see this behaviour?
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
