[erlang-bugs] Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Vasily Demidenok define.null@REDACTED
Fri Jul 4 22:47:10 CEST 2014


The problem remains even if only erlang:port_command/3 with the nosuspend
option is used (no calls to gen_tcp:send/2 at all, and many processes
writing to the same socket).
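
For reference, the write path on our side boils down to roughly the
following (a simplified sketch, not the exact code from the linked repo;
the function name, chunk size and pause are made up for illustration, and
Socket is the one gen_tcp socket shared by all the writer processes):

    %% each writer process runs this loop; the return value of
    %% port_command/3 is ignored, so chunks are simply dropped whenever
    %% the driver queue is busy
    writer_loop(Socket) ->
        Chunk = binary:copy(<<"x">>, 1024),
        _ = erlang:port_command(Socket, Chunk, [nosuspend]),
        timer:sleep(10),
        writer_loop(Socket).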


2014-06-24 21:28 GMT+04:00 Lukas Larsson <lukas@REDACTED>:

>  Hello,
>
> I was able to reproduce your test case after removing all the "msg"
> printouts and starting a couple of clients at the same time. It seems that
> the sockets are hitting the high_msgq_watermark limit and then, as data gets
> flushed, they are not set to run again. I'll see if I can dig out what it is
> that is causing this behavior.
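>
> (For reference, those limits can be inspected and raised per socket with
> inet:getopts/2 and inet:setopts/2; a rough sketch using the option names
> documented in inet(3), with Socket being the accepted gen_tcp socket:)
>
>     {ok, Current} = inet:getopts(Socket, [high_msgq_watermark,
>                                           low_msgq_watermark]),
>     ok = inet:setopts(Socket, [{high_msgq_watermark, 64 * 1024},
>                                {low_msgq_watermark,  16 * 1024}]).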
>
> Lukas
>
> On 24/06/14 16:56, Vasily Demidenok wrote:
>
>  Hello list, we have run into some gen_tcp-related problems after switching
> from Erlang R15B03 to R16B03-01.
>
>  The problem is as follows: when the server produces data faster than the
> consumer can handle, then once the server's outgoing buffers and the
> client's incoming buffers are full, a gen_tcp:send/2 call on the server
> side blocks forever in erts_internal:port_command/3. After this, even when
> the client has consumed all the data and the buffers are empty, the server
> process remains suspended in that call.
>
>  This problem does not occur every time, but it happens quite often.
>
>  Some implementation details are below; I have also shrunk the example down
> to this small app so you can check the code:
> https://github.com/define-null/tcp_failing_ex
>
>  The server is implemented so that it listens on port 8899; when a client
> connects, it spawns a main srv process and plenty of workers, which start
> writing to this port after the client sends a special msg. The main process
> is responsible for commands from the client and sends responses via
> gen_tcp:send/2, while the workers try to write stream data to the client
> using erlang:port_command/3 with nosuspend. So the workers send only
> up-to-date data, dropping anything when the client is slow.
>
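> In other words, the two send paths look roughly like this (a simplified
> sketch, not the exact code from the repo; the function names are made up):
>
>     %% main srv process: command responses must not be lost
>     reply(Socket, Resp) ->
>         gen_tcp:send(Socket, Resp).      %% may suspend if the port is busy
>
>     %% worker process: stream data, drop it when the client is slow
>     push(Socket, Data) ->
>         case erlang:port_command(Socket, Data, [nosuspend]) of
>             true  -> sent;
>             false -> dropped             %% driver queue over the watermark
>         end.
>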
>  The behaviour we see is as follows:
> in the first phase the producer fills the OS and Erlang driver buffers. The
> consumer reads data as it arrives and the server drops the data it cannot
> send, so we see the buffer sizes growing on both sides: the out queue of
> the server and the in queue of the client, respectively.
>
>  At some moment in time, I guess when the buffers are completely filled,
> the server tries to respond to a ping message from the client using
> gen_tcp:send/2. After that it blocks there forever, even after the client
> has consumed all the messages. The situation does not change and the srv
> process remains in the suspended state, while its incoming buffer starts
> to grow as the client sends more ping messages.
>
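> (As a side note, the hang can at least be bounded by setting a send
> timeout on the socket, though that only masks the problem rather than
> explaining why the queue is never flushed; a rough sketch using the
> documented inet options:)
>
>     %% accepted sockets inherit these options; gen_tcp:send/2 then
>     %% returns {error, timeout} instead of blocking forever
>     {ok, LSock} = gen_tcp:listen(8899, [binary, {active, false},
>                                         {send_timeout, 5000},
>                                         {send_timeout_close, true}]).
>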
>  Below is the output on a system with two slow clients, where for the
> first client the server's process is already blocked in a gen_tcp:send/2
> call, while the second is served fine.
>
>  Every 2.0s: netstat -al | grep 8899
>
> Tue Jun 24 16:34:51 2014
>
>  tcp4      36      0  localhost.8899    localhost.63263   ESTABLISHED
>  tcp4       0      0  localhost.63263   localhost.8899    ESTABLISHED
>  tcp4       0 130990  localhost.8899    localhost.63257   ESTABLISHED
>  tcp4  619190      0  localhost.63257   localhost.8899    ESTABLISHED
>  tcp4       0      0  *.8899            *.*               LISTEN
>
>  This is the output of the client process from the github example, where
> we can see that after the send operation (ping msg) no more incoming msgs
> arrive:
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,28}}: msg
> {{2014,6,24},{16,33,48}} before send
> {{2014,6,24},{16,33,48}} after send ok
> {{2014,6,24},{16,34,9}} before send
> {{2014,6,24},{16,34,9}} after send ok
> {{2014,6,24},{16,34,30}} before send
> {{2014,6,24},{16,34,30}} after send ok
> {{2014,6,24},{16,34,51}} before send
> {{2014,6,24},{16,34,51}} after send ok
> {{2014,6,24},{16,35,12}} before send
> ....
>
>  Output of the blocked server process:
>
>  {{2014,6,24},{16,33,21}}: <0.95.0> ping
> {{2014,6,24},{16,33,21}} bsend: <0.95.0>
> {{2014,6,24},{16,33,21}} asend: <0.95.0> ok
> {{2014,6,24},{16,33,48}}: <0.95.0> ping
> {{2014,6,24},{16,33,48}} bsend: <0.95.0>
> %% (no asend message after it)
>
>  (tcp_failing_node@REDACTED)1> erlang:process_info(pid(0,95,0)).
> [{current_function,{erts_internal,port_command,3}},
>  {initial_call,{proc_lib,init_p,5}},
>
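> The port itself can be inspected the same way; a rough sketch of the
> shell calls we use (Port being the gen_tcp socket of the blocked
> connection):
>
>     erlang:process_info(pid(0,95,0), [current_function, status,
>                                       message_queue_len]).
>     erlang:port_info(Port, queue_size).      %% bytes in the driver queue
>     inet:getstat(Port, [send_pend, send_oct]).
>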
>  The bug is not always reproducible, but it occurs quite often. The
> problem is that even after the server's out buffers are empty, data does
> not arrive at the client, and the incoming buffer grows as the client
> sends ping messages to the server. (So erlang:port_command/3 with
> nosuspend always returns false once the other, main server process for
> this connection is suspended in gen_tcp:send/2.)
>
>  And then it only gets worse, as already mentioned:
>
>  Every 2.0s: netstat -al | grep 8899
>
> Tue Jun 24 16:56:59 2014
>
>  tcp4     804      0  localhost.8899    localhost.63263   ESTABLISHED
>  tcp4       0      0  localhost.63263   localhost.8899    ESTABLISHED
>  tcp4       0      0  localhost.8899    localhost.63257   ESTABLISHED
>  tcp4       0      0  localhost.63257   localhost.8899    ESTABLISHED
>  tcp4       0      0  *.8899            *.*               LISTEN
>
>  We ran into this after switching to R16B03 from R15B03. I know there were
> some changes in port_command handling; could they be the reason we see
> this behaviour?
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>
>

