[erlang-bugs] Re: Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

o鸡蛋黄o 53817681@REDACTED
Mon Jul 28 11:41:54 CEST 2014


Dear all,

There is a bug in the Erlang VM source code: the process argument passed to port_get_data_1 is NULL. I have fixed it, thanks.




------------------ Original Message ------------------
From: "Vasily Demidenok" <define.null@REDACTED>
Sent: Monday, 28 July 2014, 5:16 PM
To: "Lukas Larsson" <lukas@REDACTED>
Cc: "erlang-bugs" <erlang-bugs@REDACTED>
Subject: Re: [erlang-bugs] Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03



Hello again,

A friend of mine ran into the same problem. He assumes it is not a TCP-driver problem but rather the port-internals changes made in the R16 release. Downgrading to R15B03 fixed the problem for them. The +n option with the d/s/a parameters did not help either.
 

An app to reproduce the bug:
https://github.com/kudryashov-sv/ErlangCPort.git





 2014-07-05 0:47 GMT+04:00 Vasily Demidenok <define.null@REDACTED>:
The problem remains even if only erlang:port_command/3 with the nosuspend option is used (no calls to gen_tcp:send/2; many processes write to the same socket).
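For illustration only (this is not code from the linked app), the pattern boils down to something like the sketch below; Sock is assumed to be the connected gen_tcp socket, and the chunk size, module and function names are made up.

    %% Sketch: many writer processes, no gen_tcp:send/2 at all.
    -module(nosuspend_writers).
    -export([start_writers/2]).

    start_writers(Sock, N) ->
        [spawn_link(fun() -> writer_loop(Sock) end) || _ <- lists:seq(1, N)].

    writer_loop(Sock) ->
        Chunk = binary:copy(<<"x">>, 1024),
        case erlang:port_command(Sock, Chunk, [nosuspend]) of
            true  -> ok;       %% queued on the port's driver queue
            false -> dropped   %% port busy, drop this chunk
        end,
        writer_loop(Sock).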


 2014-06-24 21:28 GMT+04:00 Lukas Larsson <lukas@REDACTED>:
Hello,

I was able to reproduce your test case after removing all the "msg" printouts and starting a couple of clients at the same time. It seems that the sockets are hitting the high_msgq_watermark limit and then, as the data gets flushed, they are not set to run again. I'll see if I can dig out what is causing this behavior.
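(As a side experiment, not something suggested above: the watermarks involved are ordinary inet socket options, so they can be raised to see whether the hang correlates with hitting high_msgq_watermark. The values below are illustrative and Sock stands for the socket under test.)

    ok = inet:setopts(Sock, [{high_msgq_watermark, 256 * 1024},   %% bytes
                             {low_msgq_watermark,   64 * 1024}]),
    {ok, Current} = inet:getopts(Sock, [high_msgq_watermark,
                                        low_msgq_watermark]).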
     
Lukas
On 24/06/14 16:56, Vasily Demidenok wrote:
     
     

Hello list, we ran into some gen_tcp-related problems after switching from Erlang R15B03 to R16B03-01.
         
         
The problem is as follows: when the server produces data faster than the consumer can handle, once the server's outgoing buffers and the client's incoming buffers are full, a gen_tcp:send/2 call on the server side blocks forever in erts_internal:port_command/3. After this, even when the client consumes all the data and the buffers are empty, the server process remains suspended in that call.
         
         
This problem does not always occur, but it happens quite often.
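(One way to make the hang less painful while debugging, not something the example app does: the standard send_timeout socket option bounds how long gen_tcp:send/2 may block. The option values, module and function names below are illustrative only.)

    -module(bounded_send).
    -export([listen_opts/0, reply/2]).

    %% Sketch: accept sockets with a bounded send timeout so a stuck send
    %% returns {error, timeout} instead of suspending the caller forever.
    listen_opts() ->
        [binary, {active, false},
         {send_timeout, 5000},            %% illustrative value, in ms
         {send_timeout_close, true}].

    reply(Sock, Data) ->
        case gen_tcp:send(Sock, Data) of
            ok              -> ok;
            {error, Reason} -> {error, Reason}   %% e.g. timeout
        end.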
         
         
Some implementation details are below. I also shrank the example down to this small app so you can check the code: https://github.com/define-null/tcp_failing_ex
         
         
The server is implemented so that it listens on port 8899; when a client connects, it spawns a main srv process and plenty of workers, which start writing to this port after the client sends a special msg. The main process handles commands from the client and sends responses via gen_tcp:send/2, while the workers try to write stream data to the client using erlang:port_command/3 with nosuspend. So the workers send only up-to-date data, dropping anything when the client is slow.
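(A rough sketch of the two send paths described above, not the project's actual code; the module and function names are made up.)

    -module(srv_send_paths).
    -export([reply_ping/1, push_frame/2]).

    %% Main connection process: control replies go through gen_tcp:send/2,
    %% which can suspend the caller when the port is busy.
    reply_ping(Sock) ->
        gen_tcp:send(Sock, <<"pong">>).

    %% Worker: stream frames are written with nosuspend and dropped when
    %% the port reports busy, so only up-to-date data is ever queued.
    push_frame(Sock, Frame) ->
        case erlang:port_command(Sock, Frame, [nosuspend]) of
            true  -> sent;
            false -> dropped
        end.

The crux is that both paths write to the same port: the main process can get suspended inside port_command while the workers keep getting false back.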
         
         
The behaviour we see is as follows:
In the first phase the producer fills the OS and Erlang driver buffers. The consumer reads data as it arrives, and the server drops data that it cannot send. So we see the buffer sizes growing on both sides: the out queue of the server and the in queue of the client, respectively.
         
         
At some moment, I guess when the buffers are completely filled, the server tries to respond to a ping message from the client using a gen_tcp:send/2 call. After that it blocks there forever, even after the client consumes all the messages. The situation does not change and the srv process remains in the suspended state, while its incoming buffer begins to grow as the client sends more ping messages.
         
         
Below is the output on a system with two slow clients, where the server's process for the first client is already blocked in the gen_tcp:send/2 call, while the second client is served well.
         
         
    Every 2.0s: netstat -al | grep 8899        Tue Jun 24 16:34:51 2014

    tcp4      36       0  localhost.8899     localhost.63263    ESTABLISHED
    tcp4       0       0  localhost.63263    localhost.8899     ESTABLISHED
    tcp4       0  130990  localhost.8899     localhost.63257    ESTABLISHED
    tcp4  619190       0  localhost.63257    localhost.8899     ESTABLISHED
    tcp4       0       0  *.8899             *.*                LISTEN
         
         
This is the output for the client process from the GitHub example, where we see that after the send operation (a ping msg) no more incoming msgs arrive.
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,28}}: msg
         {{2014,6,24},{16,33,48}} before send
         {{2014,6,24},{16,33,48}} after send ok
         {{2014,6,24},{16,34,9}} before send
         {{2014,6,24},{16,34,9}} after send ok
         {{2014,6,24},{16,34,30}} before send
         {{2014,6,24},{16,34,30}} after send ok
         {{2014,6,24},{16,34,51}} before send
         {{2014,6,24},{16,34,51}} after send ok
         {{2014,6,24},{16,35,12}} before send
         ....
         
         
Output of the blocked server process:
         
         
         {{2014,6,24},{16,33,21}}: <0.95.0> ping
         {{2014,6,24},{16,33,21}} bsend: <0.95.0>
         {{2014,6,24},{16,33,21}} asend: <0.95.0> ok
         {{2014,6,24},{16,33,48}}: <0.95.0> ping
         {{2014,6,24},{16,33,48}} bsend: <0.95.0>
         %% (no asend message after it)
         
         
    (tcp_failing_node@REDACTED)1> erlang:process_info(pid(0,95,0)).
    [{current_function,{erts_internal,port_command,3}},
     {initial_call,{proc_lib,init_p,5}},
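(For anyone reproducing this, a slightly wider process_info call makes the suspension easier to spot; the item list is only a suggestion.)

    %% In the server node's shell, using the pid from the output above:
    Pid = pid(0, 95, 0).
    erlang:process_info(Pid, [current_function, current_stacktrace,
                              status, message_queue_len]).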
         
         
The bug is not always reproducible, but it occurs quite often. The problem is that even after the server's out buffers are empty, data does not arrive at the client, and the incoming buffer grows as the client sends ping messages to the server. (So erlang:port_command/3 with nosuspend always returns false while the main server process for this connection is suspended in gen_tcp:send/2.)
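(A diagnostic that can back this up: the socket port's driver queue size can be read directly, to confirm the queue really is empty while the sender stays suspended. Sock below stands for the accepted server-side socket; queue_size is reported in bytes.)

    {connected, Owner}   = erlang:port_info(Sock, connected),
    {queue_size, QBytes} = erlang:port_info(Sock, queue_size),
    io:format("owner ~p, driver queue ~p bytes~n", [Owner, QBytes]).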
         
         
And then it only gets worse, as already mentioned:
         
         
    Every 2.0s: netstat -al | grep 8899        Tue Jun 24 16:56:59 2014

    tcp4     804       0  localhost.8899     localhost.63263    ESTABLISHED
    tcp4       0       0  localhost.63263    localhost.8899     ESTABLISHED
    tcp4       0       0  localhost.8899     localhost.63257    ESTABLISHED
    tcp4       0       0  localhost.63257    localhost.8899     ESTABLISHED
    tcp4       0       0  *.8899             *.*                LISTEN
         
         
We ran into this after switching from R15B03 to R16B03. I know there were some changes in port_command handling; could that be why we got this behaviour?
       
       
              
       
