<div dir="ltr">The problem remains even if only erlang:port_command/3 with nosuspend option is used. (no calls for gen_tcp:send and many processes write to the same socket)</div><div class="gmail_extra"><br><br><div class="gmail_quote">
2014-06-24 21:28 GMT+04:00 Lukas Larsson <span dir="ltr"><<a href="mailto:lukas@erlang.org" target="_blank">lukas@erlang.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hello,<br>
<br>
I was able to reproduce your testcase after removing all the "msg"
printouts and starting a couple of clients at the same time. It
seems that the sockets are hitting the high_msgq_watermark limit and
then as data gets flushes they are not set to run again. I'll see if
I can dig out what it is that is causing this behavior. <br>
<br>
Lukas<div><div class="h5"><br>
<div>On 24/06/14 16:56, Vasily Demidenok
wrote:<br>
</div>
</div></div><blockquote type="cite"><div><div class="h5">
<div dir="ltr">
<div>Hello list, we faced with some gen_tcp related problems
after switch from erlang R15B03 to R16B03-01</div>
<div><br>
</div>
<div>The problem is as following: When server produce data
faster then consumer can handle, after </div>
<div>the out server's buffers are full and incoming client's
buffers are full gen_tcp:send/2 call on server side blocks
forever in erts_internal:port_command/3. After this, even when
client consumes all</div>
<div>
the data and the buffers are empty server process remains to
be suspended in that call</div>
<div><br>
</div>
<div>This problem does not occur always, but quite often.</div>
<div><br>
</div>
<div>Some details on implementation are below, I also shrink the
example to </div>
<div>this small app so you can check the code: <a href="https://github.com/define-null/tcp_failing_ex" target="_blank">https://github.com/define-null/tcp_failing_ex</a></div>
<div><br>
</div>
<div>Server is implemented in such a way, that it listen on 8899
port, then when client connect to it spawn main srv process
and plenty of workers, which start to write to this port after
client send some special msg. The main process is responsible
for commands from the client and send responses via
gen_tcp:send/2, while workers try to write some stream data to
the client and use erang:port_command with no-suspend. So
workers send only up-to-date data, dropping any in case client
is slow. </div>
<div><br>
</div>
<div>The behaviour which we see is as following:</div>
<div>At first phase producer fills OS and erlang driver's
buffers. Consumer read data as it arrives and server drop data
which it cannot send. So we see buffer size growing on both
side out queue of the server and in queue of the client
respectively</div>
<div><br>
</div>
<div>After some moment in time, i guess when the buffers are
completely filled, server try respond to</div>
<div>ping message of the client, using gen_tcp:send/2 call.
After that, it blocks there forever, even after client
consumes all the messages. The situation does not change and
the srv process remains in the suspended state, while it's
incoming buffer begins to grow when client send more ping
messages.</div>
<div><br>
</div>
<div>Below is the output on the system with two slow clients,
where for the first client server's process is already blocked
in gen_tcp:send/2 call, while the second is served well.</div>
<div><br>
</div>
<div>Every 2.0s: netstat -al | grep 8899
Tue Jun 24 16:34:51 2014</div>
<div><br>
</div>
<div>tcp4 36 0 localhost.8899 localhost.63263
ESTABLISHED</div>
<div>tcp4 0 0 localhost.63263 localhost.8899
ESTABLISHED</div>
<div>tcp4 0 130990 localhost.8899 localhost.63257
ESTABLISHED</div>
<div>tcp4 619190 0 localhost.63257 localhost.8899
ESTABLISHED</div>
<div>tcp4 0 0 *.8899 *.*
LISTEN</div>
<div><br>
</div>
<div>This is the output for the client process from github
example, where we see that</div>
<div>after send operation (ping msg) no incoming msg come any
more.</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,28}}: msg</div>
<div>{{2014,6,24},{16,33,48}} before send</div>
<div>{{2014,6,24},{16,33,48}} after send ok</div>
<div>{{2014,6,24},{16,34,9}} before send</div>
<div>{{2014,6,24},{16,34,9}} after send ok</div>
<div>{{2014,6,24},{16,34,30}} before send</div>
<div>{{2014,6,24},{16,34,30}} after send ok</div>
<div>{{2014,6,24},{16,34,51}} before send</div>
<div>{{2014,6,24},{16,34,51}} after send ok</div>
<div>{{2014,6,24},{16,35,12}} before send</div>
<div>....</div>
<div><br>
</div>
<div>Server blocked process output:</div>
<div><br>
</div>
<div>{{2014,6,24},{16,33,21}}: <0.95.0> ping</div>
<div>{{2014,6,24},{16,33,21}} bsend: <0.95.0></div>
<div>{{2014,6,24},{16,33,21}} asend: <0.95.0> ok</div>
<div>{{2014,6,24},{16,33,48}}: <0.95.0> ping</div>
<div>{{2014,6,24},{16,33,48}} bsend: <0.95.0></div>
<div>%% (no asend message after it)</div>
<div><br>
</div>
<div>(<a href="mailto:tcp_failing_node@localdomain.me" target="_blank">tcp_failing_node@localdomain.me</a>)1>
erlang:process_info(pid(0,95,0)).</div>
<div>[{current_function,{erts_internal,port_command,3}},</div>
<div> {initial_call,{proc_lib,init_p,5}},</div>
<div><br>
</div>
<div>Bug is not always reproducible, but occurs quite often. The
problem is that even</div>
<div>after server's out buffers are empty data does not arrive
to the client, and incoming buffer grow</div>
<div>as client send ping messages to the server. (So
erlang:port_command/3 with no-suspend always return false</div>
<div>when another main server process for this connection is
suspended in gen_tcp:send/2)</div>
<div><br>
</div>
<div>And then it's getting only worse as already mentioned</div>
<div><br>
</div>
<div>Every 2.0s: netstat -al | grep 8899
Tue Jun 24 16:56:59 2014</div>
<div><br>
</div>
<div>tcp4 804 0 localhost.8899 localhost.63263
ESTABLISHED</div>
<div>tcp4 0 0 localhost.63263 localhost.8899
ESTABLISHED</div>
<div>tcp4 0 0 localhost.8899 localhost.63257
ESTABLISHED</div>
<div>tcp4 0 0 localhost.63257 localhost.8899
ESTABLISHED</div>
<div>tcp4 0 0 *.8899 *.*
LISTEN</div>
<div><br>
</div>
<div>We faced with this after switching to R16B03 from R15B03, I
know there were some changes in port_command handling, i guess
why we got such behaviour?</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div></div><pre>_______________________________________________
erlang-bugs mailing list
<a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre>
</blockquote>
<br>
</div>
</blockquote></div><br></div>