<div dir="ltr">My Basho colleague Reid Draper recently encountered a strange problem with RiakCS on OS X Mavericks (at least versions 10.9.1 and 10.9.2) where there are intermittent multi-second delays when receiving data. It does not occur on OS X versions 10.8.x or older. RiakCS uses Mochiweb internally, but I've also reproduced the problem with Yaws. It usually happens with transfers of 10k bytes or more but I've seen it with transfers as small as 1k as well. It occurs with all recent versions of Erlang that build easily on Mavericks, starting with R15B03 up to and including 17-rc2. I've also tried building Erlang with gcc instead of clang, but it happens either way.<div>
<br></div><div>Reid tried to reproduce the issue with non-Erlang approaches including python and netcat, and did not succeed in reproducing the problem. I haven't been able to create a small Erlang test case that exhibits the problem, so I've included a script below that clones the Yaws repo, builds it, and runs an interactive Yaws instance. To see the problem for yourself, run the script, then when Yaws comes up, you'll be in an Erlang shell where you can turn on dbg tracing and run a Yaws test as shown below:</div>
<div><br></div><div>dbg:tracer(), dbg:p(all,[call,timestamp,return_to]).</div><div>dbg:tpl(prim_inet,recv0,[]).</div><div>app_test:start([flush_large_post]).</div><div><br></div><div>The test might pass or it might fail, depending on how many transfer delays occur. But either way, if you look at the dbg trace output, you'll likely see cases where there are delays of multiples of 5 seconds within prim_inet:recv0. For example, in the output below we see two consecutive delays of 5 seconds within two prim_inet:recv0 calls:</div>
<div><br></div><div><div>(<0.83.0>) call prim_inet:recv0(#Port<0.1399>,2048000,10000) (Timestamp: {1393,</div><div> 896101,</div><div> 405228})</div>
<div>(<0.83.0>) returning to yaws:cli_recv/3 (Timestamp: {1393,896106,516602})</div><div>(<0.83.0>) call prim_inet:recv0(#Port<0.1399>,2048000,10000) (Timestamp: {1393,</div><div> 896106,</div>
<div> 521473})</div><div>(<0.83.0>) returning to yaws_server:flush/5 (Timestamp: {1393,896111,827620})</div></div><div><div><br></div><div>The delays here are roughly 5 seconds each, but I've seen delays of 10, 15, 25, and 30 seconds as well -- always around a multiple of 5.</div>
<div><br></div><div>The script for setting up Yaws for the test case is below (note that the script will remove any directory named "yaws" in the current working directory). If you have any questions about the problem, issues with the test script, or if there are more details you'd like me to try to collect, let me know.</div>
<div><br></div><div>--steve</div><div><br></div><div><div>#!/bin/bash</div><div>set -e</div><div>[[ -d yaws ]] && rm -rf yaws</div><div>git clone <a href="https://github.com/klacke/yaws.git">https://github.com/klacke/yaws.git</a></div>
<div>cd yaws</div><div>autoconf</div><div>./configure</div><div>ul=$(ulimit -n)</div><div>(( $ul < 1024 )) && ulimit -n 1024</div><div>make all</div><div>cd test</div><div>make all</div><div>cd t2</div><div>set +e</div>
<div>make setup test</div><div>yawspid=$(ps xa | grep yaw'[s]' | awk '{print $1}')</div><div>[[ -n "$yawspid" ]] && kill $yawspid</div><div>exec ../../bin/yaws -i --conf yaws.conf</div></div>
<div><br></div></div></div>