[erlang-bugs] OS X Mavericks prim_inet:recv0 delays

Steve Vinoski vinoski@REDACTED
Tue Mar 4 02:48:32 CET 2014


My Basho colleague Reid Draper recently encountered a strange problem with
RiakCS on OS X Mavericks (at least versions 10.9.1 and 10.9.2) where there
are intermittent multi-second delays when receiving data. It does not occur
on OS X versions 10.8.x or older. RiakCS uses Mochiweb internally, but I've
also reproduced the problem with Yaws. It usually happens with transfers of
10k bytes or more but I've seen it with transfers as small as 1k as well.
It occurs with all recent versions of Erlang that build easily on
Mavericks, starting with R15B03 up to and including 17-rc2. I've also tried
building Erlang with gcc instead of clang, but it happens either way.

Reid tried to reproduce the issue with non-Erlang approaches including
python and netcat, and did not succeed in reproducing the problem. I
haven't been able to create a small Erlang test case that exhibits the
problem, so I've included a script below that clones the Yaws repo, builds
it, and runs an interactive Yaws instance. To see the problem for yourself,
run the script, then when Yaws comes up, you'll be in an Erlang shell where
you can turn on dbg tracing and run a Yaws test as shown below:

dbg:tracer(), dbg:p(all,[call,timestamp,return_to]).
dbg:tpl(prim_inet,recv0,[]).
app_test:start([flush_large_post]).

The test might pass or it might fail, depending on how many transfer delays
occur. But either way, if you look at the dbg trace output, you'll likely
see cases where there are delays of multiples of 5 seconds within
prim_inet:recv0. For example, in the output below we see two consecutive
delays of 5 seconds within two prim_inet:recv0 calls:

(<0.83.0>) call prim_inet:recv0(#Port<0.1399>,2048000,10000) (Timestamp:
{1393,

896101,

405228})
(<0.83.0>) returning to yaws:cli_recv/3 (Timestamp: {1393,896106,516602})
(<0.83.0>) call prim_inet:recv0(#Port<0.1399>,2048000,10000) (Timestamp:
{1393,

896106,

521473})
(<0.83.0>) returning to yaws_server:flush/5 (Timestamp:
{1393,896111,827620})

The delays here are roughly 5 seconds each, but I've seen delays of 10, 15,
25, and 30 seconds as well -- always around a multiple of 5.

The script for setting up Yaws for the test case is below (note that the
script will remove any directory named "yaws" in the current working
directory). If you have any questions about the problem, issues with the
test script, or if there are more details you'd like me to try to collect,
let me know.

--steve

#!/bin/bash
set -e
[[ -d yaws ]] && rm -rf yaws
git clone https://github.com/klacke/yaws.git
cd yaws
autoconf
./configure
ul=$(ulimit -n)
(( $ul < 1024 )) && ulimit -n 1024
make all
cd test
make all
cd t2
set +e
make setup test
yawspid=$(ps xa | grep yaw'[s]' | awk '{print $1}')
[[ -n "$yawspid" ]] && kill $yawspid
exec ../../bin/yaws -i --conf yaws.conf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20140303/a7b70348/attachment.htm>


More information about the erlang-bugs mailing list