[erlang-questions] lowering jitter: best practices?

Felix Gallo felixgallo@REDACTED
Thu May 28 01:25:08 CEST 2015


Jesper, thanks for the analysis.

Since your results suggested that R17's os:timestamp was heavily implicated
in my jitter error calculations (all too typical a result for
microbenchmarking, but I had to trust something), I tried again on a
variety of cloud Linux boxes and got similar results, suggesting that
something may be up with R17.5.  Since you're also not seeing that with
R18rc2, I have to guess that either the new time API or the timer wheel
rework is improving matters.  Which is excellent news, even if R18rc2
currently doesn't compile on my box. :)

I'll do some more jitter exploration with R18rc2 in the coming weeks; if
anyone is reading this in the future and would like to know how it all went
down, please feel free to contact me for details.  Thanks to all for the
help.

F.



On Wed, May 27, 2015 at 5:17 AM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

> Hi,
>
> I applied the following patch:
>
> ; diff -u test_fsm5.orig test_fsm5.erl
> --- test_fsm5.orig 2015-05-27 13:54:34.381978128 +0200
> +++ test_fsm5.erl 2015-05-27 13:51:38.521826422 +0200
> @@ -10,7 +10,7 @@
>  -define(MINCOUNT, 300). % iters.  Will not record jitter until this many timeouts have passed.  Crude attempt to give schedulers settle time.
>
>  init([TickInterval,NotificationPid]) ->
> -  State = #state{ last_time = get_os_time(), tickinterval = TickInterval, pid = NotificationPid },
> +  State = #state{ last_time = erlang:monotonic_time(), tickinterval = TickInterval, pid = NotificationPid },
>    {ok, s_firststate, State, TickInterval}.
>
>  handle_event(_Event, StateName, StateData) ->
> @@ -29,10 +29,10 @@
>    {ok, StateName, StateData}.
>
>  s_firststate(timeout, #state{ last_time = LastTime, count = Count , tickinterval = TickInterval, pid = NotificationPid, jitters = Jitters } = StateData) ->
> -  NewTime = get_os_time(),
> -  TimeDiff = NewTime - LastTime,
> -  Jitter = TimeDiff - (TickInterval * 1000), % microseconds
> +  NewTime = erlang:monotonic_time(),
>    gen_fsm:start_timer(TickInterval, timeout),
> +  TimeDiff = erlang:convert_time_unit(NewTime - LastTime, native, micro_seconds),
> +  Jitter = TimeDiff - (TickInterval * 1000), % microseconds
>    case {(Count > ?MINCOUNT), (Count < ?MAXCOUNT)} of
>      {false, true} ->
>      { next_state, s_firststate, StateData#state{ last_time = NewTime, count = Count + 1, pid = NotificationPid, jitters = Jitters } };
> @@ -81,10 +81,6 @@
>    report(TickFrom,NumFSMs),
>    go_run(NumFSMs, TickFrom + TickStep, TickTo, TickStep).
>
> -get_os_time() ->
> -  {MegaS, S, MicroS} = os:timestamp(),
> -  (MegaS * 1000000 * 1000000 + S * 1000000 + MicroS).
> -
>  await(0) -> ok;
>  await(N) ->
>    receive _ ->
> @@ -93,6 +89,7 @@
>
>  report(Tick, NumFSMs) ->
>    X = lists:sort([A || {_, A} <- ets:lookup(metrics,jitter)]),
> +  file:write_file("observations.txt", [[integer_to_binary(E), $\n] || E <- X]),
>    Avg = lists:sum(X)/length(X),
>    Max = lists:max(X),
>    Min = lists:min(X),
>
> which switches to Erlang/OTP 18-rc2 and also uses the new time API.
> Machine is a Core i7-4900MQ CPU, ArchLinux, fairly recent linux kernel
> (Linux lady-of-pain 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC
> 2015 x86_64 GNU/Linux). It also dumps the jitter to "observations.txt" so
> we can look at the observations.
>
> We run `erl +sbt db +C multi_time_warp` to bind schedulers to cores and
> request a timing mode which is able to maximize precise/accurate monotonic
> time.
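>
> As a quick sanity check (a small sketch of my own, not part of the patch
> above), we can verify at runtime that the +C flag took effect and see
> which OS clock source backs monotonic time:
>
> check_time_setup() ->
>     %% erlang:system_info(time_warp_mode) returns the active time warp
>     %% mode in OTP 18; we expect multi_time_warp given the flags above.
>     multi_time_warp = erlang:system_info(time_warp_mode),
>     %% Print which OS monotonic clock source the VM picked; the exact
>     %% contents of this list are system dependent.
>     io:format("monotonic source: ~p~n",
>               [erlang:system_info(os_monotonic_time_source)]).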
>
> Then I ran the data through a bit of R:
>
> > x <- read.csv("observations.txt", header=FALSE)
> > require(ggplot2)
> > p <- ggplot(x, aes(x = V1))
> > png("observations.png")
> > p + geom_density()
> > dev.off()
> X11cairo
>        2
> > summary(x)
>        V1
>  Min.   : 153
>  1st Qu.: 973
>  Median : 998
>  Mean   :1001
>  3rd Qu.:1028
>  Max.   :8018
>
> One may wonder what the spread is of the upper quantiles:
>
> > quantile(x$V1,  c(.9, .95, .99, .999, .9999, .99999))
>      90%      95%      99%    99.9%   99.99%  99.999%
> 1065.000 1083.000 1142.000 1818.001 5045.000 7945.010
>
> The kernel density plot is attached, and it is centered quite precisely
> around 1000 micro-seconds.
>
> This is actually somewhat to be expected. When we request a timeout of
> 40ms, we are somewhere inside a milli-second T: close to its "start",
> i.e., T.001, or close to its "end", i.e., T.997. If we define T = 0 as
> our starting point, then we clearly can't wake up at P = 40, because
> 40 - 0.997 = 39.003 ms would violate our wait time by waking up too
> early. Hence, we round our wake-time up to the next full milli-second,
> which is 41, so a timer can fire up to a full milli-second late. This
> tracks extremely well with our numbers, where the median jitter is
> roughly 1000 micro-seconds.
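>
> A minimal sketch of that rounding (my own illustration, not the actual
> timer wheel code), with times in micro-seconds:
>
> %% Ceil a wake-up deadline to the next whole millisecond edge so the
> %% timer can never fire early.  wakeup_at(997, 40) -> 41000 (the 41 ms
> %% edge rather than 40 ms); wakeup_at(1, 40) -> 41000 as well, i.e. an
> %% extra ~999 us of waiting, which matches the ~1000 us median above.
> wakeup_at(NowUs, TimeoutMs) ->
>     DeadlineUs = NowUs + TimeoutMs * 1000,
>     ((DeadlineUs + 999) div 1000) * 1000.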
>
> But it is highly unlikely that our 1000 processes would all trigger in the
> *same* millisecond, which would make a few of them round up to a different
> timepoint for wakeup. At least this would be a plausible explanation for
> jitter less than 1000 micro-seconds.
>
> As for the 99.99th and 99.999th percentile, I think these can be
> attributed to something else happening in the system: OS, Garbage
> Collection, etc. I'm not sure the culprit there is the timer wheels.
>
> Another point worth mentioning is that if 1000 processes wake up at the
> same millisecond edge, then they will queue over N cores, so invariably
> some of the processes will see jitter. The workload you are testing is
> very "spiky" in this case, even though the average load on the system is
> very low. Do the math: assuming perfect spread, each core gets
> 1000 div 8 = 125 processes, and they all awaken at the same time. Even if
> we can handle each process in 1 micro-second, some process will see a
> 125 micro-second latency. That is, because we pack so many processes into
> a short window, the system becomes more sensitive to small fluctuations.
> The numbers suggest we handle each process in less than a microsecond.
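>
> That back-of-envelope calculation as a sketch (my own helper, assuming a
> perfect spread of processes over the cores):
>
> %% If Fsms processes all wake on the same millisecond edge and each one
> %% takes PerProcUs micro-seconds to handle, the last process on a core
> %% waits roughly this long: 1000 FSMs over 8 cores at 1 us each gives
> %% (1000 div 8) * 1 = 125 us.
> worst_case_latency_us(Fsms, Cores, PerProcUs) ->
>     (Fsms div Cores) * PerProcUs.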
>
> Going even faster, into the low nanoseconds, requires a change of system,
> since Erlang isn't the right tool for the job at that scale. You need to
> pack data in arrays to get better cache-access patterns, since a DRAM hit
> is roughly 30ns (or more!), and the functional nature of Erlang will hurt
> here. This is usually a job for OCaml, C, Go, GPUs, FPGAs or ASICs.
>
>
> On Tue, May 26, 2015 at 11:03 PM, Felix Gallo <felixgallo@REDACTED>
> wrote:
>
>> Innovative thinking, Jesper!  But in this case, in this testbed, the fsms
>> aren't getting any messages other than those which they are delivering to
>> themselves.  Which adds to the intrigue.
>>
>> I took your suggestion and tried using gen_fsm:start_timer/2.
>> Interestingly, it slightly increased the jitter variance, and the
>> negative-jitter issue is still present.  It's possible that my, ah,
>> rapidly-and-pragmatically-built testbed suffers from some flaw, but I'm
>> not seeing it.
>>
>> Here's my code:
>>
>> https://gist.github.com/anonymous/47cde5e60a619319053f
>>
>> Here's sample output on this small but moderately modern non-cloud OS X
>> machine:
>>
>> > test_fsm5:go(1000,40,40,10).
>> waiting for 1000 FSMs, tickrate 40
>> avg: 1324.1012703862662
>> max: 50219
>> min: -184
>> median: 1018
>> 95th: 2615
>> 99th: 9698
>>
>> note that the max is 50ms of jitter; the min is negative 184 us jitter,
>> and the median jitter is about 1ms, which correlates well with my beliefs
>> about scheduler wakeup timers...
>>
>> F.
>>
>>
>> On Tue, May 26, 2015 at 12:09 PM, Jesper Louis Andersen <
>> jesper.louis.andersen@REDACTED> wrote:
>>
>>>
>>> On Tue, May 26, 2015 at 8:52 PM, Felix Gallo <felixgallo@REDACTED>
>>> wrote:
>>>
>>>> {next_state,NextStateName,NewStateData,Timeout}
>>>
>>>
>>> This explains why you sometimes get less than 30ms sleep times. If an
>>> event reaches the process before Timeout, then the timeout is not
>>> triggered. Also, it may explain the jitter you are seeing, because an early
>>> event will reset the timeout. Try using gen_fsm:start_timer/2 or
>>> erlang:send_after...
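>>>
>>> Roughly, the shape of that change (a sketch with a made-up #state{}
>>> record, not your actual code) is to arm an explicit timer and match on
>>> the {timeout, Ref, Msg} event it delivers, so an early event cannot
>>> cancel it:
>>>
>>> -record(state, {tref, tick}).
>>>
>>> init([Tick]) ->
>>>     Ref = gen_fsm:start_timer(Tick, tick),
>>>     {ok, s_firststate, #state{tref = Ref, tick = Tick}}.
>>>
>>> s_firststate({timeout, Ref, tick}, #state{tref = Ref, tick = Tick} = S) ->
>>>     %% re-arm immediately for the next tick
>>>     NewRef = gen_fsm:start_timer(Tick, tick),
>>>     {next_state, s_firststate, S#state{tref = NewRef}}.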
>>>
>>> If the problem persists, check lcnt. If you are locked on the timer
>>> wheel, then consider release 18 :)
>>>
>>>
>>> --
>>> J.
>>>
>>
>>
>
>
> --
> J.
>