[erlang-questions] lowering jitter: best practices?

Wed May 27 14:17:35 CEST 2015

Hi,

I applied the following patch:

; diff -u test_fsm5.orig test_fsm5.erl

--- test_fsm5.orig 2015-05-27 13:54:34.381978128 +0200
+++ test_fsm5.erl 2015-05-27 13:51:38.521826422 +0200
@@ -10,7 +10,7 @@
 -define(MINCOUNT, 300). % iters.  Will not record jitter until this many
timeouts have passed.  Crude attempt to give schedulers settle time.

 init([TickInterval,NotificationPid]) ->
-  State = #state{ last_time = get_os_time(), tickinterval = TickInterval,
pid = NotificationPid },
+  State = #state{ last_time = erlang:monotonic_time(), tickinterval =
TickInterval, pid = NotificationPid },
   {ok, s_firststate, State, TickInterval}.

 handle_event(_Event, StateName, StateData) ->
@@ -29,10 +29,10 @@
   {ok, StateName, StateData}.

 s_firststate(timeout, #state{ last_time = LastTime, count = Count ,
tickinterval = TickInterval, pid = NotificationPid, jitters = Jitters } =
StateData) ->
-  NewTime = get_os_time(),
-  TimeDiff = NewTime - LastTime,
-  Jitter = TimeDiff - (TickInterval * 1000), % microseconds
+  NewTime = erlang:monotonic_time(),
   gen_fsm:start_timer(TickInterval, timeout),
+  TimeDiff = erlang:convert_time_unit(NewTime - LastTime, native,
micro_seconds),
+  Jitter = TimeDiff - (TickInterval * 1000), % microseconds
   case {(Count > ?MINCOUNT), (Count < ?MAXCOUNT)} of
     {false, true} ->
       { next_state, s_firststate, StateData#state{ last_time = NewTime,
count = Count + 1, pid = NotificationPid, jitters = Jitters } };
@@ -81,10 +81,6 @@
   report(TickFrom,NumFSMs),
   go_run(NumFSMs, TickFrom + TickStep, TickTo, TickStep).

-get_os_time() ->
-  {MegaS, S, MicroS} = os:timestamp(),
-  (MegaS * 1000000 * 1000000 + S * 1000000 + MicroS).
-
 await(0) -> ok;
 await(N) ->
   receive _ ->
@@ -93,6 +89,7 @@

 report(Tick, NumFSMs) ->
   X = lists:sort([A || {_, A} <- ets:lookup(metrics,jitter)]),
+  file:write_file("observations.txt", [[integer_to_binary(E), $\n] || E <-
X]),
   Avg = lists:sum(X)/length(X),
   Max = lists:max(X),
   Min = lists:min(X),

which switches to Erlang/OTP 18-rc2 and also uses the new time API. Machine
is a Core i7-4900MQ CPU, ArchLinux, fairly recent linux kernel (Linux
lady-of-pain 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015
x86_64 GNU/Linux). It also dumps the jitter to "observations.txt" so we can
look at the observations.

We run `erl +sbt db +C multi_time_warp` to bind schedulers to cores and
request a timing mode which is able to maximize precise/accurate monotonic
time.

Then I ran the data through a bit of R:

> x <- read.csv("observations.txt", header=FALSE)
> require(ggplot2)
> p <- ggplot(x, aes(x = V1))
> png("observations.png")
> p + geom_density()
> dev.off()
X11cairo
       2
> summary(x)
       V1
 Min.   : 153
 1st Qu.: 973
 Median : 998
 Mean   :1001
 3rd Qu.:1028
 Max.   :8018

One may wonder what the spread is of the upper quantiles:

> quantile(x$V1,  c(.9, .95, .99, .999, .9999, .99999))
     90%      95%      99%    99.9%   99.99%  99.999%
1065.000 1083.000 1142.000 1818.001 5045.000 7945.010

The kernel density plot is attached, and it places itself around 1000 quite
precisely.

This is actually somewhat to be expected. When we request a timeout of
40ms, we are somewhere inside a milli-second T. We can be close to the
"start", i.e., T.001, or close to the "end", i.e., T.997. If we define T =
0 as our start point for our time, then we clearly can't wake up at P = 40,
because 40 - 0.997 is 39.003, which would violate our wait time and wake up
too early. Hence, we round our wake-time up to the next full milli-second,
which is 41. This tracks extremely well with our numbers.

But it is highly unlikely that our 1000 processes would all trigger in the
*same* millisecond, which would make a few of them round up to a different
timepoint for wakeup. At least this would be a plausible explanation for
jitter less than 1000 micro-seconds.

As for the 99.99th and 99.999th percentile, I think these can be attributed
to something else happening in the system: OS, Garbage Collection, etc. I'm
not sure the culprit there is the timer wheels.

Another point worth mentioning is that if 1000 processes wake up at the
same millisecond edge, then they will queue over N cores. So invariably,
this means some of the processes will have jitter. The workload you are
working with is very "spiky" in this case, even if the average load on the
system is very low. Do the math, assuming perfect spread, each core gets
1000 div 8 = 125 processes. They all awaken at the same time. Even if we
can handle each process in 1 micro-second, there will be a process with a
125 micro-second latency. That is, since we pack so many processes in a
short window, the system becomes more sensitive to small fluctuations. The
numbers suggest we handle each process in less than a microsecond.

Going even faster, in the low nanoseconds, requires a change of system,
since Erlang isn't the right tool for the job at that scale. You need to
pack data in arrays to get better cache-access patterns at that scale since
a DRAM hit is roughly 30ns (or more!). The functional nature of Erlang will
hurt here. This is usually a job for OCaml, C, Go, GPUs, FPGAs or ASICs.


On Tue, May 26, 2015 at 11:03 PM, Felix Gallo <felixgallo@REDACTED> wrote:

> Innovative thinking, Jesper!  But in this case, in this testbed, the fsms
> aren't getting any messages other than those which they are delivering to
> themselves.  Which adds to the intrigue.
>
> I took your suggestion and tried using gen_fsm:start_timer/2.
> Interestingly it slightly increased the jitter variance and the negative
> jitter issue is still present.  It's possible that my, ah,
> rapidly-and-pragmatically-built testbed suffers from some flaw, but I'm not
> seeing it.
>
> Here's my code:
>
> https://gist.github.com/anonymous/47cde5e60a619319053f
>
> Here's sample output on this small but moderately modern non-cloud osx
> machine:
>
> > test_fsm5:go(1000,40,40,10).
> waiting for 1000 FSMs, tickrate 40
> avg: 1324.1012703862662
> max: 50219
> min: -184
> median: 1018
> 95th: 2615
> 99th: 9698
>
> note that the max is 50ms of jitter; the min is negative 184 us jitter,
> and the median jitter is about 1ms, which correlates well with my beliefs
> about scheduler wakeup timers...
>
> F.
>
>
> On Tue, May 26, 2015 at 12:09 PM, Jesper Louis Andersen <
> jesper.louis.andersen@REDACTED> wrote:
>
>>
>> On Tue, May 26, 2015 at 8:52 PM, Felix Gallo <felixgallo@REDACTED>
>> wrote:
>>
>>> {next_state,NextStateName,NewStateData,Timeout}
>>
>>
>> This explains why you sometimes get less than 30ms sleep times. If an
>> event reaches the process before Timeout, then the timeout is not
>> triggered. Also, it may explain the jitter you are seeing, because an early
>> event will reset the timeout. Try using gen_fsm:start_timer/2 or
>> erlang:send_after...
>>
>> If the problem persists, check lcnt. If you are locked on the timer
>> wheel, then consider release 18 :)
>>
>>
>> --
>> J.
>>
>
>


-- 
J.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150527/364eb15f/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: observations.png
Type: image/png
Size: 10253 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150527/364eb15f/attachment.png>