[erlang-questions] lowering jitter: best practices?

Wed May 27 11:07:36 CEST 2015

Hello,

This is an interesting issue. First of all, I have not seen any negative jitter, and my impression was that your FSM loop suffers from measurement error. I changed the measurement loop to be very tiny, but there was no significant improvement in jitter; it is on par with your original code.

...
T0 = os:timestamp(),
receive
...
after TickInterval ->
   %% TickInterval is in milliseconds; timer:now_diff/2 returns microseconds
   T1     = os:timestamp(),
   Jitter = timer:now_diff(T1, T0) - (TickInterval * 1000),
   ...
end
...
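
To try the same measurement in isolation, outside any FSM, something like the following minimal sketch might do. The module name jitter_probe and the function run/2 are hypothetical and are not part of test_fsm5; it only assumes os:timestamp/0 and timer:now_diff/2, as in the fragment above.

```erlang
-module(jitter_probe).
-export([run/2]).

%% Hypothetical helper, not part of test_fsm5.
%% Sleeps TickInterval milliseconds Count times using `receive ... after`
%% and returns the overshoot of each sleep in microseconds.
run(TickInterval, Count)
  when is_integer(TickInterval), TickInterval >= 0,
       is_integer(Count), Count >= 0 ->
    [measure(TickInterval) || _ <- lists:seq(1, Count)].

measure(TickInterval) ->
    T0 = os:timestamp(),
    %% No message patterns: the mailbox is left untouched and only the
    %% timeout can fire.
    receive
    after TickInterval ->
        T1 = os:timestamp(),
        %% now_diff/2 is in microseconds, TickInterval in milliseconds.
        timer:now_diff(T1, T0) - (TickInterval * 1000)
    end.
```

For example, jitter_probe:run(50, 100) returns a list of 100 jitter samples in microseconds. Since os:timestamp/0 follows the OS wall clock, a clock adjustment during a run can distort individual samples.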

All in all, I ran the measurement on both my MacBook laptop (Intel i5) and Amazon EC2 (cr1.8xlarge). The results were steady; you can find them below. I saw jitter over 50 ms only on my laptop, with 10K processes.
I tend to think that you are experiencing high jitter due to excessive CPU utilisation in your test bed. However, even if you run the test with a single FSM, the jitter is far from 0. Maybe it is time to look at the VM internals.

You proposed three possible solutions in your earlier email. I am afraid all of them will suffer from jitter for multiple reasons, including the overhead of network communication. Option #1 (a timer per FSM) still looks the most feasible from my perspective. You might implement an adaptive timer to minimise the observed error.
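
By an adaptive timer I mean roughly the following: schedule each tick against an absolute deadline and shorten the next wait by however late the previous tick was, so that lateness does not accumulate. This is only a sketch under assumptions: the module adaptive_tick and its start/2 are hypothetical names, and it uses os:timestamp/0 to stay compatible with OTP 17 (on OTP 18 and later, erlang:monotonic_time/1 would probably be a better clock).

```erlang
-module(adaptive_tick).
-export([start/2]).

%% Hypothetical module; all names are illustrative only.
%% Calls Fun() every IntervalMs milliseconds. Each wait is computed from
%% the ideal deadline of tick N, so a late tick shortens the next wait.
start(IntervalMs, Fun)
  when is_integer(IntervalMs), IntervalMs > 0, is_function(Fun, 0) ->
    spawn(fun() -> loop(os:timestamp(), 1, IntervalMs, Fun) end).

loop(Start, N, IntervalMs, Fun) ->
    %% Microseconds elapsed since the ticker was started.
    ElapsedUs = timer:now_diff(os:timestamp(), Start),
    %% Ideal time of tick N, measured from Start.
    DeadlineUs = N * IntervalMs * 1000,
    %% Remaining wait in milliseconds, clamped so it is never negative.
    WaitMs = max(0, (DeadlineUs - ElapsedUs) div 1000),
    receive
        stop -> ok
    after WaitMs ->
        Fun(),
        loop(Start, N + 1, IntervalMs, Fun)
    end.
```

Usage would look like Pid = adaptive_tick:start(50, fun() -> ok end), followed later by Pid ! stop. This corrects drift across ticks; it does not remove the per-tick wakeup jitter itself.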

Laptop (jitter values in microseconds):
kolesnik@REDACTED:tmp$ erl -sbt ts -sws very_eager -swt high
Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.2  (abort with ^G)
1> test_fsm5:go(1000, 50, 50, 1).
waiting for 1000 FSMs, tickrate 50
avg: 3164.69932160804
max: 13372
min: 4
median: 2426
95th: 8339
99th: 10552
all_done

AWS (jitter values in microseconds):

[ec2-user@REDACTED ~]$ /usr/local/xxx/erts-6.2/bin/erl -sbt ts -sws very_eager -swt high
Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:32:32] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.2  (abort with ^G)
1> test_fsm5:go(1000, 50, 50, 1).
waiting for 1000 FSMs, tickrate 50
avg: 998.798351758794
max: 1926
min: 82
median: 998
95th: 1152
99th: 1204
all_done

- Dmitry

> On 27 May 2015, at 00:03, Felix Gallo <felixgallo@REDACTED> wrote:
> 
> Innovative thinking, Jesper!  But in this case, in this testbed, the fsms aren't getting any messages other than those which they are delivering to themselves.  Which adds to the intrigue.  
> 
> I took your suggestion and tried using gen_fsm:start_timer/2.  Interestingly it slightly increased the jitter variance and the negative jitter issue is still present.  It's possible that my, ah, rapidly-and-pragmatically-built testbed suffers from some flaw, but I'm not seeing it.
> 
> Here's my code:
> 
> https://gist.github.com/anonymous/47cde5e60a619319053f
> 
> Here's sample output on this small but moderately modern non-cloud osx machine:
> 
> > test_fsm5:go(1000,40,40,10).
> waiting for 1000 FSMs, tickrate 40
> avg: 1324.1012703862662
> max: 50219
> min: -184
> median: 1018
> 95th: 2615
> 99th: 9698
> 
> note that the max is 50ms of jitter; the min is negative 184 us jitter, and the median jitter is about 1ms, which correlates well with my beliefs about scheduler wakeup timers...
> 
> F.
> 
> 
> On Tue, May 26, 2015 at 12:09 PM, Jesper Louis Andersen <jesper.louis.andersen@REDACTED> wrote:
> 
> On Tue, May 26, 2015 at 8:52 PM, Felix Gallo <felixgallo@REDACTED> wrote:
> {next_state,NextStateName,NewStateData,Timeout}
> 
> This explains why you sometimes get less than 30ms sleep times. If an event reaches the process before Timeout, then the timeout is not triggered. Also, it may explain the jitter you are seeing, because an early event will reset the timeout. Try using gen_fsm:start_timer/2 or erlang:send_after...
> 
> If the problem persists, check lcnt. If you are locked on the timer wheel, then consider release 18 :)
> 
> 
> -- 
> J.
> 
