<div dir="ltr"><div><div><div><div>I haven't seen anything unexpected in os:timestamp(). No jumps at all.<br><br></div><div>CPU is an Intel Xeon X3430.<br><br>I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US East coast time zone (GMT -5).<br>
</div><br></div>I have not yet tried R16B. I'll be starting that today. I'm also trying to improve the test program, since it's taking quite a long time between jumps for me as well. I'll let you know as soon as I have a better one.<br>
<br></div>You have no idea how relieved I am that you are looking into this!<br><br></div><div>Thanks,<br></div>Garret Smith<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <span dir="ltr"><<a href="mailto:pan@erlang.org" target="_blank">pan@erlang.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi again...<br>
<br>
I'm not sure about one thing. What happens to os:timestamp()
during these jumps? Does it stay on track or does it also jump
around?<br>
<br>
I've tried to reproduce it with your program, but has not yet
succeeded. Have you seen this on the R16B release as well?<br>
<br>
Is the hardware in any way fancy (like a lot of cores, some new
processor I don't have or something else?) or is there anything
else special about the machine? Also the time zone you're running
in would be interesting, as there is some time zone specific code
there...<br>
<br>
I would really like to be able to reproduce it so you don't have
to do all the tests at your site, it might end up being really
time consuming for you if I make to many mistakes :) <br>
<br>
Cheers,<br>
/Patrik<div><div class="h5"><br>
<br>
<br>
On 03/05/2013 08:50 AM, Patrik Nyblom wrote:<br>
</div></div></div><div><div class="h5">
<blockquote type="cite">
<div>Hi!<br>
<br>
On 03/05/2013 02:26 AM, Garret Smith wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>I have been beating my head against a wall
for weeks tracking down spooky behaviour[sic] in
one of our production systems. I finally
tracked it down to "jumps" in the times returned
by erlang:now(), causing all timers in the
system to expire at once. I have witnessed this
bug on R15B01, both 64 and 32-bit versions
running on Windows Server 2008 R2, both on bare
metal and VirtualBox VM.<br>
<br>
</div>
The time jump is always around 2126000 seconds, or
a little over 24 days. The now() time does not
try to converge with os:timestamp() as the
documentation suggests, and as I confirmed it does
if you just change the system clock.<br>
<br>
</div>
Another VM running concurrently on the same machine
but with little load (diagnostic node &
production node) did not time jump.<br>
<br>
</div>
<div>Higher load seems to make the time jumps happen
more often.<br>
</div>
<div><br>
</div>
Frequency between time jumps varies between seconds
and hours, but when a jump occurs, it is always
2126000 + (9 to 26) seconds.<br>
<br>
</div>
I never see the jump in logfile timestamps that use
os:timestamp() for tagging log messages. I had to start
tracing a production node before I caught the jump.
Here are some lines from a trace, where the timestamp in
trace_ts is printed using calendar:now_to_local_time()
and then in raw tuple format:<br>
<br>
2013-4-16 21:40:1.993399|{1366,173601,993399}<br>
2013-4-16 21:40:1.993400|{1366,173601,993400}<br>
2013-5-11 12:13:41.986961|{1368,299621,986961}<br>
2013-5-11 12:13:41.986962|{1368,299621,986962}<br>
<br>
</div>
then a bit later...<br>
<br>
2013-5-11 12:36:19.955129|{1368,300979,955129}<br>
2013-5-11 12:36:19.955130|{1368,300979,955130}<br>
2013-6-5 3:9:49.538830|{1370,426989,538830}<br>
2013-6-5 3:9:49.538833|{1370,426989,538833}<br>
<br>
</div>
</div>
</div>
</blockquote>
Gah! That's obviously not supposed to happen... <br>
<blockquote type="cite">
<div dir="ltr">
<div>I captured many such jumps over the course of a day or
so. Obviously from the dates, 2 jumps happened before I
started tracing.<br>
<br>
</div>
<div>I was able to reproduce the bug, though not as
efficiently as my production system, with the following
sample program: <a href="https://gist.github.com/garret-smith/5087169" target="_blank">https://gist.github.com/garret-smith/5087169</a><br>
<br>
It took over an hour of runtime before the first time jump.
I am working on a better way to reproduce it at the moment,
but it's hard to test the test with a bug so intermittent.<br>
</div>
<div><br>
</div>
<div>I am also testing various other VM versions. My first
hope was that this was limited to the 64-bit version where
we first encountered the problem, but a change to the 32-bit
version has only made the problem happen less often, not
eliminated it.<br>
<br>
</div>
<div>We never saw this bug with R14B03 which we were running
previously to R15B01. However, system load is different so
I can't make a direct comparison. I did notice a few
significant updates to the Windows time related code between
R14B03 and R15:<br>
<br>
</div>
<div>git log sys_time.c<br>
</div>
<div><br>
</div>
<div>commit 46eb4359b05b220861453a869dc734480ec045a6<br>
Author: Patrik Nyblom <<a href="mailto:pan@erlang.org" target="_blank">pan@erlang.org</a>><br>
Date: Tue Dec 6 19:07:16 2011 +0100<br>
<br>
Emulate localtime, gmtime and mktime to enable negative
time_t<br>
<br>
commit 913f05af100e98a8665bbb6168e89fbcfe4ece75<br>
Author: Bj<C3><B6>rn-Egil Dahlberg <<a href="mailto:egil@erlang.org" target="_blank">egil@erlang.org</a>><br>
Date: Fri Dec 2 15:25:06 2011 +0100<br>
<br>
Teach windows sys_localtime_r<br>
<br>
<br>
</div>
</div>
</blockquote>
Yep, that's me... But even if I gave a totally weird time back
from those, the erlang:now logic should have stopped this from
happening. I'll try to reproduce using your example program. If
nothing else helps, I'll instrument a VM that gives som traces in
the time code...<br>
<blockquote type="cite">
<div dir="ltr">
<div>I am completely stumped. What can I do next to help
track down the source of the bug?<br>
<br>
</div>
</div>
</blockquote>
Unfortunately, so am I. Especially weird that it's load related...
Maybe something is not locked as it should be...<br>
<blockquote type="cite">
<div dir="ltr">
<div>Thanks,<br>
</div>
<div>Garret Smith<br>
</div>
</div>
</blockquote>
Thanks for reporting, I'll get back to you!<br>
<br>
Cheers,<br>
/Patrik<br>
<blockquote type="cite"> <br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
erlang-bugs mailing list
<a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre>
</blockquote>
<br>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
erlang-bugs mailing list
<a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre>
</blockquote>
<br>
</div></div></div>
<br>_______________________________________________<br>
erlang-bugs mailing list<br>
<a href="mailto:erlang-bugs@erlang.org">erlang-bugs@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a><br>
<br></blockquote></div><br></div>