<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi!<br>
<br>
There's a patched version of the R15B02 dll in my public dropbox,
under the name r15.beam.smp.dll:<br>
<br>
<a class="moz-txt-link-freetext" href="http://dl.dropbox.com/u/17212223/r15.beam.smp.dll">http://dl.dropbox.com/u/17212223/r15.beam.smp.dll</a><br>
<br>
If you replace the R15 beam.smp.dll with this one, the werl slogan
should contain the version erts-5.9.2.0.1, if you could try that
on the real app, I would be immensely grateful!<br>
<br>
Cheers,<br>
/Patrik <br>
<br>
On 03/12/2013 02:09 PM, Vance Shipley wrote:<br>
</div>
<blockquote
cite="mid:CAMoa0NL4PnNXJRe+1D0hnigeBqbtQrHt8hYiF-5kC9sO1smaPw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<p dir="ltr">C</p>
<div class="gmail_quote">On Mar 5, 2013 6:56 AM, "Garret Smith"
<<a moz-do-not-send="true"
href="mailto:garret.smith@gmail.com">garret.smith@gmail.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>I have been beating my head against a wall
for weeks tracking down spooky behaviour[sic]
in one of our production systems. I finally
tracked it down to "jumps" in the times
returned by erlang:now(), causing all timers
in the system to expire at once. I have
witnessed this bug on R15B01, both 64 and
32-bit versions running on Windows Server 2008
R2, both on bare metal and VirtualBox VM.<br>
<br>
</div>
The time jump is always around 2126000 seconds,
or a little over 24 days. The now() time does
not try to converge with os:timestamp() as the
documentation suggests, and as I confirmed it
does if you just change the system clock.<br>
<br>
</div>
Another VM running concurrently on the same
machine but with little load (diagnostic node
& production node) did not time jump.<br>
<br>
</div>
<div>Higher load seems to make the time jumps happen
more often.<br>
</div>
<div><br>
</div>
Frequency between time jumps varies between seconds
and hours, but when a jump occurs, it is always
2126000 + (9 to 26) seconds.<br>
<br>
</div>
I never see the jump in logfile timestamps that use
os:timestamp() for tagging log messages. I had to
start tracing a production node before I caught the
jump. Here are some lines from a trace, where the
timestamp in trace_ts is printed using
calendar:now_to_local_time() and then in raw tuple
format:<br>
<br>
2013-4-16 21:40:1.993399|{1366,173601,993399}<br>
2013-4-16 21:40:1.993400|{1366,173601,993400}<br>
2013-5-11 12:13:41.986961|{1368,299621,986961}<br>
2013-5-11 12:13:41.986962|{1368,299621,986962}<br>
<br>
</div>
then a bit later...<br>
<br>
2013-5-11 12:36:19.955129|{1368,300979,955129}<br>
2013-5-11 12:36:19.955130|{1368,300979,955130}<br>
2013-6-5 3:9:49.538830|{1370,426989,538830}<br>
2013-6-5 3:9:49.538833|{1370,426989,538833}<br>
<br>
</div>
I captured many such jumps over the course of a day or
so. Obviously from the dates, 2 jumps happened before I
started tracing.<br>
<br>
</div>
<div>I was able to reproduce the bug, though not as
efficiently as my production system, with the following
sample program: <a moz-do-not-send="true"
href="https://gist.github.com/garret-smith/5087169"
target="_blank">https://gist.github.com/garret-smith/5087169</a><br>
<br>
It took over an hour of runtime before the first time
jump. I am working on a better way to reproduce it at the
moment, but it's hard to test the test with a bug so
intermittent.<br>
</div>
<div><br>
</div>
<div>I am also testing various other VM versions. My first
hope was that this was limited to the 64-bit version where
we first encountered the problem, but a change to the
32-bit version has only made the problem happen less
often, not eliminated it.<br>
<br>
</div>
<div>We never saw this bug with R14B03 which we were running
previously to R15B01. However, system load is different
so I can't make a direct comparison. I did notice a few
significant updates to the Windows time related code
between R14B03 and R15:<br>
<br>
</div>
<div>git log sys_time.c<br>
</div>
<div><br>
</div>
<div>commit 46eb4359b05b220861453a869dc734480ec045a6<br>
Author: Patrik Nyblom <<a moz-do-not-send="true"
href="mailto:pan@erlang.org" target="_blank">pan@erlang.org</a>><br>
Date: Tue Dec 6 19:07:16 2011 +0100<br>
<br>
Emulate localtime, gmtime and mktime to enable
negative time_t<br>
<br>
commit 913f05af100e98a8665bbb6168e89fbcfe4ece75<br>
Author: Bj<C3><B6>rn-Egil Dahlberg <<a
moz-do-not-send="true" href="mailto:egil@erlang.org"
target="_blank">egil@erlang.org</a>><br>
Date: Fri Dec 2 15:25:06 2011 +0100<br>
<br>
Teach windows sys_localtime_r<br>
<br>
<br>
</div>
<div>I am completely stumped. What can I do next to help
track down the source of the bug?<br>
<br>
</div>
<div>Thanks,<br>
</div>
<div>Garret Smith<br>
</div>
</div>
<br>
_______________________________________________<br>
erlang-bugs mailing list<br>
<a moz-do-not-send="true" href="mailto:erlang-bugs@erlang.org">erlang-bugs@erlang.org</a><br>
<a moz-do-not-send="true"
href="http://erlang.org/mailman/listinfo/erlang-bugs"
target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a><br>
<br>
</blockquote>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
erlang-bugs mailing list
<a class="moz-txt-link-abbreviated" href="mailto:erlang-bugs@erlang.org">erlang-bugs@erlang.org</a>
<a class="moz-txt-link-freetext" href="http://erlang.org/mailman/listinfo/erlang-bugs">http://erlang.org/mailman/listinfo/erlang-bugs</a>
</pre>
</blockquote>
<br>
</body>
</html>