[erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future

Patrik Nyblom pan@REDACTED
Tue Mar 12 14:38:28 CET 2013


Hi!

There's a patched version of the R15B02 dll in my public dropbox, under 
the name r15.beam.smp.dll:

http://dl.dropbox.com/u/17212223/r15.beam.smp.dll

If you replace the R15 beam.smp.dll with this one, the werl slogan 
should contain the version erts-5.9.2.0.1, if you could try that on the 
real app, I would be immensely grateful!

Cheers,
/Patrik

On 03/12/2013 02:09 PM, Vance Shipley wrote:
>
> C
>
> On Mar 5, 2013 6:56 AM, "Garret Smith" <garret.smith@REDACTED 
> <mailto:garret.smith@REDACTED>> wrote:
>
>     I have been beating my head against a wall for weeks tracking down
>     spooky behaviour[sic] in one of our production systems.  I finally
>     tracked it down to "jumps" in the times returned by erlang:now(),
>     causing all timers in the system to expire at once.  I have
>     witnessed this bug on R15B01, both 64 and 32-bit versions running
>     on Windows Server 2008 R2, both on bare metal and VirtualBox VM.
>
>     The time jump is always around 2126000 seconds, or a little over
>     24 days.  The now() time does not try to converge with
>     os:timestamp() as the documentation suggests, and as I confirmed
>     it does if you just change the system clock.
>
>     Another VM running concurrently on the same machine but with
>     little load (diagnostic node & production node) did not time jump.
>
>     Higher load seems to make the time jumps happen more often.
>
>     Frequency between time jumps varies between seconds and hours, but
>     when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>
>     I never see the jump in logfile timestamps that use os:timestamp()
>     for tagging log messages.  I had to start tracing a production
>     node before I caught the jump.  Here are some lines from a trace,
>     where the timestamp in trace_ts is printed using
>     calendar:now_to_local_time() and then in raw tuple format:
>
>     2013-4-16 21:40:1.993399|{1366,173601,993399}
>     2013-4-16 21:40:1.993400|{1366,173601,993400}
>     2013-5-11 12:13:41.986961|{1368,299621,986961}
>     2013-5-11 12:13:41.986962|{1368,299621,986962}
>
>     then a bit later...
>
>     2013-5-11 12:36:19.955129|{1368,300979,955129}
>     2013-5-11 12:36:19.955130|{1368,300979,955130}
>     2013-6-5 3:9:49.538830|{1370,426989,538830}
>     2013-6-5 3:9:49.538833|{1370,426989,538833}
>
>     I captured many such jumps over the course of a day or so. 
>     Obviously from the dates, 2 jumps happened before I started tracing.
>
>     I was able to reproduce the bug, though not as efficiently as my
>     production system, with the following sample program:
>     https://gist.github.com/garret-smith/5087169
>
>     It took over an hour of runtime before the first time jump.  I am
>     working on a better way to reproduce it at the moment, but it's
>     hard to test the test with a bug so intermittent.
>
>     I am also testing various other VM versions.  My first hope was
>     that this was limited to the 64-bit version where we first
>     encountered the problem, but a change to the 32-bit version has
>     only made the problem happen less often, not eliminated it.
>
>     We never saw this bug with R14B03 which we were running previously
>     to R15B01.  However, system load is different so I can't make a
>     direct comparison.  I did notice a few significant updates to the
>     Windows time related code between R14B03 and R15:
>
>     git log sys_time.c
>
>     commit 46eb4359b05b220861453a869dc734480ec045a6
>     Author: Patrik Nyblom <pan@REDACTED <mailto:pan@REDACTED>>
>     Date:   Tue Dec 6 19:07:16 2011 +0100
>
>         Emulate localtime, gmtime and mktime to enable negative time_t
>
>     commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>     Author: Bj<C3><B6>rn-Egil Dahlberg <egil@REDACTED
>     <mailto:egil@REDACTED>>
>     Date:   Fri Dec 2 15:25:06 2011 +0100
>
>         Teach windows sys_localtime_r
>
>
>     I am completely stumped.  What can I do next to help track down
>     the source of the bug?
>
>     Thanks,
>     Garret Smith
>
>     _______________________________________________
>     erlang-bugs mailing list
>     erlang-bugs@REDACTED <mailto:erlang-bugs@REDACTED>
>     http://erlang.org/mailman/listinfo/erlang-bugs
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130312/394bad4d/attachment.htm>


More information about the erlang-bugs mailing list