[erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future

Garret Smith garret.smith@REDACTED
Thu Apr 4 19:24:22 CEST 2013


Your patch fixes the problem in our duration testing, but unfortunately I
haven't got it on a production site yet.  Hopefully "real soon now".


On Thu, Apr 4, 2013 at 7:43 AM, Patrik Nyblom <pan@REDACTED> wrote:

>  Hi!
>
>
> On 03/12/2013 04:36 PM, Garret Smith wrote:
>
>
> On Mar 12, 2013 6:38 AM, "Patrik Nyblom" <pan@REDACTED> wrote:
> >
> > Hi!
> >
> > There's a patched version of the R15B02 dll in my public dropbox, under
> the name r15.beam.smp.dll:
> >
> > http://dl.dropbox.com/u/17212223/r15.beam.smp.dll
>
> R15B02 will work.  I'll get started but it will take a couple days to get
> everything built, deployed and watch for time jumps.
>
> Thank you for the binary!
>
> Any progress? Does it work (just curious :))
>
> Cheers,
> /Patrik
>
>  >
> > If you replace the R15 beam.smp.dll with this one, the werl slogan
> should contain the version erts-5.9.2.0.1, if you could try that on the
> real app, I would be immensely grateful!
> >
> > Cheers,
> > /Patrik
> >
> > On 03/12/2013 02:09 PM, Vance Shipley wrote:
> >>
> >> C
> >>
> >> On Mar 5, 2013 6:56 AM, "Garret Smith" <garret.smith@REDACTED> wrote:
> >>>
> >>> I have been beating my head against a wall for weeks tracking down
> spooky behaviour[sic] in one of our production systems.  I finally tracked
> it down to "jumps" in the times returned by erlang:now(), causing all
> timers in the system to expire at once.  I have witnessed this bug on
> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both
> on bare metal and VirtualBox VM.
> >>>
> >>> The time jump is always around 2126000 seconds, or a little over 24
> days.  The now() time does not try to converge with os:timestamp() as the
> documentation suggests, and as I confirmed it does if you just change the
> system clock.
> >>>
> >>> Another VM running concurrently on the same machine but with little
> load (diagnostic node & production node) did not time jump.
> >>>
> >>> Higher load seems to make the time jumps happen more often.
> >>>
> >>> Frequency between time jumps varies between seconds and hours, but
> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
> >>>
> >>> I never see the jump in logfile timestamps that use os:timestamp() for
> tagging log messages.  I had to start tracing a production node before I
> caught the jump.  Here are some lines from a trace, where the timestamp in
> trace_ts is printed using calendar:now_to_local_time() and then in raw
> tuple format:
> >>>
> >>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
> >>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
> >>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
> >>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
> >>>
> >>> then a bit later...
> >>>
> >>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
> >>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
> >>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
> >>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
> >>>
> >>> I captured many such jumps over the course of a day or so.  Obviously
> from the dates, 2 jumps happened before I started tracing.
> >>>
> >>> I was able to reproduce the bug, though not as efficiently as my
> production system, with the following sample program:
> https://gist.github.com/garret-smith/5087169
> >>>
> >>> It took over an hour of runtime before the first time jump.  I am
> working on a better way to reproduce it at the moment, but it's hard to
> test the test with a bug so intermittent.
> >>>
> >>> I am also testing various other VM versions.  My first hope was that
> this was limited to the 64-bit version where we first encountered the
> problem, but a change to the 32-bit version has only made the problem
> happen less often, not eliminated it.
> >>>
> >>> We never saw this bug with R14B03 which we were running previously to
> R15B01.  However, system load is different so I can't make a direct
> comparison.  I did notice a few significant updates to the Windows time
> related code between R14B03 and R15:
> >>>
> >>> git log sys_time.c
> >>>
> >>> commit 46eb4359b05b220861453a869dc734480ec045a6
> >>> Author: Patrik Nyblom <pan@REDACTED>
> >>> Date:   Tue Dec 6 19:07:16 2011 +0100
> >>>
> >>>     Emulate localtime, gmtime and mktime to enable negative time_t
> >>>
> >>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> >>> Author: Bj<C3><B6>rn-Egil Dahlberg <egil@REDACTED>
> >>> Date:   Fri Dec 2 15:25:06 2011 +0100
> >>>
> >>>     Teach windows sys_localtime_r
> >>>
> >>>
> >>> I am completely stumped.  What can I do next to help track down the
> source of the bug?
> >>>
> >>> Thanks,
> >>> Garret Smith
> >>>
> >>> _______________________________________________
> >>> erlang-bugs mailing list
> >>> erlang-bugs@REDACTED
> >>> http://erlang.org/mailman/listinfo/erlang-bugs
> >>>
> >>
> >>
> >> _______________________________________________
> >> erlang-bugs mailing list
> >> erlang-bugs@REDACTED
> >> http://erlang.org/mailman/listinfo/erlang-bugs
> >
> >
> >
> > _______________________________________________
> > erlang-bugs mailing list
> > erlang-bugs@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-bugs
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130404/f9ae7883/attachment.htm>


More information about the erlang-bugs mailing list