From ferenc.holzhauser@REDACTED  Fri Mar  1 14:07:08 2013
From: ferenc.holzhauser@REDACTED (Ferenc Holzhauser)
Date: Fri, 1 Mar 2013 14:07:08 +0100
Subject: [erlang-bugs] R16B asn1 incompatibility could be more explicitly
	stated in readme
Message-ID: <CAPdcgvvDfO37sLpvKGC4nSZbAD225sQrWEiVwmMWLAC7mfnKug@mail.gmail.com>

Hi,

After updating my development machine to R16B a project that uses ASN1
encoding/decoding stopped working.
I need to recompile the ASN1 files (old generated modules try to use
asn1rt_ber_bin_v2 which disappeared in R16B) and also change code for the
new binary return of encode.
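
A minimal sketch of the second change, assuming the old generated modules
could return an iolist from encode while the R16B ones return a binary.
The module name asn1_compat and the function normalize/1 are hypothetical
helpers, not part of the asn1 application. iolist_to_binary/1 accepts both
shapes, so the calling code no longer depends on which one it gets:

```erlang
-module(asn1_compat).
-export([normalize/1]).

%% Hypothetical helper: takes the result of a generated
%% Mod:encode(Type, Value) call and makes sure the encoded bytes
%% are a binary.
%% iolist_to_binary/1 accepts a binary as well as an iolist, so the
%% same code works whether the encoder returned a list (assumed
%% pre-R16B behaviour) or a binary (R16B).
normalize({ok, Encoded}) ->
    {ok, iolist_to_binary(Encoded)};
normalize({error, _Reason} = Error) ->
    Error.
```

Usage would look like asn1_compat:normalize('MyModule':encode('MyType', Value)),
where 'MyModule' and 'MyType' are placeholders for whatever asn1ct generated
from your own specification.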

Although I'd love to have them backward compatible so I can try the new
nice things "for free", I'm not complaining about these obvious
improvements.
After a certain level of refactoring, backward compatibility is difficult
to keep.

There are hints in the readme but IMO this incompatibility should be a bit
more explicitly mentioned.

BR,
Ferenc

From pan@REDACTED  Fri Mar  1 15:28:49 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Fri, 1 Mar 2013 15:28:49 +0100
Subject: [erlang-bugs] make -j 16 fails when ./configure
	--with-dynamic-trace=systemtap
In-Reply-To: <512F7A2A.9030008@softlab.ntua.gr>
References: <510BD9E6.5040604@softlab.ntua.gr>
 <512E39E5.7030002@softlab.ntua.gr> <512F23E4.3070607@erlang.org>
 <512F7A2A.9030008@softlab.ntua.gr>
Message-ID: <5130BB21.9050504@erlang.org>

On 02/28/2013 04:39 PM, Yiannis Tsiouris wrote:
> Hi Patrik,
>
> On 02/28/2013 11:31 AM, Patrik Nyblom wrote:
>> On 02/27/2013 05:52 PM, Yiannis Tsiouris wrote:
>>> On 02/01/2013 05:06 PM, Yiannis Tsiouris wrote:
>>>> I'm trying to build an Erlang/OTP system configured with
>>>> --with-dynamic-trace=systemtap and it fails with:
>>>>> beam/dtrace-wrapper.h:49:27: error: erlang_dtrace.h: No such file or
>>>> directory
>>>>
>>>> I attach the full log for details.
>>>>
>>>> Let me state that this works well when I do a simple make (without the
>>>> -j flag). Is this a known issue?
>>>
>>> Has anyone done anything for this? Because it's still failing...
>> Nope. Wasn't severe enough to make it into the last sprint for R16B.
>>
>> Can you try the attached patch and see if it works for you? 
> It works great! Thanks for finding some time to work on this! I 
> suspect that this is going to be committed in the erlang/otp/master 
> branch (and not included in R16B), right?
Yes, or rather in the maint branch (future R16B01), and via that into 
master.
>
> yiannis
>
/Patrik


From essen@REDACTED  Sat Mar  2 21:17:35 2013
From: essen@REDACTED (Loïc Hoguin)
Date: Sat, 02 Mar 2013 21:17:35 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
Message-ID: <51325E5F.7090000@ninenines.eu>

Hello,

Cowboy doesn't work when compiled with HiPE. When using curl on a simple
hello world example, it sometimes works as expected and sometimes returns a
408 timeout error. When using http_load
(http://acme.com/software/http_load/) on the same example, it sometimes
works and sometimes throws a weird function_clause error.

=ERROR REPORT==== 2-Mar-2013::21:13:54 ===
Error in process <0.26124.0> with exit value: 
{function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}

As you can guess, lists:zip/2 doesn't call 
cowboy_protocol:parse_hd_name/9. Someone else reported a similar issue 
with the stacktrace in another project on IRC.

Same result with R15B03 and R16B.

Here are the steps to reproduce. Sorry it's not the smallest download, I 
can't isolate:

   git clone git://github.com/extend/cowboy.git
   cd cowboy/examples/hello_world
   rebar get-deps compile
   cd deps/cowboy
   ERLC_OPTS=+native make clean app
   cd -
   ./start.sh

Then with curl:

   curl -i http://localhost:8080

It will intermittently return 200 or 408.

With http_load:

   echo "http://localhost:8080" > urls.txt
   http_load -parallel 500 -seconds 10 urls.txt

It will print a lot of these weird errors:

=ERROR REPORT==== 2-Mar-2013::21:13:54 ===
Error in process <0.26098.0> with exit value: 
{function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}

Tell me how I can help get this fixed.

-- 
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu


From kostis@REDACTED  Sat Mar  2 21:59:32 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Sat, 02 Mar 2013 21:59:32 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51325E5F.7090000@ninenines.eu>
References: <51325E5F.7090000@ninenines.eu>
Message-ID: <51326834.5070901@cs.ntua.gr>

On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
> Hello,
>
> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
> hello world example, it sometimes works as expected and sometimes returns a
> 408 timeout error. When using http_load
> (http://acme.com/software/http_load/) on the same example, it sometimes
> works and sometimes throws a weird function_clause error.
>
> =ERROR REPORT==== 2-Mar-2013::21:13:54 ===
> Error in process <0.26124.0> with exit value:
> {function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}
>
> As you can guess, lists:zip/2 doesn't call
> cowboy_protocol:parse_hd_name/9. Someone else reported a similar issue
> with the stacktrace in another project on IRC.
>
> Same result with R15B03 and R16B.
>
> Here are the steps to reproduce. Sorry it's not the smallest download, I
> can't isolate:
>
> ...
>
> Tell me how I can help get this fixed.

Hi Loïc,

One thing to know is that the stack traces that are produced when 
running native code are not as precise as those when running BEAM byte 
code. In particular, the stack (naturally) does not contain frames for 
tail calls and the stack walking component may occasionally be confused 
by mode-switches (e.g. byte code calling native code and vice versa). 
The latter is what most probably is happening here: you are most 
probably running with the 'lists' module not natively compiled.

Anyway, I'll put it on my TODO list to look at, but at the moment I am 
swamped. It would help to see if the bug persists if you configure with 
--enable-native-libs (if it does not then it's most probably something 
in the mode switch part) or if you can minimize it further to something 
with fewer cowboy files compiled to native code or at least something 
that always exhibits the error.

Cheers,
Kostis


From kostis@REDACTED  Sun Mar  3 15:16:27 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Sun, 03 Mar 2013 15:16:27 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51325E5F.7090000@ninenines.eu>
References: <51325E5F.7090000@ninenines.eu>
Message-ID: <51335B3B.2070206@cs.ntua.gr>

On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
> Hello,
>
> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
> hello world example, it sometimes works as expected and sometimes returns a
> 408 timeout error. When using http_load
> (http://acme.com/software/http_load/) on the same example, it sometimes
> works and sometimes throws a weird function_clause error.
>
> ...
>
> Here are the steps to reproduce. Sorry it's not the smallest download, I
> can't isolate:

OK, I've spent two hours on this and was able to minimize down to file 
cowboy_protocol.erl, which seems to be responsible for the behavior you 
are reporting. With this file compiled to BEAM byte code and everything 
else compiled to native code, cowboy seems to be working fine on my 
tests. Can you please confirm?

If this file is the problematic one, perhaps you can disable native code 
compilation just for it for the time being. Also, it would help me if 
you trace all the calls to its functions and check whether their returns 
differ between byte code and native code execution.

I will look more into it when I find some time...

Kostis


From essen@REDACTED  Mon Mar  4 22:57:12 2013
From: essen@REDACTED (Loïc Hoguin)
Date: Mon, 04 Mar 2013 22:57:12 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51335B3B.2070206@cs.ntua.gr>
References: <51325E5F.7090000@ninenines.eu> <51335B3B.2070206@cs.ntua.gr>
Message-ID: <513518B8.8000603@ninenines.eu>

On 03/03/2013 03:16 PM, Kostis Sagonas wrote:
> On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
>> Hello,
>>
>> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
>> hello world example, it sometimes works as expected and sometimes returns a
>> 408 timeout error. When using http_load
>> (http://acme.com/software/http_load/) on the same example, it sometimes
>> works and sometimes throws a weird function_clause error.
>>
>> ...
>>
>> Here are the steps to reproduce. Sorry it's not the smallest download, I
>> can't isolate:
>
> OK, I've spent two hours on this and was able to minimize down to file
> cowboy_protocol.erl, which seems to be responsible for the behavior you
> are reporting. With this file compiled to BEAM byte code and everything
> else compiled to native code, cowboy seems to be working fine on my
> tests. Can you please confirm?

Confirmed.

> If this file is the problematic one, perhaps you can disable native code
> compilation just for it for the time being. Also, it would help me if

I'm not using native, it was just an experiment. I would like to make it 
work for future users though.

> you trace all the calls to its functions and check whether their returns
> differ between byte code and native code execution.

We're investigating.

While doing so I found that erlang:display(binary_to_list(Buffer)) 
didn't work as expected (with just cowboy_protocol natively compiled). 
Perhaps you can add that to your todo list. io:format works fine but 
seems to reduce the probability that the bug happens (as does calling gc 
directly).

> I will look more into it when I find some time...

No worries. It's mostly just an interesting bug; I'm looking into it in 
my spare time too.

Thanks for the pointers.

-- 
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu


From garret.smith@REDACTED  Tue Mar  5 02:26:49 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 4 Mar 2013 17:26:49 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
Message-ID: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>

I have been beating my head against a wall for weeks tracking down spooky
behaviour[sic] in one of our production systems.  I finally tracked it down
to "jumps" in the times returned by erlang:now(), causing all timers in the
system to expire at once.  I have witnessed this bug on R15B01, both 64 and
32-bit versions running on Windows Server 2008 R2, both on bare metal and
VirtualBox VM.

The time jump is always around 2126000 seconds, or a little over 24 days.
The now() time does not try to converge with os:timestamp() as the
documentation suggests, and as I confirmed it does if you just change the
system clock.

Another VM running concurrently on the same machine but with little load
(diagnostic node & production node) did not time jump.

Higher load seems to make the time jumps happen more often.

Frequency between time jumps varies between seconds and hours, but when a
jump occurs, it is always 2126000 + (9 to 26) seconds.

I never see the jump in logfile timestamps that use os:timestamp() for
tagging log messages.  I had to start tracing a production node before I
caught the jump.  Here are some lines from a trace, where the timestamp in
trace_ts is printed using calendar:now_to_local_time() and then in raw
tuple format:

2013-4-16 21:40:1.993399|{1366,173601,993399}
2013-4-16 21:40:1.993400|{1366,173601,993400}
2013-5-11 12:13:41.986961|{1368,299621,986961}
2013-5-11 12:13:41.986962|{1368,299621,986962}

then a bit later...

2013-5-11 12:36:19.955129|{1368,300979,955129}
2013-5-11 12:36:19.955130|{1368,300979,955130}
2013-6-5 3:9:49.538830|{1370,426989,538830}
2013-6-5 3:9:49.538833|{1370,426989,538833}

I captured many such jumps over the course of a day or so.  Obviously from
the dates, 2 jumps happened before I started tracing.
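
The size of the two jumps in the trace above can be checked directly from
the raw tuples. A minimal sketch, assuming only timer:now_diff/2 from
stdlib; the module name now_gap and its two functions are made up for
illustration:

```erlang
-module(now_gap).
-export([gap_seconds/2, gap_days/2]).

%% Seconds (as a float) between two {MegaSecs, Secs, MicroSecs} tuples.
%% timer:now_diff(T2, T1) returns T2 - T1 in microseconds.
gap_seconds(Before, After) ->
    timer:now_diff(After, Before) / 1000000.

%% The same gap expressed in days (86400 seconds per day).
gap_days(Before, After) ->
    gap_seconds(Before, After) / 86400.
```

For the first pair, now_gap:gap_seconds({1366,173601,993400},
{1368,299621,986961}) comes to roughly 2126019.99 seconds (about 24.6
days). For the second pair, {1368,300979,955130} to {1370,426989,538830},
it is roughly 2126009.58 seconds. Both fit the 2126000 + (9 to 26) pattern.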

I was able to reproduce the bug, though not as efficiently as my production
system, with the following sample program:
https://gist.github.com/garret-smith/5087169

It took over an hour of runtime before the first time jump.  I am working
on a better way to reproduce it at the moment, but it's hard to test the
test with a bug so intermittent.
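
Independently of the gist (this is not its code), a watcher along the
following lines could log the moment now() and os:timestamp() drift apart.
It is a minimal sketch: the module name now_watch, the one-second polling
interval and the threshold argument are all assumptions chosen for
illustration.

```erlang
-module(now_watch).
-export([start/1, drift_seconds/2]).

%% erlang:now/0 is deprecated in later OTP releases; silence the warning
%% so the sketch also compiles there.
-compile(nowarn_deprecated_function).

%% Signed difference now() - os:timestamp() in seconds.
drift_seconds(Now, OsTime) ->
    timer:now_diff(Now, OsTime) / 1000000.

%% Spawn a process that checks the drift once per second.
start(ThresholdSecs) ->
    spawn(fun() -> loop(ThresholdSecs) end).

loop(ThresholdSecs) ->
    Drift = drift_seconds(erlang:now(), os:timestamp()),
    case abs(Drift) > ThresholdSecs of
        true ->
            error_logger:warning_msg(
              "now() differs from os:timestamp() by ~p s~n", [Drift]);
        false ->
            ok
    end,
    timer:sleep(1000),
    loop(ThresholdSecs).
```

Started with now_watch:start(60), a watcher like this should stay quiet
during small clock adjustments and log a warning once the two clocks are
more than a minute apart, as they would be after a ~24 day jump.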

I am also testing various other VM versions.  My first hope was that this
was limited to the 64-bit version where we first encountered the problem,
but a change to the 32-bit version has only made the problem happen less
often, not eliminated it.

We never saw this bug with R14B03, which we were running before
R15B01.  However, system load is different so I can't make a direct
comparison.  I did notice a few significant updates to the Windows time
related code between R14B03 and R15:

git log sys_time.c

commit 46eb4359b05b220861453a869dc734480ec045a6
Author: Patrik Nyblom <pan@REDACTED>
Date:   Tue Dec 6 19:07:16 2011 +0100

    Emulate localtime, gmtime and mktime to enable negative time_t

commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
Author: Björn-Egil Dahlberg <egil@REDACTED>
Date:   Fri Dec 2 15:25:06 2011 +0100

    Teach windows sys_localtime_r


I am completely stumped.  What can I do next to help track down the source
of the bug?

Thanks,
Garret Smith

From pan@REDACTED  Tue Mar  5 08:50:42 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 5 Mar 2013 08:50:42 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
Message-ID: <5135A3D2.4080305@erlang.org>

Hi!

On 03/05/2013 02:26 AM, Garret Smith wrote:
> I have been beating my head against a wall for weeks tracking down 
> spooky behaviour[sic] in one of our production systems.  I finally 
> tracked it down to "jumps" in the times returned by erlang:now(), 
> causing all timers in the system to expire at once.  I have witnessed 
> this bug on R15B01, both 64 and 32-bit versions running on Windows 
> Server 2008 R2, both on bare metal and VirtualBox VM.
>
> The time jump is always around 2126000 seconds, or a little over 24 
> days.  The now() time does not try to converge with os:timestamp() as 
> the documentation suggests, and as I confirmed it does if you just 
> change the system clock.
>
> Another VM running concurrently on the same machine but with little 
> load (diagnostic node & production node) did not time jump.
>
> Higher load seems to make the time jumps happen more often.
>
> Frequency between time jumps varies between seconds and hours, but 
> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>
> I never see the jump in logfile timestamps that use os:timestamp() for 
> tagging log messages.  I had to start tracing a production node before 
> I caught the jump.  Here are some lines from a trace, where the 
> timestamp in trace_ts is printed using calendar:now_to_local_time() 
> and then in raw tuple format:
>
> 2013-4-16 21:40:1.993399|{1366,173601,993399}
> 2013-4-16 21:40:1.993400|{1366,173601,993400}
> 2013-5-11 12:13:41.986961|{1368,299621,986961}
> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>
> then a bit later...
>
> 2013-5-11 12:36:19.955129|{1368,300979,955129}
> 2013-5-11 12:36:19.955130|{1368,300979,955130}
> 2013-6-5 3:9:49.538830|{1370,426989,538830}
> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>
Gah! That's obviously not supposed to happen...
> I captured many such jumps over the course of a day or so. Obviously 
> from the dates, 2 jumps happened before I started tracing.
>
> I was able to reproduce the bug, though not as efficiently as my 
> production system, with the following sample program: 
> https://gist.github.com/garret-smith/5087169
>
> It took over an hour of runtime before the first time jump.  I am 
> working on a better way to reproduce it at the moment, but it's hard 
> to test the test with a bug so intermittent.
>
> I am also testing various other VM versions.  My first hope was that 
> this was limited to the 64-bit version where we first encountered the 
> problem, but a change to the 32-bit version has only made the problem 
> happen less often, not eliminated it.
>
> We never saw this bug with R14B03, which we were running before 
> R15B01.  However, system load is different so I can't make a direct 
> comparison.  I did notice a few significant updates to the Windows 
> time related code between R14B03 and R15:
>
> git log sys_time.c
>
> commit 46eb4359b05b220861453a869dc734480ec045a6
> Author: Patrik Nyblom <pan@REDACTED>
> Date:   Tue Dec 6 19:07:16 2011 +0100
>
>     Emulate localtime, gmtime and mktime to enable negative time_t
>
> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> Author: Björn-Egil Dahlberg <egil@REDACTED>
> Date:   Fri Dec 2 15:25:06 2011 +0100
>
>     Teach windows sys_localtime_r
>
>
Yep, that's me... But even if I gave a totally weird time back from 
those, the erlang:now logic should have stopped this from happening. 
I'll try to reproduce using your example program. If nothing else helps, 
I'll instrument a VM that gives some traces in the time code...
> I am completely stumped.  What can I do next to help track down the 
> source of the bug?
>
Unfortunately, so am I. Especially weird that it's load related... Maybe 
something is not locked as it should be...
> Thanks,
> Garret Smith
Thanks for reporting, I'll get back to you!

Cheers,
/Patrik
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs


From pan@REDACTED  Tue Mar  5 12:06:27 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 5 Mar 2013 12:06:27 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <5135A3D2.4080305@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org>
Message-ID: <5135D1B3.8000400@erlang.org>

Hi again...

I'm not sure about one thing. What happens to os:timestamp() during 
these jumps? Does it stay on track or does it also jump around?

I've tried to reproduce it with your program, but have not yet succeeded. 
Have you seen this on the R16B release as well?

Is the hardware in any way fancy (like a lot of cores, some new 
processor I don't have or something else?) or is there anything else 
special about the machine? Also the time zone you're running in would be 
interesting, as there is some time zone specific code there...

I would really like to be able to reproduce it so you don't have to do 
all the tests at your site; it might end up being really time consuming 
for you if I make too many mistakes :)

Cheers,
/Patrik




From garret.smith@REDACTED  Tue Mar  5 17:37:19 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 5 Mar 2013 08:37:19 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <5135D1B3.8000400@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
Message-ID: <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>

I haven't seen anything unexpected in os:timestamp().  No jumps at all.

CPU is an Intel Xeon X3430.

I have reproduced it in the Los Angeles/Pacific (GMT -8) and US East
coast (GMT -5) time zones.

I have not yet tried R16B.  I'll be starting that today.  I'm also trying
to improve the test program, since it's taking quite a long time between
jumps for me as well.  I'll let you know as soon as I have a better one.

You have no idea how relieved I am that you are looking into this!

Thanks,
Garret Smith



From garret.smith@REDACTED  Tue Mar  5 20:20:40 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 5 Mar 2013 11:20:40 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
Message-ID: <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>

The gist https://gist.github.com/garret-smith/5087169 is updated with a
slightly better version.  I was able to reproduce the jump in less than an
hour.  I also did some more things to perturb the timing code while the
test program was running.

Here is the latest info, everything I can think of that may have the
slightest effect:
 * R15B01 64-bit build
 * Pacific time zone (GMT -8)
 * Xeon E5405 in an HP DL160
 * no arguments to erl.exe
 * bursty, high CPU load, >75% memory used by other software
 * running Observer on the test VM displaying the "Load Charts" tab
 * made some small adjustments (~ 60 seconds) to the system clock while
running the tests - now() and os:timestamp() behaved as expected, initially
showing a delta and slowly converging
 * w32tm /resync to fix the system clock some time after perturbing it

The time jump in now() occurred when now() was ~9 seconds behind
os:timestamp() as reported by the new test program.

I'm starting to look at R16B now.

-Garret Smith


On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith <garret.smith@REDACTED> wrote:

> I haven't seen anything unexpected in os:timestamp().  No jumps at all.
>
> CPU is an Intel Xeon X3430.
>
> I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US East
> coast time zone (GMT -5).
>
> I have not yet tried R16B.  I'll be starting that today.  I'm also trying
> to improve the test program, since it's taking quite a long time between
> jumps for me as well.  I'll let you know as soon as I have a better one.
>
> You have no idea how relieved I am that you are looking into this!
>
> Thanks,
> Garret Smith
>
>
> On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>
>>  Hi again...
>>
>> I'm not sure about one thing. What happens to os:timestamp() during these
>> jumps? Does it stay on track or does it also jump around?
>>
>> I've tried to reproduce it with your program, but have not yet succeeded.
>> Have you seen this on the R16B release as well?
>>
>> Is the hardware in any way fancy (like a lot of cores, some new processor
>> I don't have or something else?) or is there anything else special about
>> the machine? Also the time zone you're running in would be interesting, as
>> there is some time zone specific code there...
>>
>> I would really like to be able to reproduce it so you don't have to do
>> all the tests at your site; it might end up being really time-consuming for
>> you if I make too many mistakes :)
>>
>> Cheers,
>> /Patrik
>>
>>
>>
>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>
>> Hi!
>>
>> On 03/05/2013 02:26 AM, Garret Smith wrote:
>>
>>     I have been beating my head against a wall for weeks tracking down
>> spooky behaviour[sic] in one of our production systems.  I finally tracked
>> it down to "jumps" in the times returned by erlang:now(), causing all
>> timers in the system to expire at once.  I have witnessed this bug on
>> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both
>> on bare metal and VirtualBox VM.
>>
>>  The time jump is always around 2126000 seconds, or a little over 24
>> days.  The now() time does not try to converge with os:timestamp() as the
>> documentation suggests, and as I confirmed it does if you just change the
>> system clock.
>>
>>  Another VM running concurrently on the same machine but with little load
>> (diagnostic node & production node) did not time jump.
>>
>>  Higher load seems to make the time jumps happen more often.
>>
>>  Frequency between time jumps varies between seconds and hours, but when
>> a jump occurs, it is always 2126000 + (9 to 26) seconds.
>>
>>  I never see the jump in logfile timestamps that use os:timestamp() for
>> tagging log messages.  I had to start tracing a production node before I
>> caught the jump.  Here are some lines from a trace, where the timestamp in
>> trace_ts is printed using calendar:now_to_local_time() and then in raw
>> tuple format:
>>
>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>
>>  then a bit later...
>>
>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>
>>   Gah! That's obviously not supposed to happen...
>>
>>  I captured many such jumps over the course of a day or so.  Obviously
>> from the dates, 2 jumps happened before I started tracing.
>>
>>  I was able to reproduce the bug, though not as efficiently as my
>> production system, with the following sample program:
>> https://gist.github.com/garret-smith/5087169
>>
>> It took over an hour of runtime before the first time jump.  I am working
>> on a better way to reproduce it at the moment, but it's hard to test the
>> test with a bug so intermittent.
>>
>>  I am also testing various other VM versions.  My first hope was that
>> this was limited to the 64-bit version where we first encountered the
>> problem, but a change to the 32-bit version has only made the problem
>> happen less often, not eliminated it.
>>
>>  We never saw this bug with R14B03 which we were running previously to
>> R15B01.  However, system load is different so I can't make a direct
>> comparison.  I did notice a few significant updates to the Windows time
>> related code between R14B03 and R15:
>>
>>  git log sys_time.c
>>
>>  commit 46eb4359b05b220861453a869dc734480ec045a6
>> Author: Patrik Nyblom <pan@REDACTED>
>> Date:   Tue Dec 6 19:07:16 2011 +0100
>>
>>     Emulate localtime, gmtime and mktime to enable negative time_t
>>
>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>> Author: Björn-Egil Dahlberg <egil@REDACTED>
>> Date:   Fri Dec 2 15:25:06 2011 +0100
>>
>>     Teach windows sys_localtime_r
>>
>>
>>   Yep, that's me... But even if I gave a totally weird time back from
>> those, the erlang:now logic should have stopped this from happening. I'll
>> try to reproduce using your example program. If nothing else helps, I'll
>> instrument a VM that gives some traces in the time code...
>>
>>  I am completely stumped.  What can I do next to help track down the
>> source of the bug?
>>
>>   Unfortunately, so am I. Especially weird that it's load related...
>> Maybe something is not locked as it should be...
>>
>>  Thanks,
>>  Garret Smith
>>
>> Thanks for reporting, I'll get back to you!
>>
>> Cheers,
>> /Patrik
>>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>
>>
>

From garret.smith@REDACTED  Tue Mar  5 21:10:45 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 5 Mar 2013 12:10:45 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
Message-ID: <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>

On the same machine, following the same steps as before, I reproduced the
time jump on R16B.
This time the jump happened with a <5 sec delta between now() and
os:timestamp().
Still jumping ~2126000 seconds.

-Garret
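
For scale, the ~2126000 second figure can be broken down into days, hours,
minutes and seconds. A small Python sketch follows; `breakdown` is a
hypothetical helper name used only for this illustration.

```python
def breakdown(seconds):
    """Split a whole number of seconds into (days, hours, minutes, seconds)."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, secs


if __name__ == "__main__":
    days, hours, minutes, secs = breakdown(2126000)
    print(f"{days} d {hours} h {minutes} min {secs} s")
```

For the round figure of 2126000 seconds this yields 24 days, 14 hours,
33 minutes and 20 seconds, which agrees with the "a little over 24 days"
estimate in the original report.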



From pan@REDACTED  Wed Mar  6 10:46:01 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 6 Mar 2013 10:46:01 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
Message-ID: <51371059.2080700@erlang.org>

Thanks for all the extra info. I'll try the updated program, running all
the steps you've described, on a four-core machine with Win2008 that
I've set up for this. Hopefully I'll be able to reproduce it now :)

Thanks!

/Patrik



From pan@REDACTED  Thu Mar  7 16:37:13 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Thu, 7 Mar 2013 16:37:13 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
Message-ID: <5138B429.4040605@erlang.org>

Hi Garret!

I've been able to reproduce it on my freshly installed Win2008 machine! 
Great, now I only need to debug it and find the error :)

I'll get back to you as soon as I feel I have a fix - it might take a
few days, given the relatively long turnaround time, but we'll get there!

Thank you for all the help and information!

Cheers,
/Patrik

On 03/05/2013 09:10 PM, Garret Smith wrote:
> On the same machine with the same steps as previous, I reproduced the 
> time jump on R16B.
> This time the jump happened with a <5 sec delta btw now() and 
> os:timestamp().
> Still jumping ~2126000 seconds.
>
> -Garret
>
>
> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED> wrote:
>
>     The gist https://gist.github.com/garret-smith/5087169 is updated
>     with a slightly better version.  I was able to reproduce the jump
>     in less than an hour. I also did some more things to perturb the
>     timing code while the test program was running.
>
>     Here is the latest info, everything I can think of that may have
>     the slightest effect:
>      * R15B01 64-bit build
>      * Pacific time zone (GMT -8)
>      * Xeon E5405 in an HP DL160
>      * no arguments to erl.exe
>      * bursty, high CPU load, >75% memory used by other software
>      * running Observer on the test VM displaying the "Load Charts" tab
>      * made some small adjustments (~ 60 seconds) to the system clock
>     while running the tests - now() and os:timestamp() behaved as
>     expected, initially showing a delta and slowly converging
>      * w32tm /resync to fix the system clock some time after perturbing it
>
>     The time jump in now() occurred when now() was ~9 seconds behind
>     os:timestamp() as reported by the new test program.
>
>     I'm starting to look at R16B now.
>
>     -Garret Smith
>
>
>     On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith
>     <garret.smith@REDACTED> wrote:
>
>         I haven't seen anything unexpected in os:timestamp().  No
>         jumps at all.
>
>         CPU is an Intel Xeon X3430.
>
>         I have reproduced it in the LosAngeles/Pacific Time (GMT -8)
>         and US East coast time zone (GMT -5).
>
>         I have not yet tried R16B.  I'll be starting that today.  I'm
>         also trying to improve the test program, since it's taking
>         quite a long time between jumps for me as well.  I'll let you
>         know as soon as I have a better one.
>
>         You have no idea how relieved I am that you are looking into this!
>
>         Thanks,
>         Garret Smith
>
>
>         On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>
>             Hi again...
>
>             I'm not sure about one thing. What happens to
>             os:timestamp() during these jumps? Does it stay on track
>             or does it also jump around?
>
>             I've tried to reproduce it with your program, but have not
>             yet succeeded. Have you seen this on the R16B release as well?
>
>             Is the hardware in any way fancy (like a lot of cores,
>             some new processor I don't have or something else?) or is
>             there anything else special about the machine? Also the
>             time zone you're running in would be interesting, as there
>             is some time zone specific code there...
>
>             I would really like to be able to reproduce it so you
>             don't have to do all the tests at your site; it might end
>             up being really time-consuming for you if I make too many
>             mistakes :)
>
>             Cheers,
>             /Patrik
>
>
>
>             On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>             Hi!
>>
>>             On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>             I have been beating my head against a wall for weeks
>>>             tracking down spooky behaviour[sic] in one of our
>>>             production systems.  I finally tracked it down to
>>>             "jumps" in the times returned by erlang:now(), causing
>>>             all timers in the system to expire at once.  I have
>>>             witnessed this bug on R15B01, both 64 and 32-bit
>>>             versions running on Windows Server 2008 R2, both on bare
>>>             metal and VirtualBox VM.
>>>
>>>             The time jump is always around 2126000 seconds, or a
>>>             little over 24 days.  The now() time does not try to
>>>             converge with os:timestamp() as the documentation
>>>             suggests, and as I confirmed it does if you just change
>>>             the system clock.
>>>
>>>             Another VM running concurrently on the same machine but
>>>             with little load (diagnostic node & production node) did
>>>             not time jump.
>>>
>>>             Higher load seems to make the time jumps happen more often.
>>>
>>>             Frequency between time jumps varies between seconds and
>>>             hours, but when a jump occurs, it is always 2126000 + (9
>>>             to 26) seconds.
>>>
>>>             I never see the jump in logfile timestamps that use
>>>             os:timestamp() for tagging log messages. I had to start
>>>             tracing a production node before I caught the jump. 
>>>             Here are some lines from a trace, where the timestamp in
>>>             trace_ts is printed using calendar:now_to_local_time()
>>>             and then in raw tuple format:
>>>
>>>             2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>             2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>             2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>             2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>
>>>             then a bit later...
>>>
>>>             2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>             2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>             2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>             2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>
>>             Gah! That's obviously not supposed to happen...
>>>             I captured many such jumps over the course of a day or
>>>             so.  Obviously from the dates, 2 jumps happened before I
>>>             started tracing.
>>>
>>>             I was able to reproduce the bug, though not as
>>>             efficiently as my production system, with the following
>>>             sample program: https://gist.github.com/garret-smith/5087169
>>>
>>>             It took over an hour of runtime before the first time
>>>             jump.  I am working on a better way to reproduce it at
>>>             the moment, but it's hard to test the test with a bug so
>>>             intermittent.
>>>
>>>             I am also testing various other VM versions.  My first
>>>             hope was that this was limited to the 64-bit version
>>>             where we first encountered the problem, but a change to
>>>             the 32-bit version has only made the problem happen less
>>>             often, not eliminated it.
>>>
>>>             We never saw this bug with R14B03 which we were running
>>>             previously to R15B01.  However, system load is different
>>>             so I can't make a direct comparison.  I did notice a few
>>>             significant updates to the Windows time related code
>>>             between R14B03 and R15:
>>>
>>>             git log sys_time.c
>>>
>>>             commit 46eb4359b05b220861453a869dc734480ec045a6
>>>             Author: Patrik Nyblom <pan@REDACTED
>>>             <mailto:pan@REDACTED>>
>>>             Date:   Tue Dec 6 19:07:16 2011 +0100
>>>
>>>                 Emulate localtime, gmtime and mktime to enable
>>>             negative time_t
>>>
>>>             commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>             Author: Björn-Egil Dahlberg <egil@REDACTED
>>>             <mailto:egil@REDACTED>>
>>>             Date:   Fri Dec 2 15:25:06 2011 +0100
>>>
>>>                 Teach windows sys_localtime_r
>>>
>>>
>>             Yep, that's me... But even if I gave a totally weird time
>>             back from those, the erlang:now logic should have stopped
>>             this from happening. I'll try to reproduce using your
>>             example program. If nothing else helps, I'll instrument a
>>             VM that gives some traces in the time code...
>>>             I am completely stumped.  What can I do next to help
>>>             track down the source of the bug?
>>>
>>             Unfortunately, so am I. Especially weird that it's load
>>             related... Maybe something is not locked as it should be...
>>>             Thanks,
>>>             Garret Smith
>>             Thanks for reporting, I'll get back to you!
>>
>>             Cheers,
>>             /Patrik
>>>
>>>
>>>             _______________________________________________
>>>             erlang-bugs mailing list
>>>             erlang-bugs@REDACTED  <mailto:erlang-bugs@REDACTED>
>>>             http://erlang.org/mailman/listinfo/erlang-bugs
>
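
As a worked check of the figures in the trace quoted above, the size of
each jump can be computed directly from the raw {MegaSecs, Secs,
MicroSecs} tuples. This is only a sketch in Python; the function names
are made up for illustration, and the tuples are the ones on either side
of the two jumps shown in the trace.

```python
MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400

def now_to_micros(now):
    """Convert an erlang:now()-style {MegaSecs, Secs, MicroSecs} tuple to microseconds."""
    mega, secs, micros = now
    return (mega * 1_000_000 + secs) * MICROS_PER_SECOND + micros

# Timestamps immediately before and after each jump, copied from the trace.
JUMPS = [
    ((1366, 173601, 993400), (1368, 299621, 986961)),
    ((1368, 300979, 955130), (1370, 426989, 538830)),
]

def jump_sizes_micros(pairs):
    """Return the size of each jump in microseconds, using exact integer arithmetic."""
    return [now_to_micros(after) - now_to_micros(before) for before, after in pairs]

for micros in jump_sizes_micros(JUMPS):
    seconds = micros / MICROS_PER_SECOND
    print(f"{seconds:.3f} s = {seconds / SECONDS_PER_DAY:.2f} days")
```

Both jumps come out at roughly 2126010 to 2126020 seconds, about 24.6
days, which agrees with the "2126000 + (9 to 26) seconds" in the report.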


From kostis@REDACTED  Fri Mar  8 22:06:50 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Fri, 08 Mar 2013 22:06:50 +0100
Subject: [erlang-bugs] R16: HiPE failure with /bits in funs
In-Reply-To: <511AC8EC.3070507@ninenines.eu>
References: <511AC8EC.3070507@ninenines.eu>
Message-ID: <513A52EA.5070902@cs.ntua.gr>

On 02/12/2013 11:57 PM, Loïc Hoguin wrote:
> The following module fails to compile with R16. It also fails on R15B03
> and probably previous versions. I do not know HiPE internals so no patch.
>
>
> -module(hipe_error).
> -export([run/0]).
>
> run() ->
>     fun (<< $c, _/bits >>) -> ok end.
>
>
> The following errors are reported:
>
> 7> c(hipe_error, [native]).
> <HiPE (v 3.9.3)> EXITED with reason
> {function_clause,[{hipe_rtl_binary_match,gen_rtl,[{bs_match_string,<<99>>,1},[],[...

For archival reasons, I report that this particular problem has been 
fixed. The relevant patch will be sent soon for inclusion in 'pu'.

Kostis

PS. Drop me a mail if you are interested in obtaining the patch sooner.


From pan@REDACTED  Mon Mar 11 17:26:05 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Mon, 11 Mar 2013 17:26:05 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <5138B429.4040605@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
 <5138B429.4040605@erlang.org>
Message-ID: <513E059D.70005@erlang.org>

Hi again!

I think I've found it. At least I've found one error, hopefully that's 
the one you've also found :)

The sys_gethrtime function has gained new uses in R15 and later, uses
where it is no longer protected by the erts_timeofday_mtx. So it simply
needs a lock of its own. This gives a slight performance loss, but that
could be fixed by using GetTickCount64, at least on Win7 and Win2008.
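
For context on the GetTickCount64 remark: GetTickCount returns
milliseconds since system start as a 32-bit value, so it wraps roughly
every 49.7 days and any code built on it has to keep track of wraps
itself, while the 64-bit variant does not wrap in practice. The
arithmetic can be sketched as follows (Python, with illustrative helper
names that do not come from the ERTS source).

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

def wrap_period_days(bits):
    """Days until an unsigned millisecond counter of the given width wraps."""
    return (1 << bits) / MS_PER_DAY

def elapsed_ms_32(start, end):
    """Elapsed milliseconds between two 32-bit samples.

    Correct only if the counter wrapped at most once between the samples.
    """
    return (end - start) % (1 << 32)

# A 32-bit counter wraps in under two months; a 64-bit one effectively never.
print(round(wrap_period_days(32), 1), "days for 32 bits")
print(round(wrap_period_days(64) / 365.25 / 1e6), "million years for 64 bits")

# Two samples taken 512 ms apart, straddling the wrap of the 32-bit counter.
print(elapsed_ms_32(0xFFFFFF00, 0x00000100), "ms elapsed across a wrap")
```

With a 64-bit tick there is no wrap state to share between threads,
which is presumably why it would remove the need for the extra lock.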

Can you try a version of beam.smp.dll with a lock and see if the error 
is gone on your machines? If that works, I would also like you to try an 
optimized version, but let's first make sure we have the bug nailed down :)

In my Dropbox, there's a beam.smp.dll. If you replace
$ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start
werl, the slogan should contain [source-be0da3e]. It is for 64-bit
Windows. The public Dropbox URL is:
http://dl.dropbox.com/u/17212223/beam.smp.dll

This should work without any special messages or such, giving a working
erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
moving slightly backwards, we have a more complicated bug, but I haven't
seen any such messages since I added proper locking.

If it's possible for you to test this, I would be immensely grateful!

Cheers,
/Patrik
On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
> Hi Garret!
>
> I've been able to reproduce it on my freshly installed Win2008 
> machine! Great, now I only need to debug it and find the error :)
>
> I'll get back to you as soon as I feel I have a fix - it might take a 
> few days, given the relatively long turn around time, but we'll get there!
>
> Thank you for all the help and information!
>
> Cheers,
> /Patrik
>
> On 03/05/2013 09:10 PM, Garret Smith wrote:
>> On the same machine with the same steps as previous, I reproduced the 
>> time jump on R16B.
>> This time the jump happened with a <5 sec delta btw now() and 
>> os:timestamp().
>> Still jumping ~2126000 seconds.
>>
>> -Garret
>>
>>
>> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED 
>> <mailto:garret.smith@REDACTED>> wrote:
>>
>>     The gist https://gist.github.com/garret-smith/5087169 is updated
>>     with a slightly better version.  I was able to reproduce the jump
>>     in less than an hour.  I also did some more things to perturb the
>>     timing code while the test program was running.
>>
>>     Here is the latest info, everything I can think of that may have
>>     the slightest effect:
>>      * R15B01 64-bit build
>>      * Pacific time zone (GMT -8)
>>      * Xeon E5405 in an HP DL160
>>      * no arguments to erl.exe
>>      * bursty, high CPU load, >75% memory used by other software
>>      * running Observer on the test VM displaying the "Load Charts" tab
>>      * made some small adjustments (~ 60 seconds) to the system clock
>>     while running the tests - now() and os:timestamp() behaved as
>>     expected, initially showing a delta and slowly converging
>>      * w32tm /resync to fix the system clock some time after
>>     perturbing it
>>
>>     The time jump in now() occurred when now() was ~9 seconds behind
>>     os:timestamp() as reported by the new test program.
>>
>>     I'm starting to look at R16B now.
>>
>>     -Garret Smith
>>
>>
>>     On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith
>>     <garret.smith@REDACTED <mailto:garret.smith@REDACTED>> wrote:
>>
>>         I haven't seen anything unexpected in os:timestamp().  No
>>         jumps at all.
>>
>>         CPU is an Intel Xeon X3430.
>>
>>         I have reproduced it in the LosAngeles/Pacific Time (GMT -8)
>>         and US East coast time zone (GMT -5).
>>
>>         I have not yet tried R16B.  I'll be starting that today.  I'm
>>         also trying to improve the test program, since it's taking
>>         quite a long time between jumps for me as well.  I'll let you
>>         know as soon as I have a better one.
>>
>>         You have no idea how relieved I am that you are looking into
>>         this!
>>
>>         Thanks,
>>         Garret Smith
>>
>>
>>         On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED
>>         <mailto:pan@REDACTED>> wrote:
>>
>>             Hi again...
>>
>>             I'm not sure about one thing. What happens to
>>             os:timestamp() during these jumps? Does it stay on track
>>             or does it also jump around?
>>
>>             I've tried to reproduce it with your program, but have not
>>             yet succeeded. Have you seen this on the R16B release as
>>             well?
>>
>>             Is the hardware in any way fancy (like a lot of cores,
>>             some new processor I don't have or something else?) or is
>>             there anything else special about the machine? Also the
>>             time zone you're running in would be interesting, as
>>             there is some time zone specific code there...
>>
>>             I would really like to be able to reproduce it so you
>>             don't have to do all the tests at your site, it might end
>>             up being really time consuming for you if I make too many
>>             mistakes :)
>>
>>             Cheers,
>>             /Patrik
>>
>>
>>
>>             On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>>             Hi!
>>>
>>>             On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>>             I have been beating my head against a wall for weeks
>>>>             tracking down spooky behaviour[sic] in one of our
>>>>             production systems.  I finally tracked it down to
>>>>             "jumps" in the times returned by erlang:now(), causing
>>>>             all timers in the system to expire at once.  I have
>>>>             witnessed this bug on R15B01, both 64 and 32-bit
>>>>             versions running on Windows Server 2008 R2, both on
>>>>             bare metal and VirtualBox VM.
>>>>
>>>>             The time jump is always around 2126000 seconds, or a
>>>>             little over 24 days.  The now() time does not try to
>>>>             converge with os:timestamp() as the documentation
>>>>             suggests, and as I confirmed it does if you just change
>>>>             the system clock.
>>>>
>>>>             Another VM running concurrently on the same machine but
>>>>             with little load (diagnostic node & production node)
>>>>             did not time jump.
>>>>
>>>>             Higher load seems to make the time jumps happen more often.
>>>>
>>>>             Frequency between time jumps varies between seconds and
>>>>             hours, but when a jump occurs, it is always 2126000 +
>>>>             (9 to 26) seconds.
>>>>
>>>>             I never see the jump in logfile timestamps that use
>>>>             os:timestamp() for tagging log messages.  I had to
>>>>             start tracing a production node before I caught the
>>>>             jump.  Here are some lines from a trace, where the
>>>>             timestamp in trace_ts is printed using
>>>>             calendar:now_to_local_time() and then in raw tuple format:
>>>>
>>>>             2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>>             2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>>             2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>>             2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>
>>>>             then a bit later...
>>>>
>>>>             2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>>             2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>>             2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>>             2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>
>>>             Gah! That's obviously not supposed to happen...
>>>>             I captured many such jumps over the course of a day or
>>>>             so.  Obviously from the dates, 2 jumps happened before
>>>>             I started tracing.
>>>>
>>>>             I was able to reproduce the bug, though not as
>>>>             efficiently as my production system, with the following
>>>>             sample program:
>>>>             https://gist.github.com/garret-smith/5087169
>>>>
>>>>             It took over an hour of runtime before the first time
>>>>             jump.  I am working on a better way to reproduce it at
>>>>             the moment, but it's hard to test the test with a bug
>>>>             so intermittent.
>>>>
>>>>             I am also testing various other VM versions.  My first
>>>>             hope was that this was limited to the 64-bit version
>>>>             where we first encountered the problem, but a change to
>>>>             the 32-bit version has only made the problem happen
>>>>             less often, not eliminated it.
>>>>
>>>>             We never saw this bug with R14B03 which we were running
>>>>             previously to R15B01.  However, system load is
>>>>             different so I can't make a direct comparison.  I did
>>>>             notice a few significant updates to the Windows time
>>>>             related code between R14B03 and R15:
>>>>
>>>>             git log sys_time.c
>>>>
>>>>             commit 46eb4359b05b220861453a869dc734480ec045a6
>>>>             Author: Patrik Nyblom <pan@REDACTED
>>>>             <mailto:pan@REDACTED>>
>>>>             Date:   Tue Dec 6 19:07:16 2011 +0100
>>>>
>>>>                 Emulate localtime, gmtime and mktime to enable
>>>>             negative time_t
>>>>
>>>>             commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>>             Author: Björn-Egil Dahlberg <egil@REDACTED
>>>>             <mailto:egil@REDACTED>>
>>>>             Date:   Fri Dec 2 15:25:06 2011 +0100
>>>>
>>>>                 Teach windows sys_localtime_r
>>>>
>>>>
>>>             Yep, that's me... But even if I gave a totally weird
>>>             time back from those, the erlang:now logic should have
>>>             stopped this from happening. I'll try to reproduce using
>>>             your example program. If nothing else helps, I'll
>>>             instrument a VM that gives some traces in the time code...
>>>>             I am completely stumped.  What can I do next to help
>>>>             track down the source of the bug?
>>>>
>>>             Unfortunately, so am I. Especially weird that it's load
>>>             related... Maybe something is not locked as it should be...
>>>>             Thanks,
>>>>             Garret Smith
>>>             Thanks for reporting, I'll get back to you!
>>>
>>>             Cheers,
>>>             /Patrik

From garret.smith@REDACTED  Mon Mar 11 17:34:01 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 09:34:01 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <513E059D.70005@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
 <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID: <CAHmviK8+kjWz8b2ju_WvrZgNuWwbh-Jg9U0nBFSzfQL3d_H8eA@mail.gmail.com>

Patrik,

Our production systems are on R15B1/2, so I won't be able to verify against
that, but I'll let you know what I see running my test program against R16B.

Will you be able to generate a patched R15x version?  If not, I'll try to
set up a build system and apply the patch locally.

-Garret


On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom <pan@REDACTED> wrote:

>  Hi again!
>
> I think I've found it. At least I've found one error, hopefully that's the
> one you've also found :)
>
> The sys_gethrtime function has gained new uses in R15 and later, uses
> where it is no longer protected by the erts_timeofday_mtx. So it simply
> needs a lock of its own. This gives a slight performance loss, but that
> could be fixed by using GetTickCount64, at least on Win7 and Win2008.
>
> Can you try a version of beam.smp.dll with a lock and see if the error is
> gone on your machines? If that works, I would also like you to try an
> optimized version, but let's first make sure we have the bug nailed down :)
>
> In my Dropbox, there's a beam.smp.dll. If you replace
> $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start werl,
> the slogan should contain [source-be0da3e]. It is for 64-bit Windows. The
> public Dropbox URL is:
> http://dl.dropbox.com/u/17212223/beam.smp.dll
>
> This should work without any special messages or such, giving a working
> erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
> moving slightly backwards, we have a more complicated bug, but I haven't
> seen any such messages since I added proper locking.
>
> If it's possible for you to test this, I would be immensely grateful!
>
> Cheers,
> /Patrik
>
> On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
>
> Hi Garret!
>
> I've been able to reproduce it on my freshly installed Win2008 machine!
> Great, now I only need to debug it and find the error :)
>
> I'll get back to you as soon as I feel I have a fix - it might take a few
> days, given the relatively long turn around time, but we'll get there!
>
> Thank you for all the help and information!
>
> Cheers,
> /Patrik
>
> On 03/05/2013 09:10 PM, Garret Smith wrote:
>
>  On the same machine with the same steps as previous, I reproduced the
> time jump on R16B.
> This time the jump happened with a <5 sec delta btw now() and
> os:timestamp().
> Still jumping ~2126000 seconds.
>
>  -Garret
>
>
> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED> wrote:
>
>>   The gist https://gist.github.com/garret-smith/5087169 is updated with
>> a slightly better version.  I was able to reproduce the jump in less than
>> an hour.  I also did some more things to perturb the timing code while the
>> test program was running.
>>
>>  Here is the latest info, everything I can think of that may have the
>> slightest effect:
>>   * R15B01 64-bit build
>>   * Pacific time zone (GMT -8)
>>   * Xeon E5405 in an HP DL160
>>   * no arguments to erl.exe
>>   * bursty, high CPU load, >75% memory used by other software
>>   * running Observer on the test VM displaying the "Load Charts" tab
>>   * made some small adjustments (~ 60 seconds) to the system clock while
>> running the tests - now() and os:timestamp() behaved as expected, initially
>> showing a delta and slowly converging
>>   * w32tm /resync to fix the system clock some time after perturbing it
>>
>>  The time jump in now() occurred when now() was ~9 seconds behind
>> os:timestamp() as reported by the new test program.
>>
>>  I'm starting to look at R16B now.
>>
>>  -Garret Smith
>>
>>
>> On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>
>>>   I haven't seen anything unexpected in os:timestamp().  No jumps at
>>> all.
>>>
>>>  CPU is an Intel Xeon X3430.
>>>
>>> I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US East
>>> coast time zone (GMT -5).
>>>
>>>  I have not yet tried R16B.  I'll be starting that today.  I'm also
>>> trying to improve the test program, since it's taking quite a long time
>>> between jumps for me as well.  I'll let you know as soon as I have a better
>>> one.
>>>
>>>  You have no idea how relieved I am that you are looking into this!
>>>
>>>  Thanks,
>>>  Garret Smith
>>>
>>>
>>>  On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>>>
>>>>  Hi again...
>>>>
>>>> I'm not sure about one thing. What happens to os:timestamp() during
>>>> these jumps? Does it stay on track or does it also jump around?
>>>>
>>>> I've tried to reproduce it with your program, but have not yet
>>>> succeeded. Have you seen this on the R16B release as well?
>>>>
>>>> Is the hardware in any way fancy (like a lot of cores, some new
>>>> processor I don't have or something else?) or is there anything else
>>>> special about the machine? Also the time zone you're running in would be
>>>> interesting, as there is some time zone specific code there...
>>>>
>>>> I would really like to be able to reproduce it so you don't have to do
>>>> all the tests at your site, it might end up being really time consuming for
>>>> you if I make too many mistakes :)
>>>>
>>>> Cheers,
>>>> /Patrik
>>>>
>>>>
>>>>
>>>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>>>
>>>> Hi!
>>>>
>>>> On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>>
>>>>     I have been beating my head against a wall for weeks tracking down
>>>> spooky behaviour[sic] in one of our production systems.  I finally tracked
>>>> it down to "jumps" in the times returned by erlang:now(), causing all
>>>> timers in the system to expire at once.  I have witnessed this bug on
>>>> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both
>>>> on bare metal and VirtualBox VM.
>>>>
>>>>  The time jump is always around 2126000 seconds, or a little over 24
>>>> days.  The now() time does not try to converge with os:timestamp() as the
>>>> documentation suggests, and as I confirmed it does if you just change the
>>>> system clock.
>>>>
>>>>  Another VM running concurrently on the same machine but with little
>>>> load (diagnostic node & production node) did not time jump.
>>>>
>>>>  Higher load seems to make the time jumps happen more often.
>>>>
>>>>  Frequency between time jumps varies between seconds and hours, but
>>>> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>>>>
>>>>  I never see the jump in logfile timestamps that use os:timestamp() for
>>>> tagging log messages.  I had to start tracing a production node before I
>>>> caught the jump.  Here are some lines from a trace, where the timestamp in
>>>> trace_ts is printed using calendar:now_to_local_time() and then in raw
>>>> tuple format:
>>>>
>>>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>
>>>>  then a bit later...
>>>>
>>>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>
>>>>   Gah! That's obviously not supposed to happen...
>>>>
>>>>  I captured many such jumps over the course of a day or so.  Obviously
>>>> from the dates, 2 jumps happened before I started tracing.
>>>>
>>>>  I was able to reproduce the bug, though not as efficiently as my
>>>> production system, with the following sample program:
>>>> https://gist.github.com/garret-smith/5087169
>>>>
>>>> It took over an hour of runtime before the first time jump.  I am
>>>> working on a better way to reproduce it at the moment, but it's hard to
>>>> test the test with a bug so intermittent.
>>>>
>>>>  I am also testing various other VM versions.  My first hope was that
>>>> this was limited to the 64-bit version where we first encountered the
>>>> problem, but a change to the 32-bit version has only made the problem
>>>> happen less often, not eliminated it.
>>>>
>>>>  We never saw this bug with R14B03 which we were running previously to
>>>> R15B01.  However, system load is different so I can't make a direct
>>>> comparison.  I did notice a few significant updates to the Windows time
>>>> related code between R14B03 and R15:
>>>>
>>>>  git log sys_time.c
>>>>
>>>>  commit 46eb4359b05b220861453a869dc734480ec045a6
>>>> Author: Patrik Nyblom <pan@REDACTED>
>>>> Date:   Tue Dec 6 19:07:16 2011 +0100
>>>>
>>>>     Emulate localtime, gmtime and mktime to enable negative time_t
>>>>
>>>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>> Author: Björn-Egil Dahlberg <egil@REDACTED>
>>>> Date:   Fri Dec 2 15:25:06 2011 +0100
>>>>
>>>>     Teach windows sys_localtime_r
>>>>
>>>>
>>>>   Yep, that's me... But even if I gave a totally weird time back from
>>>> those, the erlang:now logic should have stopped this from happening. I'll
>>>> try to reproduce using your example program. If nothing else helps, I'll
>>>> instrument a VM that gives some traces in the time code...
>>>>
>>>>  I am completely stumped.  What can I do next to help track down the
>>>> source of the bug?
>>>>
>>>>   Unfortunately, so am I. Especially weird that it's load related...
>>>> Maybe something is not locked as it should be...
>>>>
>>>>  Thanks,
>>>>  Garret Smith
>>>>
>>>> Thanks for reporting, I'll get back to you!
>>>>
>>>> Cheers,
>>>> /Patrik

From garret.smith@REDACTED  Mon Mar 11 17:51:48 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 09:51:48 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <513E059D.70005@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
 <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID: <CAHmviK-2rESf6Xr=4hf6FwDX5ecaynh13oJxiwZbjQaewHM67Q@mail.gmail.com>

OK, so adding a lock in the Windows-specific implementation of
sys_gethrtime protects the 'wrap' and 'last_tick_count' global variables,
which are not needed on other platforms?
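
One way to picture why those two variables would need a lock is the toy
model below. It is a sketch under assumptions, not the ERTS code: it
assumes a wrap is detected by comparing a freshly sampled 32-bit tick
against last_tick_count, and the class and method names are invented for
the example (Python is used only for brevity).

```python
import threading

WRAP_MS = 1 << 32  # period of a 32-bit millisecond counter

class UnlockedClock:
    """Toy wrap tracker; 'wrap' and 'last_tick_count' mirror the names in the mail."""

    def __init__(self):
        self.wrap = 0
        self.last_tick_count = 0

    def update(self, tick):
        """Fold an already-sampled tick into the shared state (no locking)."""
        if tick < self.last_tick_count:   # looks like the counter wrapped
            self.wrap += WRAP_MS
        self.last_tick_count = tick
        return self.wrap + tick

class LockedClock(UnlockedClock):
    """Same tracker, but sampling and updating happen under one lock."""

    def __init__(self, read_tick):
        super().__init__()
        self._read_tick = read_tick       # hypothetical tick source
        self._lock = threading.Lock()

    def now_ms(self):
        with self._lock:
            return self.update(self._read_tick())

# Unlocked: thread A samples 1000 and is preempted; thread B samples 1001
# and updates first, so A's older sample is mistaken for a wrap.
unlocked = UnlockedClock()
tick_a, tick_b = 1000, 1001
unlocked.update(tick_b)
stale_reading = unlocked.update(tick_a)

# Locked: no other thread can update between the sample and the compare.
ticks = iter([1000, 1001])
locked = LockedClock(lambda: next(ticks))
locked_readings = [locked.now_ms(), locked.now_ms()]

print(stale_reading - tick_a == WRAP_MS, locked_readings)
```

In this model the spurious step is 2**32 ms, which is not the ~2126000
seconds reported in this thread, so it illustrates only the kind of race
a missing lock allows, not the exact arithmetic in sys_gethrtime.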

Thanks for the quick turnaround Patrik!

-Garret


On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom <pan@REDACTED> wrote:

>  Hi again!
>
> I think I've found it. At least I've found one error, hopefully that's the
> one you've also found :)
>
> The sys_gethrtime function has gained new uses in R15 and later, uses
> where it is no longer protected by the erts_timeofday_mtx. So it simply
> needs a lock of its own. This gives a slight performance loss, but that
> could be fixed by using GetTickCount64, at least on Win7 and Win2008.
>
> Can you try a version of beam.smp.dll with a lock and see if the error is
> gone on your machines? If that works, I would also like you to try an
> optimized version, but let's first make sure we have the bug nailed down :)
>
> In my Dropbox, there's a beam.smp.dll. If you replace
> $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start werl,
> the slogan should contain [source-be0da3e]. It is for 64-bit Windows. The
> public Dropbox URL is:
> http://dl.dropbox.com/u/17212223/beam.smp.dll
>
> This should work without any special messages or such, giving a working
> erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
> moving slightly backwards, we have a more complicated bug, but I haven't
> seen any such messages since I added proper locking.
>
> If it's possible for you to test this, I would be immensely grateful!
>
> Cheers,
> /Patrik
>
> On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
>
> Hi Garret!
>
> I've been able to reproduce it on my freshly installed Win2008 machine!
> Great, now I only need to debug it and find the error :)
>
> I'll get back to you as soon as I feel I have a fix - it might take a few
> days, given the relatively long turn around time, but we'll get there!
>
> Thank you for all the help and information!
>
> Cheers,
> /Patrik
>
> On 03/05/2013 09:10 PM, Garret Smith wrote:
>
>  On the same machine with the same steps as previous, I reproduced the
> time jump on R16B.
> This time the jump happened with a <5 sec delta btw now() and
> os:timestamp().
> Still jumping ~2126000 seconds.
>
>  -Garret
>
>
> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED> wrote:
>
>>   The gist https://gist.github.com/garret-smith/5087169 is updated with
>> a slightly better version.  I was able to reproduce the jump in less than
>> an hour.  I also did some more things to perturb the timing code while the
>> test program was running.
>>
>>  Here is the latest info, everything I can think of that may have the
>> slightest effect:
>>   * R15B01 64-bit build
>>   * Pacific time zone (GMT -8)
>>   * Xeon E5405 in an HP DL160
>>   * no arguments to erl.exe
>>   * bursty, high CPU load, >75% memory used by other software
>>   * running Observer on the test VM displaying the "Load Charts" tab
>>   * made some small adjustments (~ 60 seconds) to the system clock while
>> running the tests - now() and os:timestamp() behaved as expected, initially
>> showing a delta and slowly converging
>>   * w32tm /resync to fix the system clock some time after perturbing it
>>
>>  The time jump in now() occurred when now() was ~9 seconds behind
>> os:timestamp() as reported by the new test program.
>>
>>  I'm starting to look at R16B now.
>>
>>  -Garret Smith
>>
>>
>> On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>
>>>   I haven't seen anything unexpected in os:timestamp().  No jumps at
>>> all.
>>>
>>>  CPU is an Intel Xeon X3430.
>>>
>>> I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US East
>>> coast time zone (GMT -5).
>>>
>>>  I have not yet tried R16B.  I'll be starting that today.  I'm also
>>> trying to improve the test program, since it's taking quite a long time
>>> between jumps for me as well.  I'll let you know as soon as I have a better
>>> one.
>>>
>>>  You have no idea how relieved I am that you are looking into this!
>>>
>>>  Thanks,
>>>  Garret Smith
>>>
>>>
>>>  On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>>>
>>>>  Hi again...
>>>>
>>>> I'm not sure about one thing. What happens to os:timestamp() during
>>>> these jumps? Does it stay on track or does it also jump around?
>>>>
>>>> I've tried to reproduce it with your program, but have not yet
>>>> succeeded. Have you seen this on the R16B release as well?
>>>>
>>>> Is the hardware in any way fancy (like a lot of cores, some new
>>>> processor I don't have or something else?) or is there anything else
>>>> special about the machine? Also the time zone you're running in would be
>>>> interesting, as there is some time zone specific code there...
>>>>
>>>> I would really like to be able to reproduce it so you don't have to do
>>>> all the tests at your site, it might end up being really time consuming for
>>>> you if I make too many mistakes :)
>>>>
>>>> Cheers,
>>>> /Patrik
>>>>
>>>>
>>>>
>>>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>>>
>>>> Hi!
>>>>
>>>> On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>>
>>>>     I have been beating my head against a wall for weeks tracking down
>>>> spooky behaviour[sic] in one of our production systems.  I finally tracked
>>>> it down to "jumps" in the times returned by erlang:now(), causing all
>>>> timers in the system to expire at once.  I have witnessed this bug on
>>>> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both
>>>> on bare metal and VirtualBox VM.
>>>>
>>>>  The time jump is always around 2126000 seconds, or a little over 24
>>>> days.  The now() time does not try to converge with os:timestamp() as the
>>>> documentation suggests, and as I confirmed it does if you just change the
>>>> system clock.
>>>>
>>>>  Another VM running concurrently on the same machine but with little
>>>> load (diagnostic node & production node) did not time jump.
>>>>
>>>>  Higher load seems to make the time jumps happen more often.
>>>>
>>>>  Frequency between time jumps varies between seconds and hours, but
>>>> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>>>>
>>>>  I never see the jump in logfile timestamps that use os:timestamp() for
>>>> tagging log messages.  I had to start tracing a production node before I
>>>> caught the jump.  Here are some lines from a trace, where the timestamp in
>>>> trace_ts is printed using calendar:now_to_local_time() and then in raw
>>>> tuple format:
>>>>
>>>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>
>>>>  then a bit later...
>>>>
>>>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>
>>>>   Gah! That's obviously not supposed to happen...
>>>>
>>>>  I captured many such jumps over the course of a day or so.  Obviously
>>>> from the dates, 2 jumps happened before I started tracing.
>>>>
>>>>  I was able to reproduce the bug, though not as efficiently as my
>>>> production system, with the following sample program:
>>>> https://gist.github.com/garret-smith/5087169
>>>>
>>>> It took over an hour of runtime before the first time jump.  I am
>>>> working on a better way to reproduce it at the moment, but it's hard to
>>>> test the test with a bug so intermittent.
>>>>
>>>>  I am also testing various other VM versions.  My first hope was that
>>>> this was limited to the 64-bit version where we first encountered the
>>>> problem, but a change to the 32-bit version has only made the problem
>>>> happen less often, not eliminated it.
>>>>
>>>>  We never saw this bug with R14B03 which we were running previously to
>>>> R15B01.  However, system load is different so I can't make a direct
>>>> comparison.  I did notice a few significant updates to the Windows time
>>>> related code between R14B03 and R15:
>>>>
>>>>  git log sys_time.c
>>>>
>>>>  commit 46eb4359b05b220861453a869dc734480ec045a6
>>>> Author: Patrik Nyblom <pan@REDACTED>
>>>> Date:   Tue Dec 6 19:07:16 2011 +0100
>>>>
>>>>     Emulate localtime, gmtime and mktime to enable negative time_t
>>>>
>>>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>> Author: Björn-Egil Dahlberg <egil@REDACTED>
>>>> Date:   Fri Dec 2 15:25:06 2011 +0100
>>>>
>>>>     Teach windows sys_localtime_r
>>>>
>>>>
>>>>   Yep, that's me... But even if I gave a totally weird time back from
>>>> those, the erlang:now logic should have stopped this from happening. I'll
>>>> try to reproduce using your example program. If nothing else helps, I'll
>>>> instrument a VM that gives some traces in the time code...
>>>>
>>>>  I am completely stumped.  What can I do next to help track down the
>>>> source of the bug?
>>>>
>>>>   Unfortunately, so am I. Especially weird that it's load related...
>>>> Maybe something is not locked as it should be...
>>>>
>>>>  Thanks,
>>>>  Garret Smith
>>>>
>>>> Thanks for reporting, I'll get back to you!
>>>>
>>>> Cheers,
>>>> /Patrik
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> erlang-bugs mailing list
>>>> erlang-bugs@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>>
>>>>
>>>
>>
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>

From garret.smith@REDACTED  Tue Mar 12 00:48:28 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 16:48:28 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK8+kjWz8b2ju_WvrZgNuWwbh-Jg9U0nBFSzfQL3d_H8eA@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
 <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
 <CAHmviK8+kjWz8b2ju_WvrZgNuWwbh-Jg9U0nBFSzfQL3d_H8eA@mail.gmail.com>
Message-ID: <CAHmviK-V1WYZhDuFQ4yLR80oVPwtEb6Zd5fTSua5a2b1QRv-cg@mail.gmail.com>

Been running the test program all day in the same scenario as before.  No
time jumps!  Looking good...
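
For anyone wanting to check their own traces for the same symptom, here is a
minimal sketch of the arithmetic, not the code from the gist. It converts
{MegaSecs, Secs, MicroSecs} tuples, as returned by now(), to microseconds and
reports consecutive samples that are further apart than a threshold. The
helper names now_to_micros and find_jumps and the 60 second threshold are
assumptions made for illustration; the sample tuples are copied from the
trace lines quoted further down in this thread.

```python
def now_to_micros(ts):
    """Convert a {MegaSecs, Secs, MicroSecs} tuple to integer microseconds."""
    mega, secs, micro = ts
    return (mega * 1_000_000 + secs) * 1_000_000 + micro


def find_jumps(samples, threshold_s=60):
    """Return (before, after, delta_seconds) for each pair of consecutive
    samples that lie more than threshold_s seconds apart."""
    jumps = []
    for before, after in zip(samples, samples[1:]):
        delta_s = (now_to_micros(after) - now_to_micros(before)) / 1_000_000
        if delta_s > threshold_s:
            jumps.append((before, after, delta_s))
    return jumps


# Four consecutive trace timestamps from the original report.
trace = [
    (1366, 173601, 993399),
    (1366, 173601, 993400),
    (1368, 299621, 986961),
    (1368, 299621, 986962),
]

for before, after, delta_s in find_jumps(trace):
    print(before, "->", after, f"{delta_s:.0f} s, {delta_s / 86400:.1f} days")
```

Integer microseconds are used for the subtraction so that no precision is
lost before the final division.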


On Mon, Mar 11, 2013 at 9:34 AM, Garret Smith <garret.smith@REDACTED> wrote:

> Patrik,
>
> Our production systems are on R15B1/2, so I won't be able to verify
> against that, but I'll let you know what I see running my test program
> against R16B.
>
> Will you be able to generate a patched R15x version?  If not, I'll try to
> set up a build system and apply the patch locally.
>
> -Garret
>
>
> On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom <pan@REDACTED> wrote:
>
>>  Hi again!
>>
>> I think I've found it. At least I've found one error, hopefully that's
>> the one you've also found :)
>>
>> The sys_gethrtime function has got new uses in R15 and on, uses where it
>> is no longer protected by the erts_timeofday_mtx. So - it simply needs a
>> lock of its own. This gives a slight performance loss, but that could be
>> fixed by using GetTickCount64 on win7 and win2008 at least.
>>
>> Can you try a version of beam.smp.dll with a lock and see if the error is
>> gone on your machines? If that works, I would also like you to try an
>> optimized version, but let's first make sure we have the bug nailed down :)
>>
>> In my dropbox, there's a beam.smp.dll. If you replace
>> $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start werl,
>> the slogan should contain [source-be0da3e]. It is for 64bit windows. The
>> public dropbox URL is:
>> http://dl.dropbox.com/u/17212223/beam.smp.dll
>>
>> This should work without any special messages or such, giving a working
>> erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
>> moving slightly backwards, we have a more complicated bug, but I haven't
>> seen any such messages since I added proper locking.
>>
>> If it's possible for you to test this, I would be immensely grateful!
>>
>> Cheers,
>> /Patrik
>>
>> On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
>>
>> Hi Garret!
>>
>> I've been able to reproduce it on my freshly installed Win2008 machine!
>> Great, now I only need to debug it and find the error :)
>>
>> I'll get back to you as soon as I feel I have a fix - it might take a few
>> days, given the relatively long turn around time, but we'll get there!
>>
>> Thank you for all the help and information!
>>
>> Cheers,
>> /Patrik
>>
>> On 03/05/2013 09:10 PM, Garret Smith wrote:
>>
>>  On the same machine with the same steps as previous, I reproduced the
>> time jump on R16B.
>> This time the jump happened with a <5 sec delta between now() and
>> os:timestamp().
>> Still jumping ~2126000 seconds.
>>
>>  -Garret
>>
>>
>> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>
>>>   The gist https://gist.github.com/garret-smith/5087169 is updated with
>>> a slightly better version.  I was able to reproduce the jump in less than
>>> an hour.  I also did some more things to perturb the timing code while the
>>> test program was running.
>>>
>>>  Here is the latest info, everything I can think of that may have the
>>> slightest effect:
>>>   * R15B01 64-bit build
>>>   * Pacific time zone (GMT -8)
>>>   * Xeon E5405 in an HP DL160
>>>   * no arguments to erl.exe
>>>   * bursty, high CPU load, >75% memory used by other software
>>>   * running Observer on the test VM displaying the "Load Charts" tab
>>>   * made some small adjustments (~ 60 seconds) to the system clock
>>> while running the tests - now() and os:timestamp() behaved as expected,
>>> initially showing a delta and slowly converging
>>>   * w32tm /resync to fix the system clock some time after perturbing it
>>>
>>>  The time jump in now() occurred when now() was ~9 seconds behind
>>> os:timestamp() as reported by the new test program.
>>>
>>>  I'm starting to look at R16B now.
>>>
>>>  -Garret Smith
>>>
>>>
>>> On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>>
>>>>   I haven't seen anything unexpected in os:timestamp().  No jumps at
>>>> all.
>>>>
>>>>  CPU is an Intel Xeon X3430.
>>>>
>>>> I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US
>>>> East coast time zone (GMT -5).
>>>>
>>>>  I have not yet tried R16B.  I'll be starting that today.  I'm also
>>>> trying to improve the test program, since it's taking quite a long time
>>>> between jumps for me as well.  I'll let you know as soon as I have a better
>>>> one.
>>>>
>>>>  You have no idea how relieved I am that you are looking into this!
>>>>
>>>>  Thanks,
>>>>  Garret Smith
>>>>
>>>>
>>>>  On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>>>>
>>>>>  Hi again...
>>>>>
>>>>> I'm not sure about one thing. What happens to os:timestamp() during
>>>>> these jumps? Does it stay on track or does it also jump around?
>>>>>
>>>>> I've tried to reproduce it with your program, but have not yet
>>>>> succeeded. Have you seen this on the R16B release as well?
>>>>>
>>>>> Is the hardware in any way fancy (like a lot of cores, some new
>>>>> processor I don't have or something else?) or is there anything else
>>>>> special about the machine? Also the time zone you're running in would be
>>>>> interesting, as there is some time zone specific code there...
>>>>>
>>>>> I would really like to be able to reproduce it so you don't have to do
>>>>> all the tests at your site, it might end up being really time consuming for
>>>>> you if I make too many mistakes :)
>>>>>
>>>>> Cheers,
>>>>> /Patrik
>>>>>
>>>>>
>>>>>
>>>>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>>>
>>>>>     I have been beating my head against a wall for weeks tracking
>>>>> down spooky behaviour[sic] in one of our production systems.  I finally
>>>>> tracked it down to "jumps" in the times returned by erlang:now(), causing
>>>>> all timers in the system to expire at once.  I have witnessed this bug on
>>>>> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both
>>>>> on bare metal and VirtualBox VM.
>>>>>
>>>>>  The time jump is always around 2126000 seconds, or a little over 24
>>>>> days.  The now() time does not try to converge with os:timestamp() as the
>>>>> documentation suggests, and as I confirmed it does if you just change the
>>>>> system clock.
>>>>>
>>>>>  Another VM running concurrently on the same machine but with little
>>>>> load (diagnostic node & production node) did not time jump.
>>>>>
>>>>>  Higher load seems to make the time jumps happen more often.
>>>>>
>>>>>  Frequency between time jumps varies between seconds and hours, but
>>>>> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>>>>>
>>>>>  I never see the jump in logfile timestamps that use os:timestamp()
>>>>> for tagging log messages.  I had to start tracing a production node before
>>>>> I caught the jump.  Here are some lines from a trace, where the timestamp
>>>>> in trace_ts is printed using calendar:now_to_local_time() and then in raw
>>>>> tuple format:
>>>>>
>>>>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>>
>>>>>  then a bit later...
>>>>>
>>>>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>>
>>>>>   Gah! That's obviously not supposed to happen...
>>>>>
>>>>>  I captured many such jumps over the course of a day or so.
>>>>> Obviously from the dates, 2 jumps happened before I started tracing.
>>>>>
>>>>>  I was able to reproduce the bug, though not as efficiently as my
>>>>> production system, with the following sample program:
>>>>> https://gist.github.com/garret-smith/5087169
>>>>>
>>>>> It took over an hour of runtime before the first time jump.  I am
>>>>> working on a better way to reproduce it at the moment, but it's hard to
>>>>> test the test with a bug so intermittent.
>>>>>
>>>>>  I am also testing various other VM versions.  My first hope was that
>>>>> this was limited to the 64-bit version where we first encountered the
>>>>> problem, but a change to the 32-bit version has only made the problem
>>>>> happen less often, not eliminated it.
>>>>>
>>>>>  We never saw this bug with R14B03 which we were running previously
>>>>> to R15B01.  However, system load is different so I can't make a direct
>>>>> comparison.  I did notice a few significant updates to the Windows time
>>>>> related code between R14B03 and R15:
>>>>>
>>>>>  git log sys_time.c
>>>>>
>>>>>  commit 46eb4359b05b220861453a869dc734480ec045a6
>>>>> Author: Patrik Nyblom <pan@REDACTED>
>>>>> Date:   Tue Dec 6 19:07:16 2011 +0100
>>>>>
>>>>>     Emulate localtime, gmtime and mktime to enable negative time_t
>>>>>
>>>>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>>> Author: Björn-Egil Dahlberg <egil@REDACTED>
>>>>> Date:   Fri Dec 2 15:25:06 2011 +0100
>>>>>
>>>>>     Teach windows sys_localtime_r
>>>>>
>>>>>
>>>>>   Yep, that's me... But even if I gave a totally weird time back from
>>>>> those, the erlang:now logic should have stopped this from happening. I'll
>>>>> try to reproduce using your example program. If nothing else helps, I'll
>>>>> instrument a VM that gives some traces in the time code...
>>>>>
>>>>>  I am completely stumped.  What can I do next to help track down the
>>>>> source of the bug?
>>>>>
>>>>>   Unfortunately, so am I. Especially weird that it's load related...
>>>>> Maybe something is not locked as it should be...
>>>>>
>>>>>  Thanks,
>>>>>  Garret Smith
>>>>>
>>>>> Thanks for reporting, I'll get back to you!
>>>>>
>>>>> Cheers,
>>>>> /Patrik
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> erlang-bugs mailing list
>>>>> erlang-bugs@REDACTED
>>>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>
>>
>

From pan@REDACTED  Tue Mar 12 10:57:11 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 12 Mar 2013 10:57:11 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK-V1WYZhDuFQ4yLR80oVPwtEb6Zd5fTSua5a2b1QRv-cg@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
 <CAHmviK9xHH9b1aps6fHmCntHPkT2a+TWitzj7ftFEOsgZWuQgw@mail.gmail.com>
 <CAHmviK8++KrWacA8Kk1Vx4c8w6g4B61+91NQ+5sQwVx0E9YA-Q@mail.gmail.com>
 <CAHmviK9G9kMfo_DAQrH+uXt+NouqgmXBLbtVF7+UQ2WP4fNBOA@mail.gmail.com>
 <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
 <CAHmviK8+kjWz8b2ju_WvrZgNuWwbh-Jg9U0nBFSzfQL3d_H8eA@mail.gmail.com>
 <CAHmviK-V1WYZhDuFQ4yLR80oVPwtEb6Zd5fTSua5a2b1QRv-cg@mail.gmail.com>
Message-ID: <513EFBF7.7000406@erlang.org>

Hi!

Good! Thanks!

I can build a patched R15 beam.dll for you. R15B03 is easiest, but I can
do a patched beam for R15B02 if that's really needed. In the end I'll
probably build some kind of R15B03-2 and an R16B00-1 or something, so
whoever wants the patch can get binaries. However, I would like to have
something tested in your real system, if that's OK with you. So - which
version is best to patch for? R15B02?

/Patrik
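
To illustrate why a timer function can need a lock of its own (the fix
described in the quoted message below), here is a minimal Python sketch. It is
not the sys_gethrtime code, which this thread does not show. It assumes a
timer that extends a wrapping millisecond counter by remembering the last
reading; the 31-bit width, the class name ExtendedTickClock and the
read_raw_ticks callback are all hypothetical. Under that assumption, the
read, compare and update steps must be atomic: otherwise a thread holding an
older reading can store it after a newer one, and the next caller sees a
wrap that never occurred.

```python
import threading

TICK_BITS = 31              # assumption: counter treated as wrapping at 2**31 ms
WRAP_SPAN = 1 << TICK_BITS  # 2147483648 ms, roughly 24.9 days


class ExtendedTickClock:
    """Extend a wrapping millisecond counter into an ever-growing one."""

    def __init__(self, read_raw_ticks):
        self._read_raw_ticks = read_raw_ticks  # hypothetical raw tick source
        self._last = 0   # last raw reading seen
        self._wrap = 0   # accumulated wrap offset in ms
        self._lock = threading.Lock()

    def now_ms(self):
        # The whole read-compare-update sequence runs under one lock so that
        # no other thread can interleave between the read and the update.
        with self._lock:
            ticks = self._read_raw_ticks() % WRAP_SPAN
            if ticks < self._last:
                # The raw counter went backwards: treat it as one wrap.
                self._wrap += WRAP_SPAN
            self._last = ticks
            return self._wrap + ticks


# Usage with a scripted tick source: the drop from 20 to 5 is one real wrap.
readings = iter([10, 20, 5, 7])
clock = ExtendedTickClock(lambda: next(readings))
print([clock.now_ms() for _ in range(4)])
```

With these assumed numbers a single spurious wrap would add 2**31 ms, which
is in the same range as the reported jump of a little over 24 days, but the
thread itself does not confirm that this is the mechanism.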

On 03/12/2013 12:48 AM, Garret Smith wrote:
> Been running the test program all day in the same scenario as before.  
> No time jumps!  Looking good...
>
>
> On Mon, Mar 11, 2013 at 9:34 AM, Garret Smith <garret.smith@REDACTED> wrote:
>
>     Patrik,
>
>     Our production systems are on R15B1/2, so I won't be able to
>     verify against that, but I'll let you know what I see running my
>     test program against R16B.
>
>     Will you be able to generate a patched R15x version?  If not, I'll
>     try to set up a build system and apply the patch locally.
>
>     -Garret
>
>
>     On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom <pan@REDACTED> wrote:
>
>         Hi again!
>
>         I think I've found it. At least I've found one error,
>         hopefully that's the one you've also found :)
>
>         The sys_gethrtime function has got new uses in R15 and on,
>         uses where it is no longer protected by the
>         erts_timeofday_mtx. So - it simply needs a lock of its own.
>         This gives a slight performance loss, but that could be fixed
>         by using GetTickCount64 on win7 and win2008 at least.
>
>         Can you try a version of beam.smp.dll with a lock and see if
>         the error is gone on your machines? If that works, I would
>         also like you to try an optimized version, but let's first
>         make sure we have the bug nailed down :)
>
>         In my dropbox, there's a beam.smp.dll. If you replace
>         $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then
>         start werl, the slogan should contain [source-be0da3e]. It is
>         for 64bit windows. The public dropbox URL is:
>         http://dl.dropbox.com/u/17212223/beam.smp.dll
>
>         This should work without any special messages or such, giving
>         a working erlang:now/0. If it starts sending strange ERROR
>         REPORTs about ticks moving slightly backwards, we have a more
>         complicated bug, but I haven't seen any such messages since I
>         added proper locking.
>
>         If it's possible for you to test this, I would be immensely
>         grateful!
>
>         Cheers,
>         /Patrik
>
>         On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
>>         Hi Garret!
>>
>>         I've been able to reproduce it on my freshly installed
>>         Win2008 machine! Great, now I only need to debug it and find
>>         the error :)
>>
>>         I'll get back to you as soon as I feel I have a fix - it
>>         might take a few days, given the relatively long turn around
>>         time, but we'll get there!
>>
>>         Thank you for all the help and information!
>>
>>         Cheers,
>>         /Patrik
>>
>>         On 03/05/2013 09:10 PM, Garret Smith wrote:
>>>         On the same machine with the same steps as previous, I
>>>         reproduced the time jump on R16B.
>>>         This time the jump happened with a <5 sec delta between now()
>>>         and os:timestamp().
>>>         Still jumping ~2126000 seconds.
>>>
>>>         -Garret
>>>
>>>
>>>         On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>>
>>>             The gist https://gist.github.com/garret-smith/5087169 is
>>>             updated with a slightly better version.  I was able to
>>>             reproduce the jump in less than an hour.  I also did
>>>             some more things to perturb the timing code while the
>>>             test program was running.
>>>
>>>             Here is the latest info, everything I can think of that
>>>             may have the slightest effect:
>>>              * R15B01 64-bit build
>>>              * Pacific time zone (GMT -8)
>>>              * Xeon E5405 in an HP DL160
>>>              * no arguments to erl.exe
>>>              * bursty, high CPU load, >75% memory used by other software
>>>              * running Observer on the test VM displaying the "Load
>>>             Charts" tab
>>>              * made some small adjustments (~ 60 seconds) to the
>>>             system clock while running the tests - now() and
>>>             os:timestamp() behaved as expected, initially showing a
>>>             delta and slowly converging
>>>              * w32tm /resync to fix the system clock some time after
>>>             perturbing it
>>>
>>>             The time jump in now() occurred when now() was ~9
>>>             seconds behind os:timestamp() as reported by the new
>>>             test program.
>>>
>>>             I'm starting to look at R16B now.
>>>
>>>             -Garret Smith
>>>
>>>
>>>             On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith <garret.smith@REDACTED> wrote:
>>>
>>>                 I haven't seen anything unexpected in
>>>                 os:timestamp(). No jumps at all.
>>>
>>>                 CPU is an Intel Xeon X3430.
>>>
>>>                 I have reproduced it in the LosAngeles/Pacific Time
>>>                 (GMT -8) and US East coast time zone (GMT -5).
>>>
>>>                 I have not yet tried R16B.  I'll be starting that
>>>                 today.  I'm also trying to improve the test program,
>>>                 since it's taking quite a long time between jumps
>>>                 for me as well.  I'll let you know as soon as I have
>>>                 a better one.
>>>
>>>                 You have no idea how relieved I am that you are
>>>                 looking into this!
>>>
>>>                 Thanks,
>>>                 Garret Smith
>>>
>>>
>>>                 On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom <pan@REDACTED> wrote:
>>>
>>>                     Hi again...
>>>
>>>                     I'm not sure about one thing. What happens to
>>>                     os:timestamp() during these jumps? Does it stay
>>>                     on track or does it also jump around?
>>>
>>>                     I've tried to reproduce it with your program,
>>>                     but have not yet succeeded. Have you seen this on
>>>                     the R16B release as well?
>>>
>>>                     Is the hardware in any way fancy (like a lot of
>>>                     cores, some new processor I don't have or
>>>                     something else?) or is there anything else
>>>                     special about the machine? Also the time zone
>>>                     you're running in would be interesting, as there
>>>                     is some time zone specific code there...
>>>
>>>                     I would really like to be able to reproduce it
>>>                     so you don't have to do all the tests at your
>>>                     site, it might end up being really time
>>>                     consuming for you if I make too many mistakes :)
>>>
>>>                     Cheers,
>>>                     /Patrik
>>>
>>>
>>>
>>>                     On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>>>                     Hi!
>>>>
>>>>                     On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>>>                     I have been beating my head against a wall for
>>>>>                     weeks tracking down spooky behaviour[sic] in
>>>>>                     one of our production systems.  I finally
>>>>>                     tracked it down to "jumps" in the times
>>>>>                     returned by erlang:now(), causing all timers
>>>>>                     in the system to expire at once.  I have
>>>>>                     witnessed this bug on R15B01, both 64 and
>>>>>                     32-bit versions running on Windows Server 2008
>>>>>                     R2, both on bare metal and VirtualBox VM.
>>>>>
>>>>>                     The time jump is always around 2126000
>>>>>                     seconds, or a little over 24 days.  The now()
>>>>>                     time does not try to converge with
>>>>>                     os:timestamp() as the documentation suggests,
>>>>>                     and as I confirmed it does if you just change
>>>>>                     the system clock.
>>>>>
>>>>>                     Another VM running concurrently on the same
>>>>>                     machine but with little load (diagnostic node
>>>>>                     & production node) did not time jump.
>>>>>
>>>>>                     Higher load seems to make the time jumps
>>>>>                     happen more often.
>>>>>
>>>>>                     Frequency between time jumps varies between
>>>>>                     seconds and hours, but when a jump occurs, it
>>>>>                     is always 2126000 + (9 to 26) seconds.
>>>>>
>>>>>                     I never see the jump in logfile timestamps
>>>>>                     that use os:timestamp() for tagging log
>>>>>                     messages. I had to start tracing a production
>>>>>                     node before I caught the jump.  Here are some
>>>>>                     lines from a trace, where the timestamp in
>>>>>                     trace_ts is printed using
>>>>>                     calendar:now_to_local_time() and then in raw
>>>>>                     tuple format:
>>>>>
>>>>>                     2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>>>                     2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>>>                     2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>>>                     2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>>
>>>>>                     then a bit later...
>>>>>
>>>>>                     2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>>>                     2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>>>                     2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>>>                     2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>>
>>>>                     Gah! That's obviously not supposed to happen...
>>>>>                     I captured many such jumps over the course of
>>>>>                     a day or so. Obviously from the dates, 2 jumps
>>>>>                     happened before I started tracing.
>>>>>
>>>>>                     I was able to reproduce the bug, though not as
>>>>>                     efficiently as my production system, with the
>>>>>                     following sample program:
>>>>>                     https://gist.github.com/garret-smith/5087169
>>>>>
>>>>>                     It took over an hour of runtime before the
>>>>>                     first time jump.  I am working on a better way
>>>>>                     to reproduce it at the moment, but it's hard
>>>>>                     to test the test with a bug so intermittent.
>>>>>
>>>>>                     I am also testing various other VM versions.
>>>>>                     My first hope was that this was limited to the
>>>>>                     64-bit version where we first encountered the
>>>>>                     problem, but a change to the 32-bit version
>>>>>                     has only made the problem happen less often,
>>>>>                     not eliminated it.
>>>>>
>>>>>                     We never saw this bug with R14B03 which we
>>>>>                     were running previously to R15B01. However,
>>>>>                     system load is different so I can't make a
>>>>>                     direct comparison.  I did notice a few
>>>>>                     significant updates to the Windows time
>>>>>                     related code between R14B03 and R15:
>>>>>
>>>>>                     git log sys_time.c
>>>>>
>>>>>                     commit 46eb4359b05b220861453a869dc734480ec045a6
>>>>>                     Author: Patrik Nyblom <pan@REDACTED>
>>>>>                     Date:   Tue Dec 6 19:07:16 2011 +0100
>>>>>
>>>>>                         Emulate localtime, gmtime and mktime to
>>>>>                     enable negative time_t
>>>>>
>>>>>                     commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>>>                     Author: Björn-Egil Dahlberg <egil@REDACTED>
>>>>>                     Date:   Fri Dec 2 15:25:06 2011 +0100
>>>>>
>>>>>                         Teach windows sys_localtime_r
>>>>>
>>>>>
>>>>                     Yep, that's me... But even if I gave a totally
>>>>                     weird time back from those, the erlang:now
>>>>                     logic should have stopped this from happening.
>>>>                     I'll try to reproduce using your example
>>>>                     program. If nothing else helps, I'll instrument
>>>>                     a VM that gives some traces in the time code...
>>>>>                     I am completely stumped.  What can I do next
>>>>>                     to help track down the source of the bug?
>>>>>
>>>>                     Unfortunately, so am I. Especially weird that
>>>>                     it's load related... Maybe something is not
>>>>                     locked as it should be...
>>>>>                     Thanks,
>>>>>                     Garret Smith
>>>>                     Thanks for reporting, I'll get back to you!
>>>>
>>>>                     Cheers,
>>>>                     /Patrik
>>>>>
>>>>>
>>>>>                     _______________________________________________
>>>>>                     erlang-bugs mailing list
>>>>>                     erlang-bugs@REDACTED
>>>>>                     http://erlang.org/mailman/listinfo/erlang-bugs


From vances@REDACTED  Tue Mar 12 14:09:18 2013
From: vances@REDACTED (Vance Shipley)
Date: Tue, 12 Mar 2013 18:39:18 +0530
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
Message-ID: <CAMoa0NL4PnNXJRe+1D0hnigeBqbtQrHt8hYiF-5kC9sO1smaPw@mail.gmail.com>

C
On Mar 5, 2013 6:56 AM, "Garret Smith" <garret.smith@REDACTED> wrote:

> I have been beating my head against a wall for weeks tracking down spooky
> behaviour[sic] in one of our production systems.  I finally tracked it down
> to "jumps" in the times returned by erlang:now(), causing all timers in the
> system to expire at once.  I have witnessed this bug on R15B01, both 64 and
> 32-bit versions running on Windows Server 2008 R2, both on bare metal and
> VirtualBox VM.
>
> The time jump is always around 2126000 seconds, or a little over 24 days.
> The now() time does not try to converge with os:timestamp() as the
> documentation suggests, and as I confirmed it does if you just change the
> system clock.
>
> Another VM running concurrently on the same machine but with little load
> (diagnostic node & production node) did not time jump.
>
> Higher load seems to make the time jumps happen more often.
>
> Frequency between time jumps varies between seconds and hours, but when a
> jump occurs, it is always 2126000 + (9 to 26) seconds.
>
> I never see the jump in logfile timestamps that use os:timestamp() for
> tagging log messages.  I had to start tracing a production node before I
> caught the jump.  Here are some lines from a trace, where the timestamp in
> trace_ts is printed using calendar:now_to_local_time() and then in raw
> tuple format:
>
> 2013-4-16 21:40:1.993399|{1366,173601,993399}
> 2013-4-16 21:40:1.993400|{1366,173601,993400}
> 2013-5-11 12:13:41.986961|{1368,299621,986961}
> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>
> then a bit later...
>
> 2013-5-11 12:36:19.955129|{1368,300979,955129}
> 2013-5-11 12:36:19.955130|{1368,300979,955130}
> 2013-6-5 3:9:49.538830|{1370,426989,538830}
> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>
> I captured many such jumps over the course of a day or so.  Obviously from
> the dates, 2 jumps happened before I started tracing.
>
> I was able to reproduce the bug, though not as efficiently as my
> production system, with the following sample program:
> https://gist.github.com/garret-smith/5087169
>
> It took over an hour of runtime before the first time jump.  I am working
> on a better way to reproduce it at the moment, but it's hard to test the
> test with a bug so intermittent.
>
> I am also testing various other VM versions.  My first hope was that this
> was limited to the 64-bit version where we first encountered the problem,
> but a change to the 32-bit version has only made the problem happen less
> often, not eliminated it.
>
> We never saw this bug with R14B03 which we were running previously to
> R15B01.  However, system load is different so I can't make a direct
> comparison.  I did notice a few significant updates to the Windows time
> related code between R14B03 and R15:
>
> git log sys_time.c
>
> commit 46eb4359b05b220861453a869dc734480ec045a6
> Author: Patrik Nyblom <pan@REDACTED>
> Date:   Tue Dec 6 19:07:16 2011 +0100
>
>     Emulate localtime, gmtime and mktime to enable negative time_t
>
> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> Author: Björn-Egil Dahlberg <egil@REDACTED>
> Date:   Fri Dec 2 15:25:06 2011 +0100
>
>     Teach windows sys_localtime_r
>
>
> I am completely stumped.  What can I do next to help track down the source
> of the bug?
>
> Thanks,
> Garret Smith
>
>
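
As a sanity check on the quoted trace, the size of each jump can be
computed from the raw tuples with timer:now_diff/2 from stdlib. This is
only a sketch: the module and function names are made up for
illustration, and the tuples are the ones on either side of the two jumps
in the trace above.

```erlang
-module(now_jump).
-export([jump_seconds/2, demo/0]).

%% Whole seconds between two erlang:now()-style tuples.
%% timer:now_diff/2 returns After - Before in microseconds.
jump_seconds(Before, After) ->
    timer:now_diff(After, Before) div 1000000.

%% The two jumps visible in the trace. Both come out at about
%% 2126000 seconds, which is roughly 24.6 days.
demo() ->
    [jump_seconds({1366,173601,993400}, {1368,299621,986961}),
     jump_seconds({1368,300979,955130}, {1370,426989,538830})].
```

now_jump:demo() returns [2126019,2126009], which agrees with the reported
"2126000 + (9 to 26) seconds".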

From pan@REDACTED  Tue Mar 12 14:38:28 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 12 Mar 2013 14:38:28 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <CAMoa0NL4PnNXJRe+1D0hnigeBqbtQrHt8hYiF-5kC9sO1smaPw@mail.gmail.com>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <CAMoa0NL4PnNXJRe+1D0hnigeBqbtQrHt8hYiF-5kC9sO1smaPw@mail.gmail.com>
Message-ID: <513F2FD4.8010108@erlang.org>

Hi!

There's a patched version of the R15B02 dll in my public dropbox, under 
the name r15.beam.smp.dll:

http://dl.dropbox.com/u/17212223/r15.beam.smp.dll

If you replace the R15 beam.smp.dll with this one, the werl slogan 
should contain the version erts-5.9.2.0.1. If you could try that on the 
real app, I would be immensely grateful!

Cheers,
/Patrik


From garret.smith@REDACTED  Tue Mar 12 16:36:37 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 12 Mar 2013 08:36:37 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the
	future
In-Reply-To: <513F2FD4.8010108@erlang.org>
References: <CAHmviK9hbY9peOE25=SDEpN1=G==GUPzAmmYq0YX-c5yL4KhTQ@mail.gmail.com>
 <CAMoa0NL4PnNXJRe+1D0hnigeBqbtQrHt8hYiF-5kC9sO1smaPw@mail.gmail.com>
 <513F2FD4.8010108@erlang.org>
Message-ID: <CAHmviK-0WhBJK4Of0EXfWxiPLP2Yt3dtBbQQr4PCwQpuQJ==dQ@mail.gmail.com>

On Mar 12, 2013 6:38 AM, "Patrik Nyblom" <pan@REDACTED> wrote:
>
> Hi!
>
> There's a patched version of the R15B02 dll in my public dropbox, under
> the name r15.beam.smp.dll:
>
> http://dl.dropbox.com/u/17212223/r15.beam.smp.dll

R15B02 will work.  I'll get started, but it will take a couple of days to
get everything built and deployed and to watch for time jumps.

Thank you for the binary!

>
> If you replace the R15 beam.smp.dll with this one, the werl slogan should
> contain the version erts-5.9.2.0.1. If you could try that on the real app,
> I would be immensely grateful!
>
> Cheers,
> /Patrik
>

From mamuelle@REDACTED  Tue Mar 12 16:39:48 2013
From: mamuelle@REDACTED (Magnus Müller)
Date: Tue, 12 Mar 2013 16:39:48 +0100
Subject: [erlang-bugs] R16B takes long to compile a simple module
Message-ID: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de>

The following small module takes ~10s to compile with R16B (erl +V
below). The code is distilled from indent/erlang_indent.erl in vimerl
[1]. Diagnostics are fast (a small syntax error somewhere crashes the
compilation immediately). The same module compiles quickly (<1s) with R15B.

-------------------------------------------------
-module(erlang_indent).

-export([p/2]).

-define(IS(T, C), (element(1, T) == C)).

-record(state, {stack = []}).

p(T1, #state{stack = [T2 | _]}) when ?IS(T2, a), ?IS(T1, b), ?IS(T1, c) ->
    ok;
p(T, _) when ?IS(T, a1); ?IS(T, b1); ?IS(T, c1) -> ok;
p(T, _) when ?IS(T, a2); ?IS(T, b2); ?IS(T, c2) -> ok;
p(T, _) when ?IS(T, a) -> ok;
p(T, _) when ?IS(T, a), (?IS(T, b) and ?IS(T, c)) -> ok;
p(_, T) when ?IS(T, a) -> ok;
p(_, T) when ?IS(T, b) -> ok;
p(T, _) when ?IS(T, a) -> ok.
-------------------------------------------------

$ erl +V
Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.10.1


The original file[2] in vimerl takes even longer to compile. Note that
[2] is actually an escript, but the slow compilation persists when
it is converted to a module.


[1] https://github.com/jimenezrick/vimerl
[2] https://raw.github.com/jimenezrick/vimerl/master/indent/erlang_indent.erl
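
To put a number on the slowdown described above, the compile time can be
measured from the shell with timer:tc/3 around compile:file/2. This is a
minimal sketch; the module name compile_timing and the function
time_compile/1 are hypothetical helpers, not part of OTP.

```erlang
-module(compile_timing).
-export([time_compile/1]).

%% Compile File in memory (no .beam is written) and return
%% {Seconds, Result}, where Result is whatever compile:file/2 returned.
%% timer:tc/3 reports the elapsed time in microseconds.
time_compile(File) ->
    {Micros, Result} = timer:tc(compile, file, [File, [binary, return]]),
    {Micros / 1000000, Result}.
```

Calling compile_timing:time_compile("erlang_indent.erl") under R15B and
under R16B would make the two figures directly comparable.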


From kostis@REDACTED  Tue Mar 12 20:07:46 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Tue, 12 Mar 2013 20:07:46 +0100
Subject: [erlang-bugs] Native compilation hangs with
	rm-reverse-eta-conversion
In-Reply-To: <EA8474BB-3593-4098-9AAC-750632E8FE40@gmail.com>
References: <CAKJNF0c56fbuEnDEUcW1KnSSE514-nS8gncOJ_exFf2h=DuR+w@mail.gmail.com>
 <CAKJNF0dZX1rveZnnXiCW+dktvYUW7tzvMmSCODyvRm2MDDf+4g@mail.gmail.com>
 <50FBAB1E.2070703@cs.ntua.gr>
 <A0DE1949-B2F2-43E9-95A4-85A35B2B0CBD@gmail.com>
 <EA8474BB-3593-4098-9AAC-750632E8FE40@gmail.com>
Message-ID: <513F7D02.9040907@cs.ntua.gr>

On 01/23/2013 12:49 PM, Anthony Ramine wrote:
> Hi,
>
> The bytecode invariant that I broke is the fact that a function cannot be used as
> a closure and as a normal function both at the same time, thus the eta-abstraction
> is needed by HiPE.
>
> Fredrik, for the time being you should probably revert rm-reverse-eta-conversion
> because I don't think I'll be able to make HiPE work with the eta-abstraction in
> that much time.
>
> Kostis, could you give me directions on how to make HiPE not need the intermediate
> closures when doing fun Name/Arity?

Thanks to Anthony repeatedly prompting me to look into this and sending 
me a minimal example to test and to Bjorn Gustavsson for checking the 
code of hipe_icode_coordinator, today I adapted the assumptions of the 
native code compiler and simplified the code that computes escaping 
functions. The following hipe patch should be included in OTP:


   git fetch git://github.com/kostis/otp.git hipe-cleanup-escaping


After its inclusion, Anthony's patch that removes the automatic 
eta-abstraction for function references from the BEAM compiler can 
probably be included without any problems.

Kostis


From fredrik@REDACTED  Wed Mar 13 10:15:06 2013
From: fredrik@REDACTED (Fredrik)
Date: Wed, 13 Mar 2013 10:15:06 +0100
Subject: [erlang-bugs] [erlang-patches] Native compilation hangs with
 rm-reverse-eta-conversion
In-Reply-To: <513F7D02.9040907@cs.ntua.gr>
References: <CAKJNF0c56fbuEnDEUcW1KnSSE514-nS8gncOJ_exFf2h=DuR+w@mail.gmail.com>
 <CAKJNF0dZX1rveZnnXiCW+dktvYUW7tzvMmSCODyvRm2MDDf+4g@mail.gmail.com>
 <50FBAB1E.2070703@cs.ntua.gr>
 <A0DE1949-B2F2-43E9-95A4-85A35B2B0CBD@gmail.com>
 <EA8474BB-3593-4098-9AAC-750632E8FE40@gmail.com>
 <513F7D02.9040907@cs.ntua.gr>
Message-ID: <5140439A.2030205@erlang.org>

On 03/12/2013 08:07 PM, Kostis Sagonas wrote:
> On 01/23/2013 12:49 PM, Anthony Ramine wrote:
>> Hi,
>>
>> The bytecode invariant that I broke is the fact that a function cannot
>> be used as a closure and as a normal function both at the same time,
>> thus the eta-abstraction is needed by HiPE.
>>
>> Fredrik, for the time being you should probably revert
>> rm-reverse-eta-conversion because I don't think I'll be able to make
>> HiPE work with the eta-abstraction in that much time.
>>
>> Kostis, could you give me directions on how to make HiPE not need the
>> intermediate closures when doing fun Name/Arity?
>
> Thanks to Anthony repeatedly prompting me to look into this and 
> sending me a minimal example to test and to Bjorn Gustavsson for 
> checking the code of hipe_icode_coordinator, today I adapted the 
> assumptions of the native code compiler and simplified the code that 
> computes escaping functions. The following hipe patch should be 
> included in OTP:
>
>
>   git fetch git://github.com/kostis/otp.git hipe-cleanup-escaping
>
>
> After its inclusion, Anthony's patch that removes the automatic 
> eta-abstraction for function references from the BEAM compiler can 
> probably be included without any problems.
>
> Kostis
Fetched. It is now in the 'pu' branch.

-- 

BR Fredrik Gustafsson
Erlang OTP Team


From bgustavsson@REDACTED  Wed Mar 13 15:29:43 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Wed, 13 Mar 2013 15:29:43 +0100
Subject: [erlang-bugs] R16B takes long to compile a simple module
In-Reply-To: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de>
References: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de>
Message-ID: <CA+yh78RY+t8oknhMY0919Ffp4QSGaaUaucqfydNXucZznRejSg@mail.gmail.com>

Thanks for reporting this issue!

I introduced a new optimization in R16
and failed to optimize it. I will fix it
in the R16B01 release.

/Bjorn


On Tue, Mar 12, 2013 at 4:39 PM, Magnus Müller <mamuelle@REDACTED> wrote:

> The following small module takes ~10s to compile with R16B (erl +V
> below). The code is distilled from indent/erlang_indent.erl in vimerl
> [1]. Diagnostics are fast (a small syntax error somewhere crashes the
> compilation immediately). The same module compiles quickly (<1s) with R15B.
>
> -------------------------------------------------
> -module(erlang_indent).
>
> -export([p/2]).
>
> -define(IS(T, C), (element(1, T) == C)).
>
> -record(state, {stack = []}).
>
> p(T1, #state{stack = [T2 | _]}) when ?IS(T2, a), ?IS(T1, b), ?IS(T1, c) ->
>     ok;
> p(T, _) when ?IS(T, a1); ?IS(T, b1); ?IS(T, c1) -> ok;
> p(T, _) when ?IS(T, a2); ?IS(T, b2); ?IS(T, c2) -> ok;
> p(T, _) when ?IS(T, a) -> ok;
> p(T, _) when ?IS(T, a), (?IS(T, b) and ?IS(T, c)) -> ok;
> p(_, T) when ?IS(T, a) -> ok;
> p(_, T) when ?IS(T, b) -> ok;
> p(T, _) when ?IS(T, a) -> ok.
> -------------------------------------------------
>
> $ erl +V
> Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.10.1
>
>
> The original file[2] in vimerl takes even longer to compile. Note that
> [2] is actually an escript, but the slow compilation persists when
> it is converted to a module.
>
>
> [1] https://github.com/jimenezrick/vimerl
> [2] https://raw.github.com/jimenezrick/vimerl/master/indent/erlang_indent.erl


-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB

From arn@REDACTED  Fri Mar 15 15:19:56 2013
From: arn@REDACTED (Anton Yabchinskiy)
Date: 15 Mar 2013 18:19:56 +0400
Subject: [erlang-bugs] Possible regression in httpc's cookie handling
Message-ID: <Prayer.1.3.4.1303151819560.31955@km21038-04.keymachine.de>

Hello,

I've encountered a difference in behaviour of HTTP client in
R15B01 (Debian build) and in R16B (Erlang Solutions build).
Consider the following code:

#!/usr/bin/env escript

main(_Args) ->
    Profile = some_profile,
    ok = application:start(inets),
    {ok, _Pid} = inets:start(httpc, [{profile, Profile}]),
    ok = httpc:set_option(cookies, enabled, Profile),
    _Ans = httpc:request("http://www.google.ru/", Profile),
    io:format("~p~n", [httpc:which_cookies(Profile)]).

When run with R15B01 it outputs the following:

[{session_cookies,[{http_cookie,".google.ru",false,"PREF",
                                "ID=542c81909139855f:FF=0:NW=1:TM=1363356181:LM=1363356181:S=JNFNZBI_nhJC-IIO",
                                undefined,session,"/",false,false,"0"},
                   {http_cookie,".google.ru",false,"NID",
                                "67=CkprmSvcQFKD7P0pt1FkRHkXZXTe_geBYXy2gk65yJTJyxvIjqm0Mrc7xErtR4xL5qaKsfUMC4oTWsvJze910qRx79VBf66rivfjmN88bVhg9aDd6YS2M3UohXLXT68t",
                                undefined,session,"/",false,false,"0"}]}]

The output for R16B is:

[{session_cookies,[]}]

There is no difference in behaviour if profile isn't used.

I'm not sure, but probably it's related to commit
9c85ee8b61c24587a228b3644c37b1b4fdfb7dcb, which includes
the following change in lib/inets/src/http_client/httpc_handler.erl
file:

-    handle_cookies(Headers, Request, Options, ProfileName),
+    handle_cookies(Headers, Request, Options, httpc_manager), %% FOO profile_name


From Ingela.Anderton.Andin@REDACTED  Fri Mar 15 15:53:44 2013
From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin)
Date: Fri, 15 Mar 2013 15:53:44 +0100
Subject: [erlang-bugs] Possible regression in httpc's cookie handling
In-Reply-To: <Prayer.1.3.4.1303151819560.31955@km21038-04.keymachine.de>
References: <Prayer.1.3.4.1303151819560.31955@km21038-04.keymachine.de>
Message-ID: <514335F8.9040308@ericsson.com>

Hi!

Thank you for reporting this. It looks really strange and I must have 
committed it by accident. The change has nothing to do with the
rest of the commit. You could try changing it back and see if it helps.
We have a new test case to write.

Regards Ingela Erlang/OTP team - Ericsson AB


Anton Yabchinskiy wrote:
> Hello,
> 
> I've encountered a difference in behaviour of HTTP client in
> R15B01 (Debian build) and in R16B (Erlang Solutions build).
> Consider the following code:
> 
> #!/usr/bin/env escript
> 
> main(_Args) ->
>    Profile = some_profile,
>    ok = application:start(inets),
>    {ok, _Pid} = inets:start(httpc, [{profile, Profile}]),
>    ok = httpc:set_option(cookies, enabled, Profile),
>    _Ans = httpc:request("http://www.google.ru/", Profile),
>    io:format("~p~n", [httpc:which_cookies(Profile)]).
> 
> When run with R15B01 it outputs the following:
> 
> [{session_cookies,[{http_cookie,".google.ru",false,"PREF",
>                                 "ID=542c81909139855f:FF=0:NW=1:TM=1363356181:LM=1363356181:S=JNFNZBI_nhJC-IIO",
>                                 undefined,session,"/",false,false,"0"},
>                    {http_cookie,".google.ru",false,"NID",
>                                 "67=CkprmSvcQFKD7P0pt1FkRHkXZXTe_geBYXy2gk65yJTJyxvIjqm0Mrc7xErtR4xL5qaKsfUMC4oTWsvJze910qRx79VBf66rivfjmN88bVhg9aDd6YS2M3UohXLXT68t",
>                                 undefined,session,"/",false,false,"0"}]}]
> 
> The output for R16B is:
> 
> [{session_cookies,[]}]
> 
> There is no difference in behaviour if profile isn't used.
> 
> I'm not sure, but probably it's related to commit
> 9c85ee8b61c24587a228b3644c37b1b4fdfb7dcb, which includes
> the following change in lib/inets/src/http_client/httpc_handler.erl
> file:
> 
> -    handle_cookies(Headers, Request, Options, ProfileName),
> +    handle_cookies(Headers, Request, Options, httpc_manager), %% FOO profile_name
> 


From bourinov@REDACTED  Fri Mar 15 18:10:49 2013
From: bourinov@REDACTED (Max Bourinov)
Date: Fri, 15 Mar 2013 18:10:49 +0100
Subject: [erlang-bugs] Error in cover.erl
Message-ID: <CANsaZAjNBrxOuhaXhjjdrRDveikMw=13fqVXnq+EyQLZL+2GYw@mail.gmail.com>

Error in cover.erl

=ERROR REPORT==== 15-Mar-2013::18:07:42 ===
Error in process <0.213.0> with exit value:
{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,162}]},{cover,fix_clauses,3,[{file,"cover.erl"},{line,1621}]},{cover,fix_expr,3,[{file,"cover.erl"},{line,1609}]},{cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]},{cover,fix_expr,3,[...

ERROR: eunit failed while processing /user/max/project/processor:
{'EXIT',{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,162}]},
                          {cover,fix_clauses,3,
                                 [{file,"cover.erl"},{line,1621}]},
                          {cover,fix_expr,3,[{file,"cover.erl"},{line,1609}]},
                          {cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]},
                          {cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]},
                          {cover,fix_expr,3,[{file,"cover.erl"},{line,1616}]},
                          {cover,fix_last_expr,3,
                                 [{file,"cover.erl"},{line,1590}]},
                          {cover,munge_body,4,
                                 [{file,"cover.erl"},{line,1535}]}]}}


Best regards,
Max
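
The top frame of the trace above is lists:last/1 being called with an
empty list, which has no matching clause. The sketch below reproduces
only that top-level failure in isolation; the module name is
hypothetical, and it says nothing about which construct in the covered
source makes cover:fix_clauses/3 pass an empty list.

```erlang
-module(last_demo).
-export([demo/0]).

%% lists:last/1 only has a clause for non-empty lists, so calling it
%% with [] raises error:function_clause, as seen in the report.
demo() ->
    try lists:last([]) of
        Elem -> {ok, Elem}
    catch
        error:function_clause -> empty_list
    end.
```

last_demo:demo() returns empty_list.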

From arn@REDACTED  Fri Mar 15 18:38:47 2013
From: arn@REDACTED (Anton Yabchinskiy)
Date: Fri, 15 Mar 2013 21:38:47 +0400
Subject: [erlang-bugs] Possible regression in httpc's cookie handling
In-Reply-To: <514335F8.9040308@ericsson.com>
References: <Prayer.1.3.4.1303151819560.31955@km21038-04.keymachine.de>
 <514335F8.9040308@ericsson.com>
Message-ID: <20130315173847.GA21449@mithlond.erebor71.org>

On 2013-03-15 15:53:44+0100, Ingela Anderton Andin wrote:
> Hi!
> 
> Thank you for reporting this. It looks really strange and I must
> have committed it by accident. The change has nothing to do with the
> rest of the commit. You could try changing it back and see if it helps.
> We have a new test case to write.

Yes, reverting that line does help. It works as expected now.


From n.oxyde@REDACTED  Mon Mar 18 14:08:32 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Mon, 18 Mar 2013 14:08:32 +0100
Subject: [erlang-bugs] Minor annoyance after 'make clean'
In-Reply-To: <BCA61AA5-F4D4-4D58-9455-9E3721D4DB54@gmail.com>
References: <5114E1C0.10901@cs.ntua.gr>
 <BCA61AA5-F4D4-4D58-9455-9E3721D4DB54@gmail.com>
Message-ID: <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com>

Ping?

-- 
Anthony Ramine

On 8 Feb 2013, at 12:47, Anthony Ramine wrote:

> Hi,
> 
> It can, and here is a fix.
> 
> 	git fetch https://github.com/nox/otp.git fix-ssh-html-doc
> 
> 	https://github.com/nox/otp/compare/erlang:master...fix-ssh-html-doc
> 	https://github.com/nox/otp/compare/erlang:master...fix-ssh-html-doc.patch
> 
> Regards,
> 
> -- 
> Anthony Ramine
> 
> On 8 Feb 2013, at 12:30, Kostis Sagonas wrote:
> 
>> Every time I issue a 'make clean' the file
>> 
>>  lib/ssh/doc/html/SSH_protocols.png
>> 
>> which apparently is part of the code base of the master branch,  gets deleted.  Is this intentional?
>> 
>> The problem is that after the 'make clean', a subsequent 'git status' command shows the following:
>> 
>> # On branch master
>> # Changed but not updated:
>> #   (use "git add/rm <file>..." to update what will be committed)
>> #   (use "git checkout -- <file>..." to discard changes in working directory)
>> #
>> #       deleted:    lib/ssh/doc/html/SSH_protocols.png
>> #
>> 
>> 
>> Can this be fixed?
>> 
>> Kostis
> 


From fredrik@REDACTED  Mon Mar 18 17:04:55 2013
From: fredrik@REDACTED (Fredrik)
Date: Mon, 18 Mar 2013 17:04:55 +0100
Subject: [erlang-bugs] Minor annoyance after 'make clean'
In-Reply-To: <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com>
References: <5114E1C0.10901@cs.ntua.gr>
 <BCA61AA5-F4D4-4D58-9455-9E3721D4DB54@gmail.com>
 <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com>
Message-ID: <51473B27.5080608@erlang.org>

On 03/18/2013 02:08 PM, Anthony Ramine wrote:
> Ping?
>
Fetched,
Currently building in the 'pu' branch.
Thanks,

-- 

BR Fredrik Gustafsson
Erlang OTP Team


From smith.winston.101@REDACTED  Mon Mar 18 19:27:32 2013
From: smith.winston.101@REDACTED (Winston Smith)
Date: Mon, 18 Mar 2013 14:27:32 -0400
Subject: [erlang-bugs] Mnesia/R15B: TYPE ASSERTION FAILED,
 erl_term.c line 109 (when stopping mnesia)
In-Reply-To: <CADH-AwGS5ut88xFgpFUmmLzt0xfrQA75c5JSwVxJP8_DPw8iwQ@mail.gmail.com>
References: <CADH-AwGS5ut88xFgpFUmmLzt0xfrQA75c5JSwVxJP8_DPw8iwQ@mail.gmail.com>
Message-ID: <CADH-AwE9jiRdkO27n-wYoX+dfbAHK-GMLE4LYYtwaLeu86qUXg@mail.gmail.com>

On Mon, Apr 2, 2012 at 11:04 PM, Winston Smith
<smith.winston.101@REDACTED> wrote:

> I have run into the following issue with R15B cross compiled to an
> AVR32 (similar to ARM) system (no HiPE).
>
>
> (mynode@REDACTED)6> mnesia:stop().
> TYPE ASSERTION FAILED, file beam/erl_term.c, line 109: tag_val_def:
> 0x8e422b5c
> Aborted
>
>
> Interestingly, if I bring up a standalone erl, I don't get the assert,
> it segfaults instead:
>
>
> # erts-5.9/bin/erl
> Eshell V5.9  (abort with ^G)
> 1> mnesia:create_schema([node()]).
> ok
> 2> mnesia:start().
> ok
> 3> mnesia:stop().
> Segmentation fault


Just to follow up on this (for search engine completeness!) I cross
compiled R16B for the avr32 system and tried this out again -- the SEGV
issue seems to be resolved ... I'm not sure where/when it actually got
fixed.

Thanks,

-W.

From zerthurd@REDACTED  Thu Mar 21 07:40:21 2013
From: zerthurd@REDACTED (Maxim Treskin)
Date: Thu, 21 Mar 2013 13:40:21 +0700
Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules
Message-ID: <CA+Km4sde2PHYyaNqhbpRyBxFcPGr85JT9GSttCm2KZtFOAem5w@mail.gmail.com>

Hello

At the Montenegro Erlang Hackathon (
http://lanyrd.com/2013/herceg-novi-erlang-meetup/ , there were only two
people, unfortunately) we found incorrect behaviour in Dialyzer.

Our project erroneously had duplicated modules with the same name but
different content. When we checked it with Dialyzer, it showed something
like this:

Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam",
                     "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]]

Obviously these are not the same module. So I searched for the bug and found
strange behaviour in Dialyzer. The function lists:zip/2 is called with two
lists, where the first is the reversed list of module names as atoms and the
second is the list of file paths for the modules. These lists do not always
hold corresponding elements at the same positions: a module named
some_module1 can end up paired with a filename like abc_module55.beam.
This is the cause of the error.

This bug exists in R15B02 and R16.

I wrote the following patch to fix the bug. I don't know whether it is the
right solution, though it works fine.

--- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl  2013-01-31 12:55:53.210402846 +0700
+++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 13:20:46.794991889 +0700
@@ -255,10 +255,18 @@
   CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer),
   case Failed =:= [] of
     true ->
-      NewFiles = lists:zip(lists:reverse(Modules), Files),
+      %% Modules and Files have not the same order, so it is meaningless to zip it
+      %% NewFiles = lists:zip(lists:reverse(Modules), Files),
+
       ModDict =
-        lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
-                    dict:new(), NewFiles),
+        lists:foldl(fun(F, Dict) ->
+                        ModFile = lists:last(filename:split(F)),
+                        Mod = filename:basename(ModFile, ".beam"),
+                        dict:append(Mod, F, Dict) end,
+                    dict:new(), Files),
+      %% ModDict =
+      %%   lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
+      %%               dict:new(), NewFiles),
       check_for_duplicate_modules(ModDict);
     false ->
       Msg = io_lib:format("Could not scan the following file(s): ~p",


-- 
Max Treskin
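
The mispairing described in this message can be reproduced outside Dialyzer.
The following is only a minimal sketch, written in Python rather than Erlang
so it can be run standalone. It assumes that the module list and the file
list really are in different orders, as the report states. The third path
(deps/otherdep) is hypothetical and stands in for the real duplicate, and
the helper names are made up for illustration; none of this is Dialyzer's
own code.

```python
import os
from collections import defaultdict

# Hypothetical inputs: psc_operate really exists twice, but the two lists
# are not in the same order (the assumption taken from the report).
modules = ["psc_operate", "amp_common_utils", "psc_operate"]
files = [
    "/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam",
    "/var/tmp/myproj/deps/otherdep/ebin/psc_operate.beam",  # hypothetical path
    "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam",
]


def group_by_zip(modules, files):
    """Pair names and paths purely by position, like lists:zip/2."""
    groups = defaultdict(list)
    for mod, path in zip(modules, files):
        groups[mod].append(path)
    return dict(groups)


def group_by_basename(files):
    """Derive the module name from each path, like the proposed patch."""
    groups = defaultdict(list)
    for path in files:
        mod, _ext = os.path.splitext(os.path.basename(path))
        groups[mod].append(path)
    return dict(groups)


def duplicates(groups):
    """Keep only the names that map to more than one file."""
    return {mod: paths for mod, paths in groups.items() if len(paths) > 1}


by_zip = duplicates(group_by_zip(modules, files))
by_name = duplicates(group_by_basename(files))
```

With these inputs, by_zip reports psc_operate together with the unrelated
amp_common_utils.beam, which has the same shape as the misleading output
above, while by_name reports the two psc_operate.beam files.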

From xramtsov@REDACTED  Thu Mar 21 10:40:26 2013
From: xramtsov@REDACTED (Evgeniy Khramtsov)
Date: Thu, 21 Mar 2013 19:40:26 +1000
Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules
In-Reply-To: <CA+Km4sde2PHYyaNqhbpRyBxFcPGr85JT9GSttCm2KZtFOAem5w@mail.gmail.com>
References: <CA+Km4sde2PHYyaNqhbpRyBxFcPGr85JT9GSttCm2KZtFOAem5w@mail.gmail.com>
Message-ID: <514AD58A.9010208@gmail.com>

On 21.03.2013 16:40, Maxim Treskin wrote:
> Hello
>
> At Montenegro Erlang Hackaton ( 
> http://lanyrd.com/2013/herceg-novi-erlang-meetup/ , there were only 
> two people, unfortunately ) we found incorrect behaviour of Dialyzer.
>
> Our project erroneous had a duplicated modules with the same name, but 
> different content. When we check it with dialyzer it show me something 
> like that:
>
> Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam",
>                     
>  "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]]
>
> Obviously it is not the same modules. So I had to search this bug and 
> find strange behaviour in dialyzer. Function lists:zip/2 called with 
> two list, where first is reversed list of modules as atom, and second 
> is list of filepaths for modules. And this list not always contains 
> correspond elements. Module with name some_module1 can be has filename 
> like abc_module55.beam. This is the cause of error.
>
> This bug exists in R15B02 and R16.
>
> I wrote such patch to fix bug, but I don't know whether this is 
> solution or not, though it works fine.
>
> --- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl  2013-01-31 12:55:53.210402846 +0700
> +++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 13:20:46.794991889 +0700
> @@ -255,10 +255,18 @@
>    CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer),
>    case Failed =:= [] of
>      true ->
> -      NewFiles = lists:zip(lists:reverse(Modules), Files),
> +      %% Modules and Files have not the same order, so it is meaningless to zip it
> +      %% NewFiles = lists:zip(lists:reverse(Modules), Files),
> +
>        ModDict =
> -        lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
> -                    dict:new(), NewFiles),
> +        lists:foldl(fun(F, Dict) ->
> +                        ModFile = lists:last(filename:split(F)),
> +                        Mod = filename:basename(ModFile, ".beam"),
> +                        dict:append(Mod, F, Dict) end,
> +                    dict:new(), Files),
> +      %% ModDict =
> +      %%   lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
> +      %%               dict:new(), NewFiles),
>        check_for_duplicate_modules(ModDict);
>      false ->
>        Msg = io_lib:format("Could not scan the following file(s): ~p",

I have the same problem. Thanks for the patch.

-- 
Regards,
Evgeniy Khramtsov, ProcessOne.
xmpp:xram@REDACTED


From bourinov@REDACTED  Thu Mar 21 11:10:00 2013
From: bourinov@REDACTED (Max Bourinov)
Date: Thu, 21 Mar 2013 11:10:00 +0100
Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules
In-Reply-To: <514AD58A.9010208@gmail.com>
References: <CA+Km4sde2PHYyaNqhbpRyBxFcPGr85JT9GSttCm2KZtFOAem5w@mail.gmail.com>
 <514AD58A.9010208@gmail.com>
Message-ID: <CANsaZAgErDF9vNV_SgNz2OFKYhPgQDaogJszyHydfg1y=xy-pQ@mail.gmail.com>

Montenegro Erlang Hackathon was great indeed!

Thank you for your patch Max!


Best regards,
Max


On Thu, Mar 21, 2013 at 10:40 AM, Evgeniy Khramtsov <xramtsov@REDACTED> wrote:

> On 21.03.2013 16:40, Maxim Treskin wrote:
>
>> Hello
>>
>> At Montenegro Erlang Hackaton ( http://lanyrd.com/2013/herceg-novi-erlang-meetup/ ,
>> there were only two people, unfortunately ) we found incorrect behaviour
>> of Dialyzer.
>>
>> Our project erroneous had a duplicated modules with the same name, but
>> different content. When we check it with dialyzer it show me something like
>> that:
>>
>> Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam",
>>                      "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]]
>>
>> Obviously it is not the same modules. So I had to search this bug and
>> find strange behaviour in dialyzer. Function lists:zip/2 called with two
>> list, where first is reversed list of modules as atom, and second is list
>> of filepaths for modules. And this list not always contains correspond
>> elements. Module with name some_module1 can be has filename like
>> abc_module55.beam. This is the cause of error.
>>
>> This bug exists in R15B02 and R16.
>>
>> I wrote such patch to fix bug, but I don't know whether this is solution
>> or not, though it works fine.
>>
>> --- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl  2013-01-31 12:55:53.210402846 +0700
>> +++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 13:20:46.794991889 +0700
>> @@ -255,10 +255,18 @@
>>    CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer),
>>    case Failed =:= [] of
>>      true ->
>> -      NewFiles = lists:zip(lists:reverse(Modules), Files),
>> +      %% Modules and Files have not the same order, so it is meaningless to zip it
>> +      %% NewFiles = lists:zip(lists:reverse(Modules), Files),
>> +
>>        ModDict =
>> -        lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
>> -                    dict:new(), NewFiles),
>> +        lists:foldl(fun(F, Dict) ->
>> +                        ModFile = lists:last(filename:split(F)),
>> +                        Mod = filename:basename(ModFile, ".beam"),
>> +                        dict:append(Mod, F, Dict) end,
>> +                    dict:new(), Files),
>> +      %% ModDict =
>> +      %%   lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end,
>> +      %%               dict:new(), NewFiles),
>>        check_for_duplicate_modules(ModDict);
>>      false ->
>>        Msg = io_lib:format("Could not scan the following file(s): ~p",
>>
>
> I have the same problem. Thanks for the patch.
>
> --
> Regards,
> Evgeniy Khramtsov, ProcessOne.
> xmpp:xram@REDACTED
>

From sgolovan@REDACTED  Sun Mar 24 07:58:58 2013
From: sgolovan@REDACTED (Sergei Golovan)
Date: Sun, 24 Mar 2013 10:58:58 +0400
Subject: [erlang-bugs] Bug with named subpatterns in re module
Message-ID: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>

Hi!

Chris King recently discovered a bug in the re module. It appears that
matched named subpatterns are not always returned.

The following command works correctly:
1> re:run("bar", "^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$",
[dupnames, {capture, [a, b], list}]).
{match,["bar",[]]}

But semantically the same one doesn't (note the swapped <a> and <b>):
1> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",
[dupnames, {capture, [a, b], list}]).
{match,[[],[]]}

In both cases the second branch matches, but only the first command
returns the required subpattern.

The bug is reproducible in R16B.

Cheers!
-- 
Sergei Golovan


From pan@REDACTED  Thu Mar 28 12:35:54 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Thu, 28 Mar 2013 12:35:54 +0100
Subject: [erlang-bugs] Bug with named subpatterns in re module
In-Reply-To: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>
References: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>
Message-ID: <51542B1A.2010406@erlang.org>

Hi!

I'm unsure of the nature of this bug. What are you actually expecting as 
a return when you use duplicate names and named capture? Both instances 
of the name, "the right instance" of the name or a badarg?

I.e., would you like

re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",[dupnames, {capture, [a, b], list}]).

to give the same result as:

re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",[dupnames, {capture, [a, b, c], list}]).

? Or return the second instance if that matches, but the first instance 
if that one matches? Or should we simply not allow it? The thing is that 
even with dupnames, you have a varying amount of subexpressions. 
Capturing 'all' (or rather 'all_but_first') will show you that this call 
returns three distinct subexpressions, of which two happen to have the 
same name (regardless of the names). If the part before | matches, the 
result is only two subexpressions, as the first two subexpressions 
match. No duplicate naming will change this. There is no real "select 
the one that matches" functionality in giving two subexpressions the 
same name.

PCRE just picks one of the occurrences of a name when you ask for it - in 
your last example the occurrence you were not expecting, but that's more 
or less random, the first example would give unexpected results if the 
first part matched. PCRE has no functionality to pick all occurrences of 
a name, but that could of course be changed if there was some 
understandable semantics that should be implemented. I think a badarg 
exception is the way to go though...

Cheers,
/Patrik

On 03/24/2013 07:58 AM, Sergei Golovan wrote:
> Hi!
>
> Chris King recently discovered a bug in re module. Appears that the
> matched named subpatterns are not always returned.
>
> The following command works correctly:
> 1> re:run("bar", "^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,["bar",[]]}
>
> But semantically the same one doesn't (note the swapped <a> and <b>):
> 1> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",
> [dupnames, {capture, [a, b], list}]).
> {match,[[],[]]}
>
> In both cases the second branch matches, but only the first command
> returns the required subpattern.
>
> The bug is reproducible in R16B.
>
> Cheers!


From sgolovan@REDACTED  Thu Mar 28 12:59:04 2013
From: sgolovan@REDACTED (Sergei Golovan)
Date: Thu, 28 Mar 2013 15:59:04 +0400
Subject: [erlang-bugs] Bug with named subpatterns in re module
In-Reply-To: <51542B1A.2010406@erlang.org>
References: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>
 <51542B1A.2010406@erlang.org>
Message-ID: <CAOq2pXE9g4r+ZLorDw8TEE6pGQsb3APCeia-EDz6i4CTpBehZg@mail.gmail.com>

Hi!

On Thu, Mar 28, 2013 at 3:35 PM, Patrik Nyblom <pan@REDACTED> wrote:
>
> I'm unsure of the nature of this bug. What are you actually expecting as a
> return when you use duplicate names and named capture? Both instances of the
> name, "the right instance" of the name or a badarg?

At least the results should not depend on the pattern names.

When I run the following Perl script:

#! /usr/bin/perl

$var = 'bar';
$var =~ m/^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$/;
pplus();
$var =~ m/^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$/;
pplus();

sub pplus {
    foreach (keys %+) {
        print "$_: $+{$_}\n";
    }
}

It prints the following:

a: bar
b: bar

Which means that it captures the only matching pattern. The Perl docs say
that in case of duplicate names the leftmost matched one is captured.
I would say that the smaller the difference in behavior between re and
the original Perl regexps, the better.

>
> I.e would you like
>
>
> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",[dupnames,
> {capture, [a, b], list}]).
>
> to give the same result as:
>
> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",[dupnames,
> {capture, [a, b, c], list}]).
>
> ? Or return the second instance if that matches, but the first instance if
> that one matches? Or should we simply not allow it? The thing is that even
> with dupnames, you have a varying amount of subexpressions. Capturing 'all'
> (or rather 'all_but_first') will show you that this call returns three
> distinct subexpressions, of which two happen to have the same name
> (regardless of the names). If the part before | matches, the result is only
> two subexpressions, as the first two subexpressions match. No duplicate
> naming will change this. There is no real "select the one that matches"
> functionality in giving two subexpressions the same name.
>
> PCRE just picks one of the occurences of a name when you ask for it - in
> your last example the occurence you were not expecting, but that's more or
> less random, the first example would give unexpected results if the first
> part matched. PCRE has no functionality to pick all occurences of a name,
> but that could of course be changed if there was some understandable
> semantics that should be implemented. I think badarg exception is the way to
> go though...

Well, the re manpage says that dupnames is helpful when it's
certain that two subpatterns with the same name can't be matched
simultaneously. Fortunately, the regexp under consideration falls into
this category. So, I guess that either dupnames has to be removed
altogether, or something should be done about it.

Cheers!
-- 
Sergei Golovan


From pan@REDACTED  Thu Mar 28 17:13:25 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Thu, 28 Mar 2013 17:13:25 +0100
Subject: [erlang-bugs] Bug with named subpatterns in re module
In-Reply-To: <CAOq2pXE9g4r+ZLorDw8TEE6pGQsb3APCeia-EDz6i4CTpBehZg@mail.gmail.com>
References: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>
 <51542B1A.2010406@erlang.org>
 <CAOq2pXE9g4r+ZLorDw8TEE6pGQsb3APCeia-EDz6i4CTpBehZg@mail.gmail.com>
Message-ID: <51546C25.3090801@erlang.org>

On 03/28/2013 12:59 PM, Sergei Golovan wrote:
> Hi!
>
> On Thu, Mar 28, 2013 at 3:35 PM, Patrik Nyblom <pan@REDACTED> wrote:
>> I'm unsure of the nature of this bug. What are you actually expecting as a
>> return when you use duplicate names and named capture? Both instances of the
>> name, "the right instance" of the name or a badarg?
> At least the results should not depend on the pattern names.
No, definitely not - the results now are more or less random, so 
something needs to be done.
>
> When I run the following Perl script:
>
> #! /usr/bin/perl
>
> $var = 'bar';
> $var =~ m/^(?<a>foo)(?<b>bla)$|^(?<a>[[:word:]]+)$/;
> pplus();
> $var =~ m/^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$/;
> pplus();
>
> sub pplus {
>      foreach (keys %+) {
>          print "$_: $+{$_}\n";
>      }
> }
>
> It prints the following:
>
> a: bar
> b: bar
>
> Which means that it captures the only matching pattern. Perl docs say
> that in case of duplicate names the leftmost matched one is captured.
> I would say that the less the difference in behavior in re and the
> original Perl regexp the better.
Okay, thanks for explaining!

Leftmost matching might be doable - pcre_get_stringtable_entries 
can be used, and we could then extract the first entry for that name that 
is bound. We currently use pcre_get_stringnumber, which gives a random 
instance of that name and should not be used with dupnames.

What about "all" then, it returns all bound indexes and will possibly 
return the duplicate name's binding twice, once as [] and once as "bar" 
(in your example). Should it skip a binding where the same name is bound 
later, or should it return them all, as it does now? 'all' kind of means 
"all indexes" rather than "all names". Should we add "all_names" to get 
the behavior that you demonstrate in your Perl program? Or maybe just 
let 'all' be as is and just fix the thing where you specifically list 
names... Hmmm - thoughts?
>
>> I.e would you like
>>
>>
>> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<b>[[:word:]]+)$",[dupnames,
>> {capture, [a, b], list}]).
>>
>> to give the same result as:
>>
>> re:run("bar", "^(?<b>foo)(?<a>bla)$|^(?<c>[[:word:]]+)$",[dupnames,
>> {capture, [a, b, c], list}]).
>>
>> ? Or return the second instance if that matches, but the first instance if
>> that one matches? Or should we simply not allow it? The thing is that even
>> with dupnames, you have a varying amount of subexpressions. Capturing 'all'
>> (or rather 'all_but_first') will show you that this call returns three
>> distinct subexpressions, of which two happen to have the same name
>> (regardless of the names). If the part before | matches, the result is only
>> two subexpressions, as the first two subexpressions match. No duplicate
>> naming will change this. There is no real "select the one that matches"
>> functionality in giving two subexpressions the same name.
>>
>> PCRE just picks one of the occurences of a name when you ask for it - in
>> your last example the occurence you were not expecting, but that's more or
>> less random, the first example would give unexpected results if the first
>> part matched. PCRE has no functionality to pick all occurences of a name,
>> but that could of course be changed if there was some understandable
>> semantics that should be implemented. I think badarg exception is the way to
>> go though...
> Well, re manpage says that dupnames is helpful in case when it's
> certain that two subpatterns with the same name can't be matched
> simultaneously. Fortunately, the considered regexp falls in this
> category. So, I guess that either dupnames has to be removed at all,
> or something should be done with it.
Funny that I wrote that, when I knew very well that the PCRE APIs I 
used did not work with dupnames :)

Well, removing dupnames might be the easiest, but as there are Perl 
semantics we can imitate, I think we should give it a try!
>
> Cheers!
Cheers,
/Patrik
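
The "first bound entry for that name" selection discussed in this message
can be sketched as follows. This is an illustration in Python, not the re
module's or PCRE's implementation. Python's own re rejects duplicate group
names, so the sketch assumes a made-up convention where each occurrence gets
a numeric suffix (b_1, b_2) that is stripped to recover the shared name; the
function name leftmost_bound is likewise hypothetical, and \w stands in for
[[:word:]].

```python
import re

# Each occurrence of a "duplicate" name carries a hypothetical _N suffix.
PATTERN = re.compile(r"^(?P<b_1>foo)(?P<a_1>bla)$|^(?P<b_2>\w+)$")


def leftmost_bound(match, name):
    """Return the leftmost group with base name `name` that took part in
    the match, or None if no such group is bound."""
    candidates = sorted(
        (index, group_name)
        for group_name, index in match.re.groupindex.items()
        if group_name.rsplit("_", 1)[0] == name
    )
    for _index, group_name in candidates:
        value = match.group(group_name)
        if value is not None:
            return value
    return None


second_branch = PATTERN.match("bar")      # only the branch after | matches
first_branch = PATTERN.match("foobla")    # only the branch before | matches
```

For "bar" this yields "bar" for b and None for a, and for "foobla" it yields
"foo" for b and "bla" for a, so the result no longer depends on which
occurrence of the name happens to be looked up.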


From sgolovan@REDACTED  Thu Mar 28 17:52:56 2013
From: sgolovan@REDACTED (Sergei Golovan)
Date: Thu, 28 Mar 2013 20:52:56 +0400
Subject: [erlang-bugs] Bug with named subpatterns in re module
In-Reply-To: <51546C25.3090801@erlang.org>
References: <CAOq2pXFvqt749WMJ8uZW3kk07ES3Djg9MznKcxfcwntHV-NqiQ@mail.gmail.com>
 <51542B1A.2010406@erlang.org>
 <CAOq2pXE9g4r+ZLorDw8TEE6pGQsb3APCeia-EDz6i4CTpBehZg@mail.gmail.com>
 <51546C25.3090801@erlang.org>
Message-ID: <CAOq2pXE2dri3Lp2NcQPJ8hwFkiL7Zj-H7FsZpLa5RMYJ_cwX=g@mail.gmail.com>

Hi!

On Thu, Mar 28, 2013 at 8:13 PM, Patrik Nyblom <pan@REDACTED> wrote:
>
> Well, removing dupnames might be the easiest, but as there are perl
> semantics we can imitate, I think we should give it a try!

I should say that the PCRE manual describes named subpatterns using the
following regexp:

(?<DN>Mon|Fri|Sun)(?:day)?|
(?<DN>Tue)(?:sday)?|
(?<DN>Wed)(?:nesday)?|
(?<DN>Thu)(?:rsday)?|
(?<DN>Sat)(?:urday)?

(search 'NAMED SUBPATTERNS' in http://www.pcre.org/pcre.txt). And currently

1> re:run("Monday",
"(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?",
[dupnames, {capture, ['DN'], list}]).
{match,[[]]}

doesn't work. If I leave only one branch it works fine:
2> re:run("Monday", "(?<DN>Mon|Fri|Sun)(?:day)?", [dupnames, {capture,
['DN'], list}]).
{match,["Mon"]}

Cheers!
-- 
Sergei Golovan


From erlangpro@REDACTED  Fri Mar 29 21:34:06 2013
From: erlangpro@REDACTED (Josh Marchán)
Date: Fri, 29 Mar 2013 16:34:06 -0400
Subject: [erlang-bugs] R16 breaks dots
Message-ID: <20130329203406.GB1251@zushakon>

It's widely known that it's useful to be able to use
dots/periods/full-stops (choose your dialect) in Erlang code to maximize
compatibility, especially with more modern languages like JavaScript.

Unfortunately for the world of Erlang, R16 breaks something that has been
tremendously useful. I am no longer able to do this:

console.log("Hello from erlang", 1, 2, "More string here", TildePMe)

which is leading to a lot of confusion when I regularly switch between
JavaScript and Erlang.

I would like to formally request that $. once again become a valid
character in Erlang identifiers. Until such a time, I regret I must divest
from version upgrades. I look forward to your response (and prompt bugfix).
-- 
Josh Marchán

From norton@REDACTED  Sat Mar 30 08:45:13 2013
From: norton@REDACTED (Joseph Wayne Norton)
Date: Sat, 30 Mar 2013 16:45:13 +0900
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <20130329203406.GB1251@zushakon>
References: <20130329203406.GB1251@zushakon>
Message-ID: <7D1A4A43-6783-4ACF-936F-C644BEF8839E@lovely.email.ne.jp>


Josh -

I'm just curious but shouldn't quoted atoms work for your needs?

  'console.log'("Hello from erlang", 1, 2, "More string here", TildePMe)

thanks,

Joe N.


On Mar 30, 2013, at 5:34 AM, Josh Marchán <erlangpro@REDACTED> wrote:

> It's widely known that it's useful to be able to use
> dots/periods/full-stops (choose your dialect) in Erlang code to maximize
> compatibility, specially with more modern languages like JavaScript.
> 
> Unfortunately for the world of Erlang, R16 breaks something that has been
> tremendously useful. I am no longer able to do this:
> 
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
> 
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
> 
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> -- 
> Josh Marchán


From n.oxyde@REDACTED  Sat Mar 30 10:42:04 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sat, 30 Mar 2013 10:42:04 +0100
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <20130329203406.GB1251@zushakon>
References: <20130329203406.GB1251@zushakon>
Message-ID: <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>

Hello,

While I find the prospect of refusing to upgrade Erlang because it can't be made to look like JavaScript anymore surprising (are you seriously serious?), I do want to know why dots aren't allowed in atoms anymore and would like to see them back too.

It was pretty useful to be able to write unquoted fully-qualified node names in the prompt, e.g. foo@REDACTED

Furthermore, it feels to me like their removal was a mistake, as demonstrated by this:

1> foo.bar.
* 1: syntax error before: '.'
1> foo. bar.
foo
2> bar.
bar

What you can see here is that the blanks after a dot are still mandatory to properly parse a '.' character as a 'dot' token, terminating an expression in the shell (or a form in a module). This was mandatory to distinguish dot terminators from dots in atoms.

If dots are really to not be allowed anymore in atoms, the blanks should be made optional, to be consistent with the rest of the language where blanks are optional before or after a symbol (with the notable exception of a match '=' followed by a binary literal '<<...>>').

Anyway, for the original complaint of Erlang's syntax not being the same as JavaScript and compatibility concerns, it should be noted that *syntax is nothing* and that all that matters are semantics. The ones from JS being at antipodes from Erlang's, I think it's a good thing you can't mistake one for another. It should also be noted that there is nothing "more modern" about JS' syntax when compared to the one of Erlang.

Regards,

-- 
Anthony Ramine

On 29 Mar 2013, at 21:34, Josh Marchán wrote:

> It's widely known that it's useful to be able to use
> dots/periods/full-stops (choose your dialect) in Erlang code to maximize
> compatibility, specially with more modern languages like JavaScript.
> 
> Unfortunately for the world of Erlang, R16 breaks something that has been
> tremendously useful. I am no longer able to do this:
> 
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
> 
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
> 
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> -- 
> Josh Marchán


From mononcqc@REDACTED  Sat Mar 30 14:42:02 2013
From: mononcqc@REDACTED (Fred Hebert)
Date: Sat, 30 Mar 2013 09:42:02 -0400
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <20130329203406.GB1251@zushakon>
References: <20130329203406.GB1251@zushakon>
Message-ID: <20130330134201.GA22837@ferdmbp.local>

From memory, the problem is that support for periods in atoms was there
in order to support packages, which would let you have an atom of the
form topdir.subdir.module to represent items.

When packages (an experimental feature few people used) got removed
along with parametrized modules in R16, the code that allowed full stops
in atoms also got the axe.

Since then (and before packages), what you need to do to get that is
wrap things up in single quotes. 'console.log'("Hello from Erlang").

Now for a less serious thing, I recommend you use the following in
Javascript:

    var Log = function(args) { console.log(args) }

And the following in Erlang:

    Log = fun(Args) -> io:format(Args) end

Which means you can now use the fantastic 'Log("Hello!")' function
everywhere you go!

As Anthony said, I'm a bit surprised you're not willing to upgrade
because the languages look different (they should look different, given
they *are* different, in my opinion).

Regards,
Fred.


On 03/29, Josh Marchán wrote:
> It's widely known that it's useful to be able to use
> dots/periods/full-stops (choose your dialect) in Erlang code to maximize
> compatibility, specially with more modern languages like JavaScript.
> 
> Unfortunately for the world of Erlang, R16 breaks something that has been
> tremendously useful. I am no longer able to do this:
> 
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
> 
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
> 
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> -- 
> Josh Marchán




From carlsson.richard@REDACTED  Sat Mar 30 23:53:43 2013
From: carlsson.richard@REDACTED (Richard Carlsson)
Date: Sat, 30 Mar 2013 23:53:43 +0100
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>
References: <20130329203406.GB1251@zushakon>
 <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>
Message-ID: <51576CF7.6010905@gmail.com>

On 2013-03-30 10:42, Anthony Ramine wrote:
> I do want to know why dots aren't allowed in atoms anymore
> and would like to see them back too.

As Fred already mentioned, this feature was added as part of the 
"packages" and was removed along with them.

> It was pretty useful to be able to write unquoted fully-qualified
> node names in the prompt, e.g. foo@REDACTED

I think that many agree on this, and maybe the OTP team can be convinced 
to take this part back. It should be pretty simple to extract the 
relevant code from the commit that removes packages.

> Furthermore, it feels to me like their removal was a mistake, as
> demonstrated by this:
>
> 1> foo.bar.
> * 1: syntax error before: '.'
> 1> foo. bar.
> foo
> 2> bar.
> bar
>
> What you can see here is that the blanks after a dot are still
> mandatory to properly parse a '.' character as a 'dot' token,
> terminating an expression in the shell (or a form in a module), this
> was mandatory to distinguish dot terminators from dots in atoms.
>
> If dots are really to not be allowed anymore in atoms, the blanks
> should be made optional, to be consistent with the rest of the
> language where blanks are optional before or after a symbol (with the
> notable exception of a match '=' followed by a binary literal
> '<<...>>').

This is not quite how the grammar works. First of all, the 'dot' token 
is identified as a "." followed by whitespace or a comment or EOF, and 
the packages addition did not change that. However, periods that are not 
a dot token or part of any other token are seen as '.' tokens. For example:

1> erl_scan:string("foo.bar. ").
{ok,[{atom,1,foo},{'.',1},{atom,1,bar},{dot,1}],1}
2> erl_scan:string("foo. bar. ").
{ok,[{atom,1,foo},{dot,1},{atom,1,bar},{dot,1}],1}

Now, the Erlang parser works on complete "forms" at a time - these are 
the token sequences that are terminated by dot tokens. In the first 
case, you have one form containing three tokens. In the second case, you 
have two forms containing one token each. Blanks cannot be made optional 
after periods, because you must be able to distinguish between token 
sequences like these.
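
To make the form boundaries concrete, here is a minimal sketch that groups 
scanner output into forms by cutting after each 'dot' token. The module and 
function names (forms_sketch, split_forms/1) are made up for illustration; 
only erl_scan:string/1 is a real library call, and this is not how the shell 
or the compiler actually drive the parser.

```erlang
-module(forms_sketch).
-export([split_forms/1]).

%% Scan a string and group its tokens into forms. A form ends at a
%% 'dot' token, i.e. a "." followed by whitespace, a comment or EOF.
%% (Hypothetical helper, for illustration only.)
split_forms(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    split(Tokens, [], []).

%% Accumulate tokens in Current until a dot token closes the form.
split([], [], Forms) ->
    lists:reverse(Forms);
split([], Current, Forms) ->
    %% Trailing tokens with no terminating dot: keep them as a last group.
    lists:reverse([lists:reverse(Current) | Forms]);
split([{dot, _} = Dot | Rest], Current, Forms) ->
    split(Rest, [], [lists:reverse([Dot | Current]) | Forms]);
split([Token | Rest], Current, Forms) ->
    split(Rest, [Token | Current], Forms).
```

With this, split_forms("foo.bar. ") yields one group and 
split_forms("foo. bar. ") yields two. Each group here keeps its terminating 
dot token, so the token counts are one higher than the counts given above.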

It's also the case that you can't just change the scanning of atoms to 
allow periods as part of the atom token - in that case, the scanner 
would report a single atom for "foo.bar" instead of three tokens 'foo' 
'.' 'bar', and then the grammar would not be able to identify phrases 
like "Rec#foo.bar" or "#foo.bar". To support dotted atoms, the packages 
added a grammar rule that allowed a sequence <atom> '.' ... <atom> to be 
merged into a single atom unless it was part of another rule such as '#' 
<atom> '.' <atom>. (I think that Haskell had to do some similar tricks 
with their grammar to allow dotted names.) This could easily be put back 
in there. But at no point has it been the case in Erlang that unquoted 
atom tokens could contain periods.
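
As a rough illustration of that merging idea, the sketch below works on the 
token list instead of in the grammar. That is an assumption made to keep the 
example short: the packages code did this with a grammar rule, and the names 
here (dotted_sketch, merge_dotted/1) are hypothetical. It joins <atom> '.' 
<atom> runs into one atom but leaves '#' <atom> '.' <atom> alone.

```erlang
-module(dotted_sketch).
-export([merge_dotted/1]).

%% Scan a string, then merge dotted atom sequences at the token level.
%% (Hypothetical stand-in for the grammar rule, for illustration only.)
merge_dotted(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    merge(Tokens).

%% Record field access, '#' Name '.' Field: keep the four tokens as they are.
merge([{'#', _} = Hash, {atom, _, _} = Name, {'.', _} = Period,
       {atom, _, _} = Field | Rest]) ->
    [Hash, Name, Period, Field | merge(Rest)];
%% atom '.' atom: join into one atom and retry, so longer chains merge too.
merge([{atom, Anno, A}, {'.', _}, {atom, _, B} | Rest]) ->
    Joined = list_to_atom(atom_to_list(A) ++ "." ++ atom_to_list(B)),
    merge([{atom, Anno, Joined} | Rest]);
merge([Token | Rest]) ->
    [Token | merge(Rest)];
merge([]) ->
    [].
```

For "foo.bar.baz. " this leaves a single atom token 'foo.bar.baz' followed by 
the dot token, while the tokens of "Rec#foo.bar. " come back unchanged.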

     /Richard


From n.oxyde@REDACTED  Sun Mar 31 16:22:37 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sun, 31 Mar 2013 16:22:37 +0200
Subject: [erlang-bugs] Bit string generators, unsized binaries,
	modules and the REPL
Message-ID: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com>

Hello,

People on IRC noticed a difference between compiled modules and the REPL in how some binary generators are handled.

Compare:

$ cat unsized_bin_gen_pat.erl
-module(unsized_bin_gen_pat).
-export([t/0]).
t() -> << <<X,Tail/binary>> || <<X,Tail/binary>> <= <<1,2,3>> >>.
$ erlc unsized_bin_gen_pat.erl
$ erl

1> % compiled
1> unsized_bin_gen_pat:t().
<<1,2,3,2,3,3>>
2> % evaluated
2> << <<X,Tail/binary>> || <<X,Tail/binary>> <= <<1,2,3>> >>.
<<1,2,3>>
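
One way to read the two results is with explicit recursion. The sketch below is a model inferred only from the outputs above, not a description of what the compiler or erl_eval do internally; the module and function names are made up for illustration.

```erlang
-module(bin_gen_sketch).
-export([like_compiled/1, like_evaluated/1]).

%% Models the compiled result: Tail is bound to the rest of the input,
%% but the generator only advances past X before matching again.
like_compiled(<<X, Tail/binary>>) ->
    Rest = like_compiled(Tail),
    <<X, Tail/binary, Rest/binary>>;
like_compiled(<<>>) ->
    <<>>.

%% Models the evaluated result: the unsized Tail consumes the rest of
%% the input, so the generator produces a single iteration.
like_evaluated(<<X, Tail/binary>>) ->
    <<X, Tail/binary>>;
like_evaluated(<<>>) ->
    <<>>.
```

Under this reading, like_compiled(<<1,2,3>>) gives <<1,2,3,2,3,3>> and like_evaluated(<<1,2,3>>) gives <<1,2,3>>, matching the two outputs above.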

I don't think the compiler should be changed to behave like the REPL, nor do I think the REPL should be changed to behave like the compiler. Instead, I think an unsized binary tail in the pattern of a binary generator does not make sense, and this should happen:

$ erlc unsized_bin_gen_pat.erl
unsized_bin_gen_pat.erl:3: binary fields without size are not allowed in patterns of bit string generators

This patch implements this new error and simplifies how v3_core works with forbidden unsized tail segments in patterns of bit string generators.

	git fetch https://github.com/nox/otp illegal-bitstring-gen-pattern

	https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern
	https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern.patch

Looking at the commit 5daa001 by Björn Gustavsson "Don't generate multiple tail segments in binary matching", this patch will probably be rejected as it seems the compiler behaves as wanted by the OTP team. If this is indeed the case, erl_eval should be fixed.

Regards,

-- 
Anthony Ramine