From ferenc.holzhauser@REDACTED Fri Mar 1 14:07:08 2013
From: ferenc.holzhauser@REDACTED (Ferenc Holzhauser)
Date: Fri, 1 Mar 2013 14:07:08 +0100
Subject: [erlang-bugs] R16B asn1 incompatibility could be more explicitly stated in readme
Message-ID:

Hi,

After updating my development machine to R16B, a project that uses ASN1 encoding/decoding stopped working. I need to recompile the ASN1 files (the old generated modules try to use asn1rt_ber_bin_v2, which disappeared in R16B) and also change code to handle the new binary return value of encode.

Although I'd love to have them backward compatible so I can try the new nice things "for free", I'm not complaining about these obvious improvements. After a certain level of refactoring, backward compatibility is difficult to keep. There are hints in the readme, but IMO this incompatibility should be mentioned a bit more explicitly.

BR,
Ferenc

From pan@REDACTED Fri Mar 1 15:28:49 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Fri, 1 Mar 2013 15:28:49 +0100
Subject: [erlang-bugs] make -j 16 fails when ./configure --with-dynamic-trace=systemtap
In-Reply-To: <512F7A2A.9030008@softlab.ntua.gr>
References: <510BD9E6.5040604@softlab.ntua.gr> <512E39E5.7030002@softlab.ntua.gr> <512F23E4.3070607@erlang.org> <512F7A2A.9030008@softlab.ntua.gr>
Message-ID: <5130BB21.9050504@erlang.org>

On 02/28/2013 04:39 PM, Yiannis Tsiouris wrote:
> Hi Patrik,
>
> On 02/28/2013 11:31 AM, Patrik Nyblom wrote:
>> On 02/27/2013 05:52 PM, Yiannis Tsiouris wrote:
>>> On 02/01/2013 05:06 PM, Yiannis Tsiouris wrote:
>>>> I'm trying to build an Erlang/OTP system configured with
>>>> --with-dynamic-trace=systemtap and it fails with:
>>>> > beam/dtrace-wrapper.h:49:27: error: erlang_dtrace.h: No such file or directory
>>>>
>>>> I attach the full log for details.
>>>>
>>>> Let me state that this works well when I do a simple make (without the
>>>> -j flag). Is this a known issue?
>>>
>>> Has anyone done anything for this? Because it's still failing...
>> Nope. Wasn't severe enough to make it into the last sprint for R16B.
>>
>> Can you try the attached patch and see if it works for you?
> It works great! Thanks for finding some time to work on this! I
> suspect that this is going to be committed in the erlang/otp/master
> branch (and not included in R16B), right?
Yes, or rather in the maint branch (future R16B01), and via that into master.
>
> yiannis
>
/Patrik

From essen@REDACTED Sat Mar 2 21:17:35 2013
From: essen@REDACTED (Loïc Hoguin)
Date: Sat, 02 Mar 2013 21:17:35 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
Message-ID: <51325E5F.7090000@ninenines.eu>

Hello,

Cowboy doesn't work when compiled with HiPE. When using curl on a simple hello world example, it sometimes works as expected and sometimes returns a 408 timeout error. When using http_load (http://acme.com/software/http_load/) on the same example, it sometimes works and sometimes throws a weird function_clause error.

=ERROR REPORT==== 2-Mar-2013::21:13:54 ===
Error in process <0.26124.0> with exit value: {function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}

As you can guess, lists:zip/2 doesn't call cowboy_protocol:parse_hd_name/9. Someone else reported a similar issue with the stacktrace in another project on IRC.

Same result with R15B03 and R16B.

Here are the steps to reproduce. Sorry it's not the smallest download, I can't isolate:

    git clone git://github.com/extend/cowboy.git
    cd cowboy/examples/hello_world
    rebar get-deps compile
    cd deps/cowboy
    ERLC_OPTS=+native make clean app
    cd -
    ./start.sh

Then with curl:

    curl -i http://localhost:8080

It will intermittently return 200 or 408.

With http_load:

    echo "http://localhost:8080" > urls.txt
    http_load -parallel 500 -seconds 10 urls.txt

It will print a lot of these weird errors:

=ERROR REPORT==== 2-Mar-2013::21:13:54 ===
Error in process <0.26098.0> with exit value: {function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}

Tell me how I can help get this fixed.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From kostis@REDACTED Sat Mar 2 21:59:32 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Sat, 02 Mar 2013 21:59:32 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51325E5F.7090000@ninenines.eu>
References: <51325E5F.7090000@ninenines.eu>
Message-ID: <51326834.5070901@cs.ntua.gr>

On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
> Hello,
>
> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
> hello world example, it sometimes works as expected and sometimes returns a
> 408 timeout error. When using http_load
> (http://acme.com/software/http_load/) on the same example, it sometimes
> works and sometimes throws a weird function_clause error.
>
> =ERROR REPORT==== 2-Mar-2013::21:13:54 ===
> Error in process <0.26124.0> with exit value:
> {function_clause,[{cowboy_protocol,parse_hd_name,9,[]},{lists,zip,2,[]}]}
>
> As you can guess, lists:zip/2 doesn't call
> cowboy_protocol:parse_hd_name/9. Someone else reported a similar issue
> with the stacktrace in another project on IRC.
>
> Same result with R15B03 and R16B.
>
> Here are the steps to reproduce. Sorry it's not the smallest download, I
> can't isolate:
>
> ...
>
> Tell me how I can help get this fixed.

Hi Loïc,

One thing to know is that the stack traces that are produced when running native code are not as precise as those when running BEAM byte code. In particular, the stack (naturally) does not contain frames for tail calls and the stack walking component may occasionally be confused by mode-switches (e.g. byte code calling native code and vice versa).
The latter is most probably what is happening here: you are most probably running with the 'lists' module not natively compiled.

Anyway, I'll put it on my TODO list to look at it, but I am swamped at the moment. It would help to see if the bug persists if you configure with --enable-native-libs (if it does not, then it's most probably something in the mode switch part), or if you can minimize it further to something with fewer cowboy files compiled to native code, or at least something that always exhibits the error.

Cheers,
Kostis

From kostis@REDACTED Sun Mar 3 15:16:27 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Sun, 03 Mar 2013 15:16:27 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51325E5F.7090000@ninenines.eu>
References: <51325E5F.7090000@ninenines.eu>
Message-ID: <51335B3B.2070206@cs.ntua.gr>

On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
> Hello,
>
> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
> hello world example, it sometimes works as expected and sometimes returns a
> 408 timeout error. When using http_load
> (http://acme.com/software/http_load/) on the same example, it sometimes
> works and sometimes throws a weird function_clause error.
>
> ...
>
> Here are the steps to reproduce. Sorry it's not the smallest download, I
> can't isolate:

OK, I've spent two hours on this and was able to minimize down to file cowboy_protocol.erl, which seems to be responsible for the behavior you are reporting. With this file compiled to BEAM byte code and everything else compiled to native code, cowboy seems to be working fine on my tests. Can you please confirm?

If this file is the problematic one, perhaps you can disable native code compilation just for it for the time being. Also, it would help me if you trace all the calls to its functions and check whether their returns differ between byte code and native code execution.

I will look more into it when I find some time...

Kostis

From essen@REDACTED Mon Mar 4 22:57:12 2013
From: essen@REDACTED (Loïc Hoguin)
Date: Mon, 04 Mar 2013 22:57:12 +0100
Subject: [erlang-bugs] More HiPE issues with binaries
In-Reply-To: <51335B3B.2070206@cs.ntua.gr>
References: <51325E5F.7090000@ninenines.eu> <51335B3B.2070206@cs.ntua.gr>
Message-ID: <513518B8.8000603@ninenines.eu>

On 03/03/2013 03:16 PM, Kostis Sagonas wrote:
> On 03/02/2013 09:17 PM, Loïc Hoguin wrote:
>> Hello,
>>
>> Cowboy doesn't work when compiled with HiPE. When using curl on a simple
>> hello world example, it sometimes works as expected and sometimes returns a
>> 408 timeout error. When using http_load
>> (http://acme.com/software/http_load/) on the same example, it sometimes
>> works and sometimes throws a weird function_clause error.
>>
>> ...
>>
>> Here are the steps to reproduce. Sorry it's not the smallest download, I
>> can't isolate:
>
> OK, I've spent two hours on this and was able to minimize down to file
> cowboy_protocol.erl, which seems to be responsible for the behavior you
> are reporting. With this file compiled to BEAM byte code and everything
> else compiled to native code, cowboy seems to be working fine on my
> tests. Can you please confirm?

Confirmed.

> If this file is the problematic one, perhaps you can disable native code
> compilation just for it for the time being. Also, it would help me if

I'm not using native, it was just an experiment. I would like to make it work for future users though.

> you trace all the calls to its functions and check whether their returns
> differ between byte code and native code execution.

We're investigating. While doing so I found that erlang:display(binary_to_list(Buffer)) didn't work as expected (with just cowboy_protocol natively compiled). Perhaps you can add that to your todo list. io:format works fine but seems to reduce the probability that the bug happens (as does calling gc directly).

> I will look more into it when I find some time...

No worries.
It's mostly just an interesting bug, and I'm looking into it in my spare time too.

Thanks for the pointers.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From garret.smith@REDACTED Tue Mar 5 02:26:49 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 4 Mar 2013 17:26:49 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
Message-ID:

I have been beating my head against a wall for weeks tracking down spooky behaviour[sic] in one of our production systems. I finally tracked it down to "jumps" in the times returned by erlang:now(), causing all timers in the system to expire at once. I have witnessed this bug on R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both on bare metal and in a VirtualBox VM.

The time jump is always around 2126000 seconds, or a little over 24 days. The now() time does not try to converge with os:timestamp() as the documentation suggests, and as I confirmed it does if you just change the system clock.

Another VM running concurrently on the same machine but with little load (diagnostic node & production node) did not time jump.

Higher load seems to make the time jumps happen more often.

The interval between time jumps varies between seconds and hours, but when a jump occurs, it is always 2126000 + (9 to 26) seconds.

I never see the jump in logfile timestamps that use os:timestamp() for tagging log messages. I had to start tracing a production node before I caught the jump. Here are some lines from a trace, where the timestamp in trace_ts is printed using calendar:now_to_local_time() and then in raw tuple format:

2013-4-16 21:40:1.993399|{1366,173601,993399}
2013-4-16 21:40:1.993400|{1366,173601,993400}
2013-5-11 12:13:41.986961|{1368,299621,986961}
2013-5-11 12:13:41.986962|{1368,299621,986962}

then a bit later...

2013-5-11 12:36:19.955129|{1368,300979,955129}
2013-5-11 12:36:19.955130|{1368,300979,955130}
2013-6-5 3:9:49.538830|{1370,426989,538830}
2013-6-5 3:9:49.538833|{1370,426989,538833}

I captured many such jumps over the course of a day or so. Obviously from the dates, 2 jumps happened before I started tracing.

I was able to reproduce the bug, though not as efficiently as my production system, with the following sample program:
https://gist.github.com/garret-smith/5087169

It took over an hour of runtime before the first time jump. I am working on a better way to reproduce it at the moment, but it's hard to test the test with a bug so intermittent.

I am also testing various other VM versions. My first hope was that this was limited to the 64-bit version where we first encountered the problem, but a change to the 32-bit version has only made the problem happen less often, not eliminated it.

We never saw this bug with R14B03 which we were running previously to R15B01. However, system load is different so I can't make a direct comparison. I did notice a few significant updates to the Windows time related code between R14B03 and R15:

git log sys_time.c

commit 46eb4359b05b220861453a869dc734480ec045a6
Author: Patrik Nyblom
Date: Tue Dec 6 19:07:16 2011 +0100

    Emulate localtime, gmtime and mktime to enable negative time_t

commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
Author: Björn-Egil Dahlberg
Date: Fri Dec 2 15:25:06 2011 +0100

    Teach windows sys_localtime_r

I am completely stumped. What can I do next to help track down the source of the bug?

Thanks,
Garret Smith

From pan@REDACTED Tue Mar 5 08:50:42 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 5 Mar 2013 08:50:42 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References:
Message-ID: <5135A3D2.4080305@erlang.org>

Hi!

On 03/05/2013 02:26 AM, Garret Smith wrote:
> I have been beating my head against a wall for weeks tracking down
> spooky behaviour[sic] in one of our production systems. I finally
> tracked it down to "jumps" in the times returned by erlang:now(),
> causing all timers in the system to expire at once. I have witnessed
> this bug on R15B01, both 64 and 32-bit versions running on Windows
> Server 2008 R2, both on bare metal and VirtualBox VM.
>
> The time jump is always around 2126000 seconds, or a little over 24
> days. The now() time does not try to converge with os:timestamp() as
> the documentation suggests, and as I confirmed it does if you just
> change the system clock.
>
> Another VM running concurrently on the same machine but with little
> load (diagnostic node & production node) did not time jump.
>
> Higher load seems to make the time jumps happen more often.
>
> Frequency between time jumps varies between seconds and hours, but
> when a jump occurs, it is always 2126000 + (9 to 26) seconds.
>
> I never see the jump in logfile timestamps that use os:timestamp() for
> tagging log messages. I had to start tracing a production node before
> I caught the jump. Here are some lines from a trace, where the
> timestamp in trace_ts is printed using calendar:now_to_local_time()
> and then in raw tuple format:
>
> 2013-4-16 21:40:1.993399|{1366,173601,993399}
> 2013-4-16 21:40:1.993400|{1366,173601,993400}
> 2013-5-11 12:13:41.986961|{1368,299621,986961}
> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>
> then a bit later...
>
> 2013-5-11 12:36:19.955129|{1368,300979,955129}
> 2013-5-11 12:36:19.955130|{1368,300979,955130}
> 2013-6-5 3:9:49.538830|{1370,426989,538830}
> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>
Gah! That's obviously not supposed to happen...
> I captured many such jumps over the course of a day or so. Obviously
> from the dates, 2 jumps happened before I started tracing.
>
> I was able to reproduce the bug, though not as efficiently as my
> production system, with the following sample program:
> https://gist.github.com/garret-smith/5087169
>
> It took over an hour of runtime before the first time jump. I am
> working on a better way to reproduce it at the moment, but it's hard
> to test the test with a bug so intermittent.
>
> I am also testing various other VM versions. My first hope was that
> this was limited to the 64-bit version where we first encountered the
> problem, but a change to the 32-bit version has only made the problem
> happen less often, not eliminated it.
>
> We never saw this bug with R14B03 which we were running previously to
> R15B01. However, system load is different so I can't make a direct
> comparison. I did notice a few significant updates to the Windows
> time related code between R14B03 and R15:
>
> git log sys_time.c
>
> commit 46eb4359b05b220861453a869dc734480ec045a6
> Author: Patrik Nyblom
> Date: Tue Dec 6 19:07:16 2011 +0100
>
>     Emulate localtime, gmtime and mktime to enable negative time_t
>
> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> Author: Björn-Egil Dahlberg
> Date: Fri Dec 2 15:25:06 2011 +0100
>
>     Teach windows sys_localtime_r
>
Yep, that's me... But even if I gave a totally weird time back from those, the erlang:now logic should have stopped this from happening. I'll try to reproduce using your example program. If nothing else helps, I'll instrument a VM that gives some traces in the time code...
> I am completely stumped. What can I do next to help track down the
> source of the bug?
>
Unfortunately, so am I. Especially weird that it's load related... Maybe something is not locked as it should be...
> Thanks,
> Garret Smith
Thanks for reporting, I'll get back to you!

Cheers,
/Patrik
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From pan@REDACTED Tue Mar 5 12:06:27 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 5 Mar 2013 12:06:27 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <5135A3D2.4080305@erlang.org>
References: <5135A3D2.4080305@erlang.org>
Message-ID: <5135D1B3.8000400@erlang.org>

Hi again...

I'm not sure about one thing. What happens to os:timestamp() during these jumps? Does it stay on track or does it also jump around?

I've tried to reproduce it with your program, but have not yet succeeded. Have you seen this on the R16B release as well?

Is the hardware in any way fancy (like a lot of cores, some new processor I don't have or something else?) or is there anything else special about the machine? Also the time zone you're running in would be interesting, as there is some time zone specific code there...

I would really like to be able to reproduce it so you don't have to do all the tests at your site, it might end up being really time consuming for you if I make too many mistakes :)

Cheers,
/Patrik
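
One way to answer the question of whether os:timestamp() stays on track while erlang:now() jumps is to sample both clocks side by side and report any large gap. What follows is only a minimal sketch under stated assumptions, not the program from the gist: the module name now_jump, the function names and the threshold parameter are all made up for illustration. It relies only on timer:now_diff/2, erlang:now/0 and os:timestamp/0.

```erlang
%% Hypothetical helper module; names and thresholds are illustrative only.
-module(now_jump).
-export([delta_seconds/2, is_jump/3, watch/2]).

%% erlang:now/0 is deprecated in later OTP releases; silence that warning.
-compile(nowarn_deprecated_function).

%% Difference Later - Earlier between two {MegaSecs, Secs, MicroSecs}
%% tuples, in whole seconds (truncated toward zero).
delta_seconds(Earlier, Later) ->
    timer:now_diff(Later, Earlier) div 1000000.

%% True when the two timestamps are more than MaxGapSecs apart.
is_jump(A, B, MaxGapSecs) ->
    abs(delta_seconds(A, B)) > MaxGapSecs.

%% Poll both clocks every IntervalMs milliseconds, forever, and print a
%% line whenever erlang:now() is more than MaxGapSecs away from
%% os:timestamp(). Meant to be run in its own process.
watch(IntervalMs, MaxGapSecs) ->
    Now = erlang:now(),
    Os = os:timestamp(),
    case is_jump(Os, Now, MaxGapSecs) of
        true ->
            io:format("now()=~p os:timestamp()=~p delta=~ps~n",
                      [Now, Os, delta_seconds(Os, Now)]);
        false ->
            ok
    end,
    timer:sleep(IntervalMs),
    watch(IntervalMs, MaxGapSecs).
```

Applied to the trace values quoted in this thread, now_jump:delta_seconds({1366,173601,993400}, {1368,299621,986961}) gives 2126019 and now_jump:delta_seconds({1368,300979,955130}, {1370,426989,538830}) gives 2126009, which matches the reported "2126000 + (9 to 26) seconds". A watcher could be started with something like spawn(now_jump, watch, [100, 60]); the 100 ms interval and 60 s threshold are arbitrary choices.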

From garret.smith@REDACTED Tue Mar 5 17:37:19 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 5 Mar 2013 08:37:19 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <5135D1B3.8000400@erlang.org>
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
Message-ID:

I haven't seen anything unexpected in os:timestamp(). No jumps at all.

CPU is an Intel Xeon X3430.

I have reproduced it in the Los Angeles/Pacific time zone (GMT -8) and the US East Coast time zone (GMT -5).

I have not yet tried R16B. I'll be starting that today. I'm also trying to improve the test program, since it's taking quite a long time between jumps for me as well. I'll let you know as soon as I have a better one.

You have no idea how relieved I am that you are looking into this!

Thanks,
Garret Smith

On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom wrote:
> Hi again...
>
> I'm not sure about one thing. What happens to os:timestamp() during these
> jumps? Does it stay on track or does it also jump around?
>
> I've tried to reproduce it with your program, but have not yet succeeded.
> Have you seen this on the R16B release as well?
>
> Is the hardware in any way fancy (like a lot of cores, some new processor
> I don't have or something else?) or is there anything else special about
> the machine? Also the time zone you're running in would be interesting, as
> there is some time zone specific code there...
>
> I would really like to be able to reproduce it so you don't have to do all
> the tests at your site, it might end up being really time consuming for you
> if I make too many mistakes :)
>
> Cheers,
> /Patrik
> > Cheers, > /Patrik > > > > _______________________________________________ > erlang-bugs mailing listerlang-bugs@REDACTED://erlang.org/mailman/listinfo/erlang-bugs > > > > > _______________________________________________ > erlang-bugs mailing listerlang-bugs@REDACTED://erlang.org/mailman/listinfo/erlang-bugs > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garret.smith@REDACTED Tue Mar 5 20:20:40 2013 From: garret.smith@REDACTED (Garret Smith) Date: Tue, 5 Mar 2013 11:20:40 -0800 Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future In-Reply-To: References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> Message-ID: The gist https://gist.github.com/garret-smith/5087169 is updated with a slightly better version. I was able to reproduce the jump in less than an hour. I also did some more things to perturb the timing code while the test program was running. Here is the latest info, everything I can think of that may have the slightest effect: * R15B01 64-bit build * Pacific time zone (GMT -8) * Xeon E5405 in an HP DL160 * no arguments to erl.exe * bursty, high CPU load, >75% memory used by other software * running Observer on the test VM displaying the "Load Charts" tab * made some small adjustments (~ 60 seconds) to the system clock while running the tests - now() and os:timestamp() behaved as expected, initially showing a delta and slowly converging * w32tm /resync to fix the system clock some time after perturbing it The time jump in now() occurred when now() was ~9 seconds behind os:timestamp() as reported by the new test program. I'm starting to look at R16B now. -Garret Smith On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith wrote: > I haven't seen anything unexpected in os:timestamp(). No jumps at all. > > CPU is an Intel Xeon X3430. 
> > I have reproduced it in the LosAngeles/Pacific Time (GMT -8) and US East > coast time zone (GMT -5). > > I have not yet tried R16B. I'll be starting that today. I'm also trying > to improve the test program, since it's taking quite a long time between > jumps for me as well. I'll let you know as soon as I have a better one. > > You have no idea how relieved I am that you are looking into this! > > Thanks, > Garret Smith > > > On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom wrote: > >> Hi again... >> >> I'm not sure about one thing. What happens to os:timestamp() during these >> jumps? Does it stay on track or does it also jump around? >> >> I've tried to reproduce it with your program, but has not yet succeeded. >> Have you seen this on the R16B release as well? >> >> Is the hardware in any way fancy (like a lot of cores, some new processor >> I don't have or something else?) or is there anything else special about >> the machine? Also the time zone you're running in would be interesting, as >> there is some time zone specific code there... >> >> I would really like to be able to reproduce it so you don't have to do >> all the tests at your site, it might end up being really time consuming for >> you if I make to many mistakes :) >> >> Cheers, >> /Patrik >> >> >> >> On 03/05/2013 08:50 AM, Patrik Nyblom wrote: >> >> Hi! >> >> On 03/05/2013 02:26 AM, Garret Smith wrote: >> >> I have been beating my head against a wall for weeks tracking down >> spooky behaviour[sic] in one of our production systems. I finally tracked >> it down to "jumps" in the times returned by erlang:now(), causing all >> timers in the system to expire at once. I have witnessed this bug on >> R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both >> on bare metal and VirtualBox VM. >> >> The time jump is always around 2126000 seconds, or a little over 24 >> days. 
>> The now() time does not try to converge with os:timestamp() as the
>> documentation suggests, and as I confirmed it does if you just change the
>> system clock.
>>
>> Another VM running concurrently on the same machine but with little load
>> (diagnostic node & production node) did not time jump.
>>
>> Higher load seems to make the time jumps happen more often.
>>
>> Frequency between time jumps varies between seconds and hours, but when
>> a jump occurs, it is always 2126000 + (9 to 26) seconds.
>>
>> I never see the jump in logfile timestamps that use os:timestamp() for
>> tagging log messages. I had to start tracing a production node before I
>> caught the jump. Here are some lines from a trace, where the timestamp in
>> trace_ts is printed using calendar:now_to_local_time() and then in raw
>> tuple format:
>>
>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>
>> then a bit later...
>>
>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>
>> Gah! That's obviously not supposed to happen...
>>
>> I captured many such jumps over the course of a day or so. Obviously
>> from the dates, 2 jumps happened before I started tracing.
>>
>> I was able to reproduce the bug, though not as efficiently as my
>> production system, with the following sample program:
>> https://gist.github.com/garret-smith/5087169
>>
>> It took over an hour of runtime before the first time jump. I am working
>> on a better way to reproduce it at the moment, but it's hard to test the
>> test with a bug so intermittent.
>>
>> I am also testing various other VM versions.
>> My first hope was that
>> this was limited to the 64-bit version where we first encountered the
>> problem, but a change to the 32-bit version has only made the problem
>> happen less often, not eliminated it.
>>
>> We never saw this bug with R14B03, which we were running prior to
>> R15B01. However, system load is different so I can't make a direct
>> comparison. I did notice a few significant updates to the Windows
>> time-related code between R14B03 and R15:
>>
>> git log sys_time.c
>>
>> commit 46eb4359b05b220861453a869dc734480ec045a6
>> Author: Patrik Nyblom
>> Date: Tue Dec 6 19:07:16 2011 +0100
>>
>>     Emulate localtime, gmtime and mktime to enable negative time_t
>>
>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>> Author: Björn-Egil Dahlberg
>> Date: Fri Dec 2 15:25:06 2011 +0100
>>
>>     Teach windows sys_localtime_r
>>
>>
>> Yep, that's me... But even if I gave a totally weird time back from
>> those, the erlang:now logic should have stopped this from happening. I'll
>> try to reproduce using your example program. If nothing else helps, I'll
>> instrument a VM that gives some traces in the time code...
>>
>> I am completely stumped. What can I do next to help track down the
>> source of the bug?
>>
>> Unfortunately, so am I. Especially weird that it's load related...
>> Maybe something is not locked as it should be...
>>
>> Thanks,
>> Garret Smith
>>
>> Thanks for reporting, I'll get back to you!
>>
>> Cheers,
>> /Patrik

From garret.smith@REDACTED Tue Mar 5 21:10:45 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 5 Mar 2013 12:10:45 -0800
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
Message-ID:

On the same machine with the same steps as before, I reproduced the time
jump on R16B.
This time the jump happened with a <5 sec delta between now() and
os:timestamp().
Still jumping ~2126000 seconds.

-Garret

On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith wrote:
> The gist https://gist.github.com/garret-smith/5087169 is updated with a
> slightly better version. I was able to reproduce the jump in less than an
> hour. I also did some more things to perturb the timing code while the
> test program was running.
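[Archive note] The reported jump size can be checked directly against the raw tuples in the trace excerpt quoted earlier in this thread. The following is only a minimal sketch: the module name now_jump and its function names are made up for illustration and do not come from Garret's gist or from OTP. It ignores the microsecond field and compares whole seconds.

```erlang
-module(now_jump).
-export([to_seconds/1, jump_seconds/2, demo/0]).

%% Convert a {MegaSecs, Secs, MicroSecs} tuple, as returned by
%% erlang:now() or os:timestamp(), to whole seconds since the epoch.
%% The microsecond field is deliberately ignored here.
to_seconds({MegaSecs, Secs, _MicroSecs}) ->
    MegaSecs * 1000000 + Secs.

%% Whole-second difference between two consecutive trace timestamps.
jump_seconds(Before, After) ->
    to_seconds(After) - to_seconds(Before).

%% The two jumps visible in the trace excerpt from the original report:
%% the last timestamp before each jump and the first one after it.
demo() ->
    Jump1 = jump_seconds({1366,173601,993400}, {1368,299621,986961}),
    Jump2 = jump_seconds({1368,300979,955130}, {1370,426989,538830}),
    io:format("jump 1: ~p s (~.2f days)~n", [Jump1, Jump1 / 86400]),
    io:format("jump 2: ~p s (~.2f days)~n", [Jump2, Jump2 / 86400]),
    {Jump1, Jump2}.
```

Calling now_jump:demo() returns {2126020,2126010}, that is 2126000 + 20 and 2126000 + 10 seconds, or about 24.61 days each. Both values fall inside the "2126000 + (9 to 26) seconds" range given in the report.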
From pan@REDACTED Wed Mar 6 10:46:01 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 6 Mar 2013 10:46:01 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
Message-ID: <51371059.2080700@erlang.org>

Thanks for all the extra info. I'll try the updated program, running all
the steps you've described, on a four-core machine with Win2008 that I've
set up for this. Hopefully I'll be able to reproduce it now :)

Thanks!
/Patrik

On 03/05/2013 09:10 PM, Garret Smith wrote:
> On the same machine with the same steps as before, I reproduced the
> time jump on R16B.
> This time the jump happened with a <5 sec delta between now() and
> os:timestamp().
> Still jumping ~2126000 seconds.
>
> -Garret
>
>
> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith wrote:
>
>     The gist https://gist.github.com/garret-smith/5087169 is updated
>     with a slightly better version. I was able to reproduce the jump
>     in less than an hour. I also did some more things to perturb the
>     timing code while the test program was running.
>
>     Here is the latest info, everything I can think of that may have
>     the slightest effect:
>     * R15B01 64-bit build
>     * Pacific time zone (GMT -8)
>     * Xeon E5405 in an HP DL160
>     * no arguments to erl.exe
>     * bursty, high CPU load, >75% memory used by other software
>     * running Observer on the test VM displaying the "Load Charts" tab
>     * made some small adjustments (~ 60 seconds) to the system clock
>       while running the tests - now() and os:timestamp() behaved as
>       expected, initially showing a delta and slowly converging
>     * w32tm /resync to fix the system clock some time after perturbing it
>
>     The time jump in now() occurred when now() was ~9 seconds behind
>     os:timestamp() as reported by the new test program.
>
>     I'm starting to look at R16B now.

From pan@REDACTED Thu Mar 7 16:37:13 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Thu, 7 Mar 2013 16:37:13 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org>
Message-ID: <5138B429.4040605@erlang.org>

Hi Garret!

I've been able to reproduce it on my freshly installed Win2008 machine!
Great, now I only need to debug it and find the error :)

I'll get back to you as soon as I feel I have a fix - it might take a few
days, given the relatively long turnaround time, but we'll get there!

Thank you for all the help and information!

Cheers,
/Patrik

On 03/05/2013 09:10 PM, Garret Smith wrote:
> On the same machine with the same steps as before, I reproduced the
> time jump on R16B.
> This time the jump happened with a <5 sec delta between now() and
> os:timestamp().
> Still jumping ~2126000 seconds.
>
> -Garret
>
>
> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith wrote:
>
>     The gist https://gist.github.com/garret-smith/5087169 is updated
>     with a slightly better version. I was able to reproduce the jump
>     in less than an hour. I also did some more things to perturb the
>     timing code while the test program was running.
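[Archive note] The symptom discussed in this thread is erlang:now() drifting far away from os:timestamp(). A watcher along the following lines could flag it. This is a hedged sketch only: it is not the test program from Garret's gist, and the module name now_watch, its function names, and the threshold parameter are all assumptions made for illustration. Note that erlang:now/0 is deprecated in later OTP releases; it is used here because it is the function under discussion.

```erlang
-module(now_watch).
-export([watch/2, check/3]).
%% erlang:now/0 is deprecated in newer OTP releases; silence the warning.
-compile(nowarn_deprecated_function).

%% Poll both clocks every IntervalMs milliseconds, forever, and print a
%% line whenever they differ by more than LimitSecs seconds.
watch(IntervalMs, LimitSecs) ->
    case check(erlang:now(), os:timestamp(), LimitSecs) of
        ok ->
            ok;
        {jump, DeltaSecs} ->
            io:format("now() is ~p s away from os:timestamp()~n",
                      [DeltaSecs])
    end,
    timer:sleep(IntervalMs),
    watch(IntervalMs, LimitSecs).

%% Pure comparison, split out so it can be tried on recorded tuples.
%% timer:now_diff(A, B) returns A - B in microseconds.
check(Now, OsTime, LimitSecs) ->
    DeltaSecs = timer:now_diff(Now, OsTime) div 1000000,
    if
        abs(DeltaSecs) > LimitSecs -> {jump, DeltaSecs};
        true -> ok
    end.
```

Since the thread reports that now() can legitimately lag os:timestamp() by several seconds while converging, the limit should be generous, for example now_watch:watch(1000, 3600). As a dry run on recorded data, now_watch:check({1368,299621,986961}, {1366,173601,993400}, 3600) returns {jump,2126019}; here the pre-jump trace entry stands in for the os:timestamp() value, which is an assumption, since the thread does not record both clocks at the same instant.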
From kostis@REDACTED Fri Mar 8 22:06:50 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Fri, 08 Mar 2013 22:06:50 +0100
Subject: [erlang-bugs] R16: HiPE failure with /bits in funs
In-Reply-To: <511AC8EC.3070507@ninenines.eu>
References: <511AC8EC.3070507@ninenines.eu>
Message-ID: <513A52EA.5070902@cs.ntua.gr>

On 02/12/2013 11:57 PM, Loïc Hoguin wrote:
> The following module fails to compile with R16. It also fails on R15B03
> and probably previous versions. I do not know HiPE internals so no patch.
>
>
> -module(hipe_error).
> -export([run/0]).
>
> run() ->
>     fun (<< $c, _/bits >>) -> ok end.
>
>
> The following errors are reported:
>
> 7> c(hipe_error, [native]).
> EXITED with reason
> {function_clause,[{hipe_rtl_binary_match,gen_rtl,[{bs_match_string,<<99>>,1},[],[...

For archival reasons, I report that this particular problem has been
fixed. The relevant patch will be sent soon for inclusion in 'pu'.

Kostis

PS. Drop me a mail if you are interested in obtaining the patch sooner.

From pan@REDACTED Mon Mar 11 17:26:05 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Mon, 11 Mar 2013 17:26:05 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <5138B429.4040605@erlang.org>
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> <5138B429.4040605@erlang.org>
Message-ID: <513E059D.70005@erlang.org>

Hi again!

I think I've found it. At least I've found one error, hopefully that's
the one you've also found :)

The sys_gethrtime function has gotten new uses in R15 and on, uses where
it is no longer protected by the erts_timeofday_mtx. So - it simply needs
a lock of its own. This gives a slight performance loss, but that could
be fixed by using GetTickCount64 on win7 and win2008 at least.

Can you try a version of beam.smp.dll with a lock and see if the error is
gone on your machines?
If that works, I would also like you to try an optimized version, but
let's first make sure we have the bug nailed down :)

In my dropbox, there's a beam.smp.dll. If you replace
$ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start werl,
the slogan should contain [source-be0da3e]. It is for 64-bit Windows. The
public dropbox URL is:

http://dl.dropbox.com/u/17212223/beam.smp.dll

This should work without any special messages or such, giving a working
erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
moving slightly backwards, we have a more complicated bug, but I haven't
seen any such messages since I added proper locking.

If it's possible for you to test this, I would be immensely grateful!

Cheers,
/Patrik

On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
> Hi Garret!
>
> I've been able to reproduce it on my freshly installed Win2008
> machine! Great, now I only need to debug it and find the error :)
>
> I'll get back to you as soon as I feel I have a fix - it might take a
> few days, given the relatively long turnaround time, but we'll get there!
>
> Thank you for all the help and information!
>
> Cheers,
> /Patrik
>
> On 03/05/2013 09:10 PM, Garret Smith wrote:
>> On the same machine with the same steps as before, I reproduced the
>> time jump on R16B.
>> This time the jump happened with a <5 sec delta between now() and
>> os:timestamp().
>> Still jumping ~2126000 seconds.
>>
>> -Garret
>>
>>
>> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith wrote:
>>
>>     The gist https://gist.github.com/garret-smith/5087169 is updated
>>     with a slightly better version. I was able to reproduce the jump
>>     in less than an hour. I also did some more things to perturb the
>>     timing code while the test program was running.
>> >> Here is the latest info, everything I can think of that may have >> the slightest effect: >> * R15B01 64-bit build >> * Pacific time zone (GMT -8) >> * Xeon E5405 in an HP DL160 >> * no arguments to erl.exe >> * bursty, high CPU load, >75% memory used by other software >> * running Observer on the test VM displaying the "Load Charts" tab >> * made some small adjustments (~ 60 seconds) to the system clock >> while running the tests - now() and os:timestamp() behaved as >> expected, initially showing a delta and slowly converging >> * w32tm /resync to fix the system clock some time after >> perturbing it >> >> The time jump in now() occurred when now() was ~9 seconds behind >> os:timestamp() as reported by the new test program. >> >> I'm starting to look at R16B now. >> >> -Garret Smith >> >> >> On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith >> > wrote: >> >> I haven't seen anything unexpected in os:timestamp(). No >> jumps at all. >> >> CPU is an Intel Xeon X3430. >> >> I have reproduced it in the LosAngeles/Pacific Time (GMT -8) >> and US East coast time zone (GMT -5). >> >> I have not yet tried R16B. I'll be starting that today. I'm >> also trying to improve the test program, since it's taking >> quite a long time between jumps for me as well. I'll let you >> know as soon as I have a better one. >> >> You have no idea how relieved I am that you are looking into >> this! >> >> Thanks, >> Garret Smith >> >> >> On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom > > wrote: >> >> Hi again... >> >> I'm not sure about one thing. What happens to >> os:timestamp() during these jumps? Does it stay on track >> or does it also jump around? >> >> I've tried to reproduce it with your program, but has not >> yet succeeded. Have you seen this on the R16B release as >> well? >> >> Is the hardware in any way fancy (like a lot of cores, >> some new processor I don't have or something else?) or is >> there anything else special about the machine? 
>> Also the time zone you're running in would be interesting, as
>> there is some time-zone-specific code there...
>>
>> I would really like to be able to reproduce it so you
>> don't have to do all the tests at your site; it might end
>> up being really time-consuming for you if I make too many
>> mistakes :)
>>
>> Cheers,
>> /Patrik
>>
>>
>>
>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote:
>>> Hi!
>>>
>>> On 03/05/2013 02:26 AM, Garret Smith wrote:
>>>> I have been beating my head against a wall for weeks
>>>> tracking down spooky behaviour[sic] in one of our
>>>> production systems. I finally tracked it down to
>>>> "jumps" in the times returned by erlang:now(), causing
>>>> all timers in the system to expire at once. I have
>>>> witnessed this bug on R15B01, both 64- and 32-bit
>>>> versions running on Windows Server 2008 R2, both on
>>>> bare metal and in a VirtualBox VM.
>>>>
>>>> The time jump is always around 2126000 seconds, or a
>>>> little over 24 days. The now() time does not try to
>>>> converge with os:timestamp() as the documentation
>>>> suggests, and as I confirmed it does if you just change
>>>> the system clock.
>>>>
>>>> Another VM running concurrently on the same machine but
>>>> with little load (diagnostic node & production node)
>>>> did not time jump.
>>>>
>>>> Higher load seems to make the time jumps happen more often.
>>>>
>>>> The interval between time jumps varies from seconds to
>>>> hours, but when a jump occurs, it is always 2126000 +
>>>> (9 to 26) seconds.
>>>>
>>>> I never see the jump in logfile timestamps that use
>>>> os:timestamp() for tagging log messages. I had to
>>>> start tracing a production node before I caught the
>>>> jump.
>>>> Here are some lines from a trace, where the
>>>> timestamp in trace_ts is printed using
>>>> calendar:now_to_local_time() and then in raw tuple format:
>>>>
>>>> 2013-4-16 21:40:1.993399|{1366,173601,993399}
>>>> 2013-4-16 21:40:1.993400|{1366,173601,993400}
>>>> 2013-5-11 12:13:41.986961|{1368,299621,986961}
>>>> 2013-5-11 12:13:41.986962|{1368,299621,986962}
>>>>
>>>> then a bit later...
>>>>
>>>> 2013-5-11 12:36:19.955129|{1368,300979,955129}
>>>> 2013-5-11 12:36:19.955130|{1368,300979,955130}
>>>> 2013-6-5 3:9:49.538830|{1370,426989,538830}
>>>> 2013-6-5 3:9:49.538833|{1370,426989,538833}
>>>>
>>> Gah! That's obviously not supposed to happen...
>>>> I captured many such jumps over the course of a day or
>>>> so. Obviously from the dates, 2 jumps happened before
>>>> I started tracing.
>>>>
>>>> I was able to reproduce the bug, though not as
>>>> efficiently as my production system, with the following
>>>> sample program:
>>>> https://gist.github.com/garret-smith/5087169
>>>>
>>>> It took over an hour of runtime before the first time
>>>> jump. I am working on a better way to reproduce it at
>>>> the moment, but it's hard to test the test with a bug
>>>> so intermittent.
>>>>
>>>> I am also testing various other VM versions. My first
>>>> hope was that this was limited to the 64-bit version
>>>> where we first encountered the problem, but a change to
>>>> the 32-bit version has only made the problem happen
>>>> less often, not eliminated it.
>>>>
>>>> We never saw this bug with R14B03, which we were running
>>>> before R15B01. However, system load is
>>>> different so I can't make a direct comparison.
>>>> I did notice a few significant updates to the Windows
>>>> time-related code between R14B03 and R15:
>>>>
>>>> git log sys_time.c
>>>>
>>>> commit 46eb4359b05b220861453a869dc734480ec045a6
>>>> Author: Patrik Nyblom
>>>> Date: Tue Dec 6 19:07:16 2011 +0100
>>>>
>>>>     Emulate localtime, gmtime and mktime to enable
>>>>     negative time_t
>>>>
>>>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
>>>> Author: Björn-Egil Dahlberg
>>>> Date: Fri Dec 2 15:25:06 2011 +0100
>>>>
>>>>     Teach windows sys_localtime_r
>>>>
>>>>
>>> Yep, that's me... But even if I gave a totally weird
>>> time back from those, the erlang:now logic should have
>>> stopped this from happening. I'll try to reproduce using
>>> your example program. If nothing else helps, I'll
>>> instrument a VM that gives some traces in the time code...
>>>> I am completely stumped. What can I do next to help
>>>> track down the source of the bug?
>>>>
>>> Unfortunately, so am I. Especially weird that it's load
>>> related... Maybe something is not locked as it should be...
>>>> Thanks,
>>>> Garret Smith
>>> Thanks for reporting, I'll get back to you!
>>>
>>> Cheers,
>>> /Patrik
>>>>
>>>>
>>>> _______________________________________________
>>>> erlang-bugs mailing list
>>>> erlang-bugs@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-bugs
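The size of the jump can be checked directly from the trace lines quoted in the report above. A now()-style timestamp is a {MegaSecs, Secs, MicroSecs} tuple, so the time in seconds is MegaSecs * 1000000 + Secs. The Python below is only an illustrative sketch: the helper names are invented for this note and are not part of the thread or of OTP. It takes the two pairs of consecutive trace timestamps on either side of each jump and computes how far the clock moved.

```python
# Illustrative sketch only: the helper names are invented for this note.
# A now()-style timestamp is a (MegaSecs, Secs, MicroSecs) tuple.

def now_to_micros(ts):
    """Convert (MegaSecs, Secs, MicroSecs) to microseconds since the epoch."""
    mega, secs, micro = ts
    return (mega * 1_000_000 + secs) * 1_000_000 + micro


def jump_micros(before, after):
    """Microseconds between two consecutive trace timestamps."""
    return now_to_micros(after) - now_to_micros(before)


# Consecutive trace_ts values on either side of each jump, copied from the trace.
FIRST_JUMP = jump_micros((1366, 173601, 993400), (1368, 299621, 986961))
SECOND_JUMP = jump_micros((1368, 300979, 955130), (1370, 426989, 538830))

if __name__ == "__main__":
    for jump in (FIRST_JUMP, SECOND_JUMP):
        seconds = jump / 1_000_000
        print(round(seconds), "seconds, about", round(seconds / 86400, 1), "days")
```

For the two traced jumps this comes to roughly 2126020 and 2126010 seconds, a little under 24.7 days each, which agrees with the "2126000 + (9 to 26) seconds" figure in the report.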
From garret.smith@REDACTED Mon Mar 11 17:34:01 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 09:34:01 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <513E059D.70005@erlang.org>
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID:

Patrik,

Our production systems are on R15B1/2, so I won't be able to verify
against that, but I'll let you know what I see running my test program
against R16B.

Will you be able to generate a patched R15x version? If not, I'll try to
set up a build system and apply the patch locally.

-Garret

On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom wrote:

> Hi again!
>
> I think I've found it. At least I've found one error, hopefully that's the
> one you've also found :)
>
> The sys_gethrtime function has got new uses in R15 and on, uses where it
> is no longer protected by the erts_timeofday_mtx. So - it simply needs a
> lock of its own. This gives a slight performance loss, but that could be
> fixed by using GetTickCount64 on win7 and win2008 at least.
>
> Can you try a version of beam.smp.dll with a lock and see if the error is
> gone on your machines? If that works, I would also like you to try an
> optimized version, but let's first make sure we have the bug nailed down :)
>
> In my dropbox, there's a beam.smp.dll. If you replace
> $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then start werl,
> the slogan should contain [source-be0da3e]. It is for 64-bit Windows. The
> public dropbox URL is:
> http://dl.dropbox.com/u/17212223/beam.smp.dll
>
> This should work without any special messages or such, giving a working
> erlang:now/0. If it starts sending strange ERROR REPORTs about ticks
> moving slightly backwards, we have a more complicated bug, but I haven't
> seen any such messages since I added proper locking.
From garret.smith@REDACTED Mon Mar 11 17:51:48 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 09:51:48 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <513E059D.70005@erlang.org>
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID:

OK, so adding a lock in the Windows-specific implementation of
sys_gethrtime protects the 'wrap' and 'last_tick_count' global variables,
which are not required on other platforms?

Thanks for the quick turnaround, Patrik!

-Garret

On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom wrote:

> Hi again!
>
> I think I've found it. At least I've found one error, hopefully that's the
> one you've also found :)
>
> The sys_gethrtime function has got new uses in R15 and on, uses where it
> is no longer protected by the erts_timeofday_mtx. So - it simply needs a
> lock of its own.
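To make the point about 'wrap' and 'last_tick_count' concrete, here is a rough model of how unsynchronised wrap bookkeeping can throw time forward. It is a Python sketch and not the ERTS C code: the two variable names come from the message above, but the class, the 31-bit counter width and the replayed interleaving are assumptions made purely for illustration. The model extends a narrow millisecond tick counter to a wide value by adding a wrap offset whenever a reading is smaller than the previous one. If two threads sample the counter and then update the shared state in the opposite order, the stale reading looks like a wrap.

```python
import threading

# Assumption for illustration: the tick counter is treated as 31 bits wide,
# so one wrap (real or spurious) adds 2**31 ms, which is about 24.9 days.
TICK_BITS = 31
WRAP_STEP_MS = 1 << TICK_BITS


class TickExtender:
    """Extends a narrow, wrapping millisecond counter to a wide one."""

    def __init__(self):
        self.wrap = 0             # accumulated wrap offset, in ms
        self.last_tick_count = 0  # most recent raw reading seen
        self._lock = threading.Lock()

    def extend(self, ticks):
        """Unsynchronised update: only correct if callers never interleave."""
        if ticks < self.last_tick_count:
            self.wrap += WRAP_STEP_MS  # reading went backwards: assume a wrap
        self.last_tick_count = ticks
        return self.wrap + ticks

    def read(self, sample_ticks):
        """Sample and update under one lock, so readings are applied in order."""
        with self._lock:
            return self.extend(sample_ticks())


def replay_race():
    """Replay one bad interleaving deterministically, without real threads.

    Thread A samples 1000 ms and is preempted. Thread B samples 1001 ms and
    updates the shared state first. A then updates with its stale reading.
    """
    clock = TickExtender()
    a_sample = 1000
    b_sample = 1001
    b_result = clock.extend(b_sample)
    a_result = clock.extend(a_sample)  # 1000 < 1001 is mistaken for a wrap
    return a_result - b_result         # apparent forward jump, in ms
```

In this model a single out-of-order update moves time forward by almost 2**31 ms and the offset never goes away, which is the same order of magnitude as the jumps reported in this thread; whether the real counter is 31 bits wide is an assumption here. Taking the sample inside the lock, as read() does, removes the interleaving. A 64-bit tick source such as the GetTickCount64 mentioned above would not need the wrap bookkeeping at all.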
From garret.smith@REDACTED Tue Mar 12 00:48:28 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Mon, 11 Mar 2013 16:48:28 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID:

Been running the test program all day in the same scenario as before. No
time jumps! Looking good...

On Mon, Mar 11, 2013 at 9:34 AM, Garret Smith wrote:

> Patrik,
>
> Our production systems are on R15B1/2, so I won't be able to verify
> against that, but I'll let you know what I see running my test program
> against R16B.
>
> Will you be able to generate a patched R15x version? If not, I'll try to
> set up a build system and apply the patch locally.
>
> -Garret
>
>
> On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom wrote:
>
>> Hi again!
>>
>> I think I've found it.
>>>>> Cheers,
>>>>> /Patrik
>>>>>
>>>>> _______________________________________________
>>>>> erlang-bugs mailing list
>>>>> erlang-bugs@REDACTED
>>>>> http://erlang.org/mailman/listinfo/erlang-bugs

From pan@REDACTED Tue Mar 12 10:57:11 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 12 Mar 2013 10:57:11 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References: <5135A3D2.4080305@erlang.org> <5135D1B3.8000400@erlang.org> <5138B429.4040605@erlang.org> <513E059D.70005@erlang.org>
Message-ID: <513EFBF7.7000406@erlang.org>

Hi!

Good! Thanks!

I can build a patched R15 beam.dll for you. The easiest is R15B03, but I can do a patched beam for R15B02 if that's really needed. In the end I'll probably build some kind of R15B03-2 and an R16B00-1 or something, so whoever wants the patch can get binaries. However, I would like to have something tested in your real system, if that's OK with you.

So - which version is best to patch for? R15B02?

/Patrik

On 03/12/2013 12:48 AM, Garret Smith wrote:
> Been running the test program all day in the same scenario as before.
> No time jumps! Looking good...
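For reference, the size of the jump discussed in this thread can be checked directly from the trace timestamps quoted in the original report. The following is a minimal sketch, not part of any poster's test program: the module name `now_jump` and its two functions are hypothetical, and only `timer:now_diff/2` is a standard OTP call.

```erlang
-module(now_jump).
-export([delta_micros/2, delta_whole_seconds/2]).

%% Microseconds elapsed from T1 to T2, where both arguments are
%% {MegaSecs, Secs, MicroSecs} tuples as returned by erlang:now().
delta_micros(T1, T2) ->
    timer:now_diff(T2, T1).

%% The same difference truncated to whole seconds.
delta_whole_seconds(T1, T2) ->
    delta_micros(T1, T2) div 1000000.
```

Applied to the two jumps in the reported trace, `now_jump:delta_whole_seconds({1366,173601,993400}, {1368,299621,986961})` gives 2126019 and `now_jump:delta_whole_seconds({1368,300979,955130}, {1370,426989,538830})` gives 2126009. Both fall inside the reported range of 2126000 + (9 to 26) seconds, which is about 24 days and 14.5 hours.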
> On Mon, Mar 11, 2013 at 9:34 AM, Garret Smith wrote:
>
> Patrik,
>
> Our production systems are on R15B1/2, so I won't be able to
> verify against that, but I'll let you know what I see running my
> test program against R16B.
>
> Will you be able to generate a patched R15x version? If not, I'll
> try to set up a build system and apply the patch locally.
>
> -Garret
>
>
> On Mon, Mar 11, 2013 at 9:26 AM, Patrik Nyblom wrote:
>
> Hi again!
>
> I think I've found it. At least I've found one error,
> hopefully that's the one you've also found :)
>
> The sys_gethrtime function has got new uses in R15 and on,
> uses where it is no longer protected by the
> erts_timeofday_mtx. So - it simply needs a lock of its own.
> This gives a slight performance loss, but that could be fixed
> by using GetTickCount64 on win7 and win2008 at least.
>
> Can you try a version of beam.smp.dll with a lock and see if
> the error is gone on your machines? If that works, I would
> also like you to try an optimized version, but let's first
> make sure we have the bug nailed down :)
>
> In my dropbox, there's a beam.smp.dll. If you replace
> $ERL_ROOT/erts-5.10.1/bin/beam.smp.dll with that one and then
> start werl, the slogan should contain [source-be0da3e]. It is
> for 64-bit Windows. The public dropbox URL is:
> http://dl.dropbox.com/u/17212223/beam.smp.dll
>
> This should work without any special messages or such, giving
> a working erlang:now/0. If it starts sending strange ERROR
> REPORTs about ticks moving slightly backwards, we have a more
> complicated bug, but I haven't seen any such messages since I
> added proper locking.
>
> If it's possible for you to test this, I would be immensely
> grateful!
>
> Cheers,
> /Patrik
>
> On 03/07/2013 04:37 PM, Patrik Nyblom wrote:
>> Hi Garret!
>>
>> I've been able to reproduce it on my freshly installed
>> Win2008 machine!
Great, now I only need to debug it and find >> the error :) >> >> I'll get back to you as soon as I feel I have a fix - it >> might take a few days, given the relatively long turn around >> time, but we'll get there! >> >> Thank you for all the help and information! >> >> Cheers, >> /Patrik >> >> On 03/05/2013 09:10 PM, Garret Smith wrote: >>> On the same machine with the same steps as previous, I >>> reproduced the time jump on R16B. >>> This time the jump happened with a <5 sec delta btw now() >>> and os:timestamp(). >>> Still jumping ~2126000 seconds. >>> >>> -Garret >>> >>> >>> On Tue, Mar 5, 2013 at 11:20 AM, Garret Smith >>> > wrote: >>> >>> The gist https://gist.github.com/garret-smith/5087169 is >>> updated with a slightly better version. I was able to >>> reproduce the jump in less than an hour. I also did >>> some more things to perturb the timing code while the >>> test program was running. >>> >>> Here is the latest info, everything I can think of that >>> may have the slightest effect: >>> * R15B01 64-bit build >>> * Pacific time zone (GMT -8) >>> * Xeon E5405 in an HP DL160 >>> * no arguments to erl.exe >>> * bursty, high CPU load, >75% memory used by other software >>> * running Observer on the test VM displaying the "Load >>> Charts" tab >>> * made some small adjustments (~ 60 seconds) to the >>> system clock while running the tests - now() and >>> os:timestamp() behaved as expected, initially showing a >>> delta and slowly converging >>> * w32tm /resync to fix the system clock some time after >>> perturbing it >>> >>> The time jump in now() occurred when now() was ~9 >>> seconds behind os:timestamp() as reported by the new >>> test program. >>> >>> I'm starting to look at R16B now. >>> >>> -Garret Smith >>> >>> >>> On Tue, Mar 5, 2013 at 8:37 AM, Garret Smith >>> > >>> wrote: >>> >>> I haven't seen anything unexpected in >>> os:timestamp(). No jumps at all. >>> >>> CPU is an Intel Xeon X3430. 
>>> >>> I have reproduced it in the LosAngeles/Pacific Time >>> (GMT -8) and US East coast time zone (GMT -5). >>> >>> I have not yet tried R16B. I'll be starting that >>> today. I'm also trying to improve the test program, >>> since it's taking quite a long time between jumps >>> for me as well. I'll let you know as soon as I have >>> a better one. >>> >>> You have no idea how relieved I am that you are >>> looking into this! >>> >>> Thanks, >>> Garret Smith >>> >>> >>> On Tue, Mar 5, 2013 at 3:06 AM, Patrik Nyblom >>> > wrote: >>> >>> Hi again... >>> >>> I'm not sure about one thing. What happens to >>> os:timestamp() during these jumps? Does it stay >>> on track or does it also jump around? >>> >>> I've tried to reproduce it with your program, >>> but has not yet succeeded. Have you seen this on >>> the R16B release as well? >>> >>> Is the hardware in any way fancy (like a lot of >>> cores, some new processor I don't have or >>> something else?) or is there anything else >>> special about the machine? Also the time zone >>> you're running in would be interesting, as there >>> is some time zone specific code there... >>> >>> I would really like to be able to reproduce it >>> so you don't have to do all the tests at your >>> site, it might end up being really time >>> consuming for you if I make to many mistakes :) >>> >>> Cheers, >>> /Patrik >>> >>> >>> >>> On 03/05/2013 08:50 AM, Patrik Nyblom wrote: >>>> Hi! >>>> >>>> On 03/05/2013 02:26 AM, Garret Smith wrote: >>>>> I have been beating my head against a wall for >>>>> weeks tracking down spooky behaviour[sic] in >>>>> one of our production systems. I finally >>>>> tracked it down to "jumps" in the times >>>>> returned by erlang:now(), causing all timers >>>>> in the system to expire at once. I have >>>>> witnessed this bug on R15B01, both 64 and >>>>> 32-bit versions running on Windows Server 2008 >>>>> R2, both on bare metal and VirtualBox VM. 
>>>>> >>>>> The time jump is always around 2126000 >>>>> seconds, or a little over 24 days. The now() >>>>> time does not try to converge with >>>>> os:timestamp() as the documentation suggests, >>>>> and as I confirmed it does if you just change >>>>> the system clock. >>>>> >>>>> Another VM running concurrently on the same >>>>> machine but with little load (diagnostic node >>>>> & production node) did not time jump. >>>>> >>>>> Higher load seems to make the time jumps >>>>> happen more often. >>>>> >>>>> Frequency between time jumps varies between >>>>> seconds and hours, but when a jump occurs, it >>>>> is always 2126000 + (9 to 26) seconds. >>>>> >>>>> I never see the jump in logfile timestamps >>>>> that use os:timestamp() for tagging log >>>>> messages. I had to start tracing a production >>>>> node before I caught the jump. Here are some >>>>> lines from a trace, where the timestamp in >>>>> trace_ts is printed using >>>>> calendar:now_to_local_time() and then in raw >>>>> tuple format: >>>>> >>>>> 2013-4-16 21:40:1.993399|{1366,173601,993399} >>>>> 2013-4-16 21:40:1.993400|{1366,173601,993400} >>>>> 2013-5-11 12:13:41.986961|{1368,299621,986961} >>>>> 2013-5-11 12:13:41.986962|{1368,299621,986962} >>>>> >>>>> then a bit later... >>>>> >>>>> 2013-5-11 12:36:19.955129|{1368,300979,955129} >>>>> 2013-5-11 12:36:19.955130|{1368,300979,955130} >>>>> 2013-6-5 3:9:49.538830|{1370,426989,538830} >>>>> 2013-6-5 3:9:49.538833|{1370,426989,538833} >>>>> >>>> Gah! That's obviously not supposed to happen... >>>>> I captured many such jumps over the course of >>>>> a day or so. Obviously from the dates, 2 jumps >>>>> happened before I started tracing. >>>>> >>>>> I was able to reproduce the bug, though not as >>>>> efficiently as my production system, with the >>>>> following sample program: >>>>> https://gist.github.com/garret-smith/5087169 >>>>> >>>>> It took over an hour of runtime before the >>>>> first time jump. 
I am working on a better way >>>>> to reproduce it at the moment, but it's hard >>>>> to test the test with a bug so intermittent. >>>>> >>>>> I am also testing various other VM versions. >>>>> My first hope was that this was limited to the >>>>> 64-bit version where we first encountered the >>>>> problem, but a change to the 32-bit version >>>>> has only made the problem happen less often, >>>>> not eliminated it. >>>>> >>>>> We never saw this bug with R14B03 which we >>>>> were running previously to R15B01. However, >>>>> system load is different so I can't make a >>>>> direct comparison. I did notice a few >>>>> significant updates to the Windows time >>>>> related code between R14B03 and R15: >>>>> >>>>> git log sys_time.c >>>>> >>>>> commit 46eb4359b05b220861453a869dc734480ec045a6 >>>>> Author: Patrik Nyblom >>>> > >>>>> Date: Tue Dec 6 19:07:16 2011 +0100 >>>>> >>>>> Emulate localtime, gmtime and mktime to >>>>> enable negative time_t >>>>> >>>>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75 >>>>> Author: Bjrn-Egil Dahlberg >>>>> > >>>>> Date: Fri Dec 2 15:25:06 2011 +0100 >>>>> >>>>> Teach windows sys_localtime_r >>>>> >>>>> >>>> Yep, that's me... But even if I gave a totally >>>> weird time back from those, the erlang:now >>>> logic should have stopped this from happening. >>>> I'll try to reproduce using your example >>>> program. If nothing else helps, I'll instrument >>>> a VM that gives som traces in the time code... >>>>> I am completely stumped. What can I do next >>>>> to help track down the source of the bug? >>>>> >>>> Unfortunately, so am I. Especially weird that >>>> it's load related... Maybe something is not >>>> locked as it should be... >>>>> Thanks, >>>>> Garret Smith >>>> Thanks for reporting, I'll get back to you! 
>>>> >>>> Cheers, >>>> /Patrik >>>>> >>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> >>>> >>>> >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> >>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >>> >>> >>> >> >> >> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vances@REDACTED Tue Mar 12 14:09:18 2013 From: vances@REDACTED (Vance Shipley) Date: Tue, 12 Mar 2013 18:39:18 +0530 Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future In-Reply-To: References: Message-ID: C On Mar 5, 2013 6:56 AM, "Garret Smith" wrote: > I have been beating my head against a wall for weeks tracking down spooky > behaviour[sic] in one of our production systems. I finally tracked it down > to "jumps" in the times returned by erlang:now(), causing all timers in the > system to expire at once. I have witnessed this bug on R15B01, both 64 and > 32-bit versions running on Windows Server 2008 R2, both on bare metal and > VirtualBox VM. > > The time jump is always around 2126000 seconds, or a little over 24 days. > The now() time does not try to converge with os:timestamp() as the > documentation suggests, and as I confirmed it does if you just change the > system clock. 
> > Another VM running concurrently on the same machine but with little load > (diagnostic node & production node) did not time jump. > > Higher load seems to make the time jumps happen more often. > > Frequency between time jumps varies between seconds and hours, but when a > jump occurs, it is always 2126000 + (9 to 26) seconds. > > I never see the jump in logfile timestamps that use os:timestamp() for > tagging log messages. I had to start tracing a production node before I > caught the jump. Here are some lines from a trace, where the timestamp in > trace_ts is printed using calendar:now_to_local_time() and then in raw > tuple format: > > 2013-4-16 21:40:1.993399|{1366,173601,993399} > 2013-4-16 21:40:1.993400|{1366,173601,993400} > 2013-5-11 12:13:41.986961|{1368,299621,986961} > 2013-5-11 12:13:41.986962|{1368,299621,986962} > > then a bit later... > > 2013-5-11 12:36:19.955129|{1368,300979,955129} > 2013-5-11 12:36:19.955130|{1368,300979,955130} > 2013-6-5 3:9:49.538830|{1370,426989,538830} > 2013-6-5 3:9:49.538833|{1370,426989,538833} > > I captured many such jumps over the course of a day or so. Obviously from > the dates, 2 jumps happened before I started tracing. > > I was able to reproduce the bug, though not as efficiently as my > production system, with the following sample program: > https://gist.github.com/garret-smith/5087169 > > It took over an hour of runtime before the first time jump. I am working > on a better way to reproduce it at the moment, but it's hard to test the > test with a bug so intermittent. > > I am also testing various other VM versions. My first hope was that this > was limited to the 64-bit version where we first encountered the problem, > but a change to the 32-bit version has only made the problem happen less > often, not eliminated it. > > We never saw this bug with R14B03 which we were running previously to > R15B01. However, system load is different so I can't make a direct > comparison. 
I did notice a few significant updates to the Windows time
> related code between R14B03 and R15:
>
> git log sys_time.c
>
> commit 46eb4359b05b220861453a869dc734480ec045a6
> Author: Patrik Nyblom
> Date: Tue Dec 6 19:07:16 2011 +0100
>
>     Emulate localtime, gmtime and mktime to enable negative time_t
>
> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> Author: Björn-Egil Dahlberg
> Date: Fri Dec 2 15:25:06 2011 +0100
>
>     Teach windows sys_localtime_r
>
> I am completely stumped. What can I do next to help track down the source
> of the bug?
>
> Thanks,
> Garret Smith

From pan@REDACTED Tue Mar 12 14:38:28 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 12 Mar 2013 14:38:28 +0100
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To:
References:
Message-ID: <513F2FD4.8010108@erlang.org>

Hi!

There's a patched version of the R15B02 dll in my public dropbox, under the name r15.beam.smp.dll:

http://dl.dropbox.com/u/17212223/r15.beam.smp.dll

If you replace the R15 beam.smp.dll with this one, the werl slogan should contain the version erts-5.9.2.0.1. If you could try that on the real app, I would be immensely grateful!

Cheers,
/Patrik

On 03/12/2013 02:09 PM, Vance Shipley wrote:
>
> C
>
> On Mar 5, 2013 6:56 AM, "Garret Smith" wrote:
>
> I have been beating my head against a wall for weeks tracking down
> spooky behaviour[sic] in one of our production systems. I finally
> tracked it down to "jumps" in the times returned by erlang:now(),
> causing all timers in the system to expire at once. I have
> witnessed this bug on R15B01, both 64 and 32-bit versions running
> on Windows Server 2008 R2, both on bare metal and VirtualBox VM.
> > The time jump is always around 2126000 seconds, or a little over > 24 days. The now() time does not try to converge with > os:timestamp() as the documentation suggests, and as I confirmed > it does if you just change the system clock. > > Another VM running concurrently on the same machine but with > little load (diagnostic node & production node) did not time jump. > > Higher load seems to make the time jumps happen more often. > > Frequency between time jumps varies between seconds and hours, but > when a jump occurs, it is always 2126000 + (9 to 26) seconds. > > I never see the jump in logfile timestamps that use os:timestamp() > for tagging log messages. I had to start tracing a production > node before I caught the jump. Here are some lines from a trace, > where the timestamp in trace_ts is printed using > calendar:now_to_local_time() and then in raw tuple format: > > 2013-4-16 21:40:1.993399|{1366,173601,993399} > 2013-4-16 21:40:1.993400|{1366,173601,993400} > 2013-5-11 12:13:41.986961|{1368,299621,986961} > 2013-5-11 12:13:41.986962|{1368,299621,986962} > > then a bit later... > > 2013-5-11 12:36:19.955129|{1368,300979,955129} > 2013-5-11 12:36:19.955130|{1368,300979,955130} > 2013-6-5 3:9:49.538830|{1370,426989,538830} > 2013-6-5 3:9:49.538833|{1370,426989,538833} > > I captured many such jumps over the course of a day or so. > Obviously from the dates, 2 jumps happened before I started tracing. > > I was able to reproduce the bug, though not as efficiently as my > production system, with the following sample program: > https://gist.github.com/garret-smith/5087169 > > It took over an hour of runtime before the first time jump. I am > working on a better way to reproduce it at the moment, but it's > hard to test the test with a bug so intermittent. > > I am also testing various other VM versions. 
My first hope was
> that this was limited to the 64-bit version where we first
> encountered the problem, but a change to the 32-bit version has
> only made the problem happen less often, not eliminated it.
>
> We never saw this bug with R14B03 which we were running previously
> to R15B01. However, system load is different so I can't make a
> direct comparison. I did notice a few significant updates to the
> Windows time related code between R14B03 and R15:
>
> git log sys_time.c
>
> commit 46eb4359b05b220861453a869dc734480ec045a6
> Author: Patrik Nyblom
> Date: Tue Dec 6 19:07:16 2011 +0100
>
>     Emulate localtime, gmtime and mktime to enable negative time_t
>
> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75
> Author: Björn-Egil Dahlberg
> Date: Fri Dec 2 15:25:06 2011 +0100
>
>     Teach windows sys_localtime_r
>
> I am completely stumped. What can I do next to help track down
> the source of the bug?
>
> Thanks,
> Garret Smith

From garret.smith@REDACTED Tue Mar 12 16:36:37 2013
From: garret.smith@REDACTED (Garret Smith)
Date: Tue, 12 Mar 2013 08:36:37 -0700
Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future
In-Reply-To: <513F2FD4.8010108@erlang.org>
References: <513F2FD4.8010108@erlang.org>
Message-ID:

On Mar 12, 2013 6:38 AM, "Patrik Nyblom" wrote:
>
> Hi!
>
> There's a patched version of the R15B02 dll in my public dropbox, under the name r15.beam.smp.dll:
>
> http://dl.dropbox.com/u/17212223/r15.beam.smp.dll

R15B02 will work. I'll get started, but it will take a couple of days to get everything built and deployed, and then to watch for time jumps.
Thank you for the binary! > > If you replace the R15 beam.smp.dll with this one, the werl slogan should contain the version erts-5.9.2.0.1, if you could try that on the real app, I would be immensely grateful! > > Cheers, > /Patrik > > On 03/12/2013 02:09 PM, Vance Shipley wrote: >> >> C >> >> On Mar 5, 2013 6:56 AM, "Garret Smith" wrote: >>> >>> I have been beating my head against a wall for weeks tracking down spooky behaviour[sic] in one of our production systems. I finally tracked it down to "jumps" in the times returned by erlang:now(), causing all timers in the system to expire at once. I have witnessed this bug on R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both on bare metal and VirtualBox VM. >>> >>> The time jump is always around 2126000 seconds, or a little over 24 days. The now() time does not try to converge with os:timestamp() as the documentation suggests, and as I confirmed it does if you just change the system clock. >>> >>> Another VM running concurrently on the same machine but with little load (diagnostic node & production node) did not time jump. >>> >>> Higher load seems to make the time jumps happen more often. >>> >>> Frequency between time jumps varies between seconds and hours, but when a jump occurs, it is always 2126000 + (9 to 26) seconds. >>> >>> I never see the jump in logfile timestamps that use os:timestamp() for tagging log messages. I had to start tracing a production node before I caught the jump. Here are some lines from a trace, where the timestamp in trace_ts is printed using calendar:now_to_local_time() and then in raw tuple format: >>> >>> 2013-4-16 21:40:1.993399|{1366,173601,993399} >>> 2013-4-16 21:40:1.993400|{1366,173601,993400} >>> 2013-5-11 12:13:41.986961|{1368,299621,986961} >>> 2013-5-11 12:13:41.986962|{1368,299621,986962} >>> >>> then a bit later... 
>>> >>> 2013-5-11 12:36:19.955129|{1368,300979,955129} >>> 2013-5-11 12:36:19.955130|{1368,300979,955130} >>> 2013-6-5 3:9:49.538830|{1370,426989,538830} >>> 2013-6-5 3:9:49.538833|{1370,426989,538833} >>> >>> I captured many such jumps over the course of a day or so. Obviously from the dates, 2 jumps happened before I started tracing. >>> >>> I was able to reproduce the bug, though not as efficiently as my production system, with the following sample program: https://gist.github.com/garret-smith/5087169 >>> >>> It took over an hour of runtime before the first time jump. I am working on a better way to reproduce it at the moment, but it's hard to test the test with a bug so intermittent. >>> >>> I am also testing various other VM versions. My first hope was that this was limited to the 64-bit version where we first encountered the problem, but a change to the 32-bit version has only made the problem happen less often, not eliminated it. >>> >>> We never saw this bug with R14B03 which we were running previously to R15B01. However, system load is different so I can't make a direct comparison. I did notice a few significant updates to the Windows time related code between R14B03 and R15: >>> >>> git log sys_time.c >>> >>> commit 46eb4359b05b220861453a869dc734480ec045a6 >>> Author: Patrik Nyblom >>> Date: Tue Dec 6 19:07:16 2011 +0100 >>> >>> Emulate localtime, gmtime and mktime to enable negative time_t >>> >>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75 >>> Author: Bjrn-Egil Dahlberg >>> Date: Fri Dec 2 15:25:06 2011 +0100 >>> >>> Teach windows sys_localtime_r >>> >>> >>> I am completely stumped. What can I do next to help track down the source of the bug? 
>>> Thanks,
>>> Garret Smith

From mamuelle@REDACTED Tue Mar 12 16:39:48 2013
From: mamuelle@REDACTED (Magnus Müller)
Date: Tue, 12 Mar 2013 16:39:48 +0100
Subject: [erlang-bugs] R16B takes long to compile a simple module
Message-ID: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de>

The following small module takes ~10s to compile with R16B (erl +V below). The code is distilled from indent/erlang_indent.erl in vimerl [1]. Diagnostics are fast (a small syntax error somewhere crashes the compilation immediately). The same module compiles quickly (<1s) with R15B.

-------------------------------------------------
-module(erlang_indent).

-export([p/2]).

-define(IS(T, C), (element(1, T) == C)).

-record(state, {stack = []}).

p(T1, #state{stack = [T2 | _]}) when ?IS(T2, a), ?IS(T1, b), ?IS(T1, c) -> ok;
p(T, _) when ?IS(T, a1); ?IS(T, b1); ?IS(T, c1) -> ok;
p(T, _) when ?IS(T, a2); ?IS(T, b2); ?IS(T, c2) -> ok;
p(T, _) when ?IS(T, a) -> ok;
p(T, _) when ?IS(T, a), (?IS(T, b) and ?IS(T, c)) -> ok;
p(_, T) when ?IS(T, a) -> ok;
p(_, T) when ?IS(T, b) -> ok;
p(T, _) when ?IS(T, a) -> ok.
-------------------------------------------------

$ erl +V
Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.10.1

The original file [2] in vimerl takes even longer to compile. Note that [2] is actually an escript, but the error persists when it is converted to a module.
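The compile times reported above can be measured from the Erlang shell. The following is a minimal sketch, assuming the module above is saved as `erlang_indent.erl` in the current directory; the wrapper module `compile_timing` is hypothetical, while `timer:tc/3` and `compile:file/2` are standard OTP calls.

```erlang
-module(compile_timing).
-export([seconds/1]).

%% Compile File and return {ElapsedSeconds, CompileResult}.
%% The `binary` option makes the compiler return the object code
%% instead of writing a .beam file, so repeated runs leave no files.
seconds(File) ->
    {Micros, Result} = timer:tc(compile, file, [File, [binary]]),
    {Micros / 1000000, Result}.
```

Calling `compile_timing:seconds("erlang_indent.erl").` on both releases gives directly comparable figures. The result element is `{ok, erlang_indent, Binary}` on success, or an error value if the file cannot be compiled.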
[1] https://github.com/jimenezrick/vimerl
[2] https://raw.github.com/jimenezrick/vimerl/master/indent/erlang_indent.erl

From kostis@REDACTED Tue Mar 12 20:07:46 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Tue, 12 Mar 2013 20:07:46 +0100
Subject: [erlang-bugs] Native compilation hangs with rm-reverse-eta-conversion
In-Reply-To:
References: <50FBAB1E.2070703@cs.ntua.gr>
Message-ID: <513F7D02.9040907@cs.ntua.gr>

On 01/23/2013 12:49 PM, Anthony Ramine wrote:
> Hi,
>
> The bytecode invariant that I broke is the fact that a function cannot be used as
> a closure and as a normal function both at the same time, thus the eta-abstraction
> is needed by HiPE.
>
> Fredrik, for the time being you should probably revert rm-reverse-eta-conversion
> because I don't think I'll be able to make HiPE work with the eta-abstraction in
> that much time.
>
> Kostis, could you give me directions on how to make HiPE not need the intermediate
> closures when doing fun Name/Arity?

Thanks to Anthony, who repeatedly prompted me to look into this and sent me a minimal example to test, and to Bjorn Gustavsson, who checked the code of hipe_icode_coordinator, today I adapted the assumptions of the native code compiler and simplified the code that computes escaping functions. The following hipe patch should be included in OTP:

git fetch git://github.com/kostis/otp.git hipe-cleanup-escaping

After its inclusion, Anthony's patch that removes the automatic eta-abstraction for function references from the BEAM compiler can probably be included without any problems.
Kostis

From fredrik@REDACTED Wed Mar 13 10:15:06 2013
From: fredrik@REDACTED (Fredrik)
Date: Wed, 13 Mar 2013 10:15:06 +0100
Subject: [erlang-bugs] [erlang-patches] Native compilation hangs with rm-reverse-eta-conversion
In-Reply-To: <513F7D02.9040907@cs.ntua.gr>
References: <50FBAB1E.2070703@cs.ntua.gr> <513F7D02.9040907@cs.ntua.gr>
Message-ID: <5140439A.2030205@erlang.org>

On 03/12/2013 08:07 PM, Kostis Sagonas wrote:
> On 01/23/2013 12:49 PM, Anthony Ramine wrote:
>> Hi,
>>
>> The bytecode invariant that I broke is the fact that a function
>> cannot be used as a closure and as a normal function both at the
>> same time, thus the eta-abstraction is needed by HiPE.
>>
>> Fredrik, for the time being you should probably revert
>> rm-reverse-eta-conversion because I don't think I'll be able to
>> make HiPE work with the eta-abstraction in that much time.
>>
>> Kostis, could you give me directions on how to make HiPE not need the
>> intermediate closures when doing fun Name/Arity?
>
> Thanks to Anthony repeatedly prompting me to look into this and
> sending me a minimal example to test and to Bjorn Gustavsson for
> checking the code of hipe_icode_coordinator, today I adapted the
> assumptions of the native code compiler and simplified the code that
> computes escaping functions. The following hipe patch should be
> included in OTP:
>
> git fetch git://github.com/kostis/otp.git hipe-cleanup-escaping
>
> After its inclusion, Anthony's patch that removes the automatic
> eta-abstraction for function references from the BEAM compiler can
> probably be included without any problems.
>
> Kostis
> _______________________________________________
> erlang-patches mailing list
> erlang-patches@REDACTED
> http://erlang.org/mailman/listinfo/erlang-patches

Fetched. It is now in the 'pu' branch.
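For readers unfamiliar with the term used in this thread, the sketch below shows the two source-level forms involved. It assumes the usual meaning of eta-abstraction, namely wrapping a function reference in an explicit closure; the module `eta_demo` and its functions are hypothetical and are not taken from either patch.

```erlang
-module(eta_demo).
-export([direct/0, eta/0, double/1]).

double(X) -> 2 * X.

%% A plain function reference of the form fun Name/Arity.
direct() ->
    lists:map(fun double/1, [1, 2, 3]).

%% The eta-abstracted form: an explicit wrapper closure that
%% simply forwards its argument to double/1.
eta() ->
    lists:map(fun(X) -> double(X) end, [1, 2, 3]).
```

Both functions return `[2,4,6]`. The discussion above concerns whether the compiler silently turns the first form into the second, and whether HiPE relies on that intermediate closure being there.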
-- BR Fredrik Gustafsson Erlang OTP Team From bgustavsson@REDACTED Wed Mar 13 15:29:43 2013 From: bgustavsson@REDACTED (Björn Gustavsson) Date: Wed, 13 Mar 2013 15:29:43 +0100 Subject: [erlang-bugs] R16B takes long to compile a simple module In-Reply-To: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de> References: <20130312163948.b196047d228b342de33335f9@informatik.hu-berlin.de> Message-ID: Thanks for reporting this issue! I introduced a new optimization in R16 and failed to optimize it. I will fix it in the R16B01 release. /Bjorn On Tue, Mar 12, 2013 at 4:39 PM, Magnus Müller < mamuelle@REDACTED> wrote: > The following small module takes ~10s to compile with R16B (erl +V > below). The code is distilled from indent/erlang_indent.erl in vimerl > [1]. Diagnostics are fast (a small syntax error somewhere crashes the > compilation immediately). The same module compiles quickly (<1s) with R15B. > > ------------------------------------------------- > -module(erlang_indent). > > -export([p/2]). > > -define(IS(T, C), (element(1, T) == C)). > > -record(state, {stack = []}). > > p(T1, #state{stack = [T2 | _]}) when ?IS(T2, a), ?IS(T1, b), ?IS(T1, c) > -> ok; p(T, _) when ?IS(T, a1); ?IS(T, b1); ?IS(T, c1) -> ok; > p(T, _) when ?IS(T, a2); ?IS(T, b2); ?IS(T, c2) -> ok; > p(T, _) when ?IS(T, a) -> ok; > p(T, _) when ?IS(T, a), (?IS(T, b) and ?IS(T, c)) -> ok; > p(_, T) when ?IS(T, a) -> ok; > p(_, T) when ?IS(T, b) -> ok; > p(T, _) when ?IS(T, a) -> ok. > ------------------------------------------------- > > $ erl +V > Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 5.10.1 > > > The original file[2] in vimerl takes even longer to compile. Note that > [2] is actually an escript, but the error persists when > it is converted to a module.
> > > [1] https://github.com/jimenezrick/vimerl > [2] > https://raw.github.com/jimenezrick/vimerl/master/indent/erlang_indent.erl > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > -- Björn Gustavsson, Erlang/OTP, Ericsson AB From arn@REDACTED Fri Mar 15 15:19:56 2013 From: arn@REDACTED (Anton Yabchinskiy) Date: 15 Mar 2013 18:19:56 +0400 Subject: [erlang-bugs] Possible regression in httpc's cookie handling Message-ID: Hello, I've encountered a difference in behaviour of HTTP client in R15B01 (Debian build) and in R16B (Erlang Solutions build). Consider the following code: #!/usr/bin/env escript main(_Args) -> Profile = some_profile, ok = application:start(inets), {ok, _Pid} = inets:start(httpc, [{profile, Profile}]), ok = httpc:set_option(cookies, enabled, Profile), _Ans = httpc:request("http://www.google.ru/", Profile), io:format("~p~n", [httpc:which_cookies(Profile)]). When run with R15B01 it outputs the following: [{session_cookies,[{http_cookie,".google.ru",false,"PREF", "ID=542c81909139855f:FF=0:NW=1:TM=1363356181:LM=1363356181:S=JNFNZBI_nhJC-IIO", undefined,session,"/",false,false,"0"}, {http_cookie,".google.ru",false,"NID", "67=CkprmSvcQFKD7P0pt1FkRHkXZXTe_geBYXy2gk65yJTJyxvIjqm0Mrc7xErtR4xL5qaKsfUMC4oTWsvJze910qRx79VBf66rivfjmN88bVhg9aDd6YS2M3UohXLXT68t", undefined,session,"/",false,false,"0"}]}] The output for R16B is: [{session_cookies,[]}] There is no difference in behaviour if profile isn't used.
I'm not sure, but probably it's related to commit 9c85ee8b61c24587a228b3644c37b1b4fdfb7dcb, which includes the following change in lib/inets/src/http_client/httpc_handler.erl file: - handle_cookies(Headers, Request, Options, ProfileName), + handle_cookies(Headers, Request, Options, httpc_manager), %% FOO profile_name From Ingela.Anderton.Andin@REDACTED Fri Mar 15 15:53:44 2013 From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin) Date: Fri, 15 Mar 2013 15:53:44 +0100 Subject: [erlang-bugs] Possible regression in httpc's cookie handling In-Reply-To: References: Message-ID: <514335F8.9040308@ericsson.com> Hi! Thank you for reporting this. It looks really strange and I must have committed it by accident. The change has nothing to do with the rest of the commit. You could try changing it back and see if it helps. We have a new test case to write. Regards Ingela Erlang/OTP team - Ericsson AB Anton Yabchinskiy wrote: > Hello, > > I've encountered a difference in behaviour of HTTP client in > R15B01 (Debian build) and in R16B (Erlang Solutions build). > Consider the following code: > > #!/usr/bin/env escript > > main(_Args) -> > Profile = some_profile, > ok = application:start(inets), > {ok, _Pid} = inets:start(httpc, [{profile, Profile}]), > ok = httpc:set_option(cookies, enabled, Profile), > _Ans = httpc:request("http://www.google.ru/", Profile), > io:format("~p~n", [httpc:which_cookies(Profile)]).
> > When run with R15B01 it outputs the following: > > [{session_cookies,[{http_cookie,".google.ru",false,"PREF", > > "ID=542c81909139855f:FF=0:NW=1:TM=1363356181:LM=1363356181:S=JNFNZBI_nhJC-IIO", > > undefined,session,"/",false,false,"0"}, > {http_cookie,".google.ru",false,"NID", > > "67=CkprmSvcQFKD7P0pt1FkRHkXZXTe_geBYXy2gk65yJTJyxvIjqm0Mrc7xErtR4xL5qaKsfUMC4oTWsvJze910qRx79VBf66rivfjmN88bVhg9aDd6YS2M3UohXLXT68t", > > undefined,session,"/",false,false,"0"}]}] > > The output for R16B is: > > [{session_cookies,[]}] > > There is no difference in behaviour if profile isn't used. > > I'm not sure, but probably it's related to commit > 9c85ee8b61c24587a228b3644c37b1b4fdfb7dcb, which includes > the following change in lib/inets/src/http_client/httpc_handler.erl > file: > > - handle_cookies(Headers, Request, Options, ProfileName), + > handle_cookies(Headers, Request, Options, httpc_manager), %% FOO > profile_name > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From bourinov@REDACTED Fri Mar 15 18:10:49 2013 From: bourinov@REDACTED (Max Bourinov) Date: Fri, 15 Mar 2013 18:10:49 +0100 Subject: [erlang-bugs] Error in cover.erl Message-ID: Error in cover.erl =ERROR REPORT==== 15-Mar-2013::18:07:42 === Error in process <0.213.0> with exit value: {function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,162}]},{cover,fix_clauses,3,[{file,"cover.erl"},{line,1621}]},{cover,fix_expr,3,[{file,"cover.erl"},{line,1609}]},{cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]},{cover,fix_expr,3,[... 
ERROR: eunit failed while processing /user/max/project/processor: {'EXIT',{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,162}]}, {cover,fix_clauses,3, [{file,"cover.erl"},{line,1621}]}, {cover,fix_expr,3,[{file,"cover.erl"},{line,1609}]}, {cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]}, {cover,fix_expr,3,[{file,"cover.erl"},{line,1614}]}, {cover,fix_expr,3,[{file,"cover.erl"},{line,1616}]}, {cover,fix_last_expr,3, [{file,"cover.erl"},{line,1590}]}, {cover,munge_body,4, [{file,"cover.erl"},{line,1535}]}]}} Best regards, Max From arn@REDACTED Fri Mar 15 18:38:47 2013 From: arn@REDACTED (Anton Yabchinskiy) Date: Fri, 15 Mar 2013 21:38:47 +0400 Subject: [erlang-bugs] Possible regression in httpc's cookie handling In-Reply-To: <514335F8.9040308@ericsson.com> References: <514335F8.9040308@ericsson.com> Message-ID: <20130315173847.GA21449@mithlond.erebor71.org> On 2013-03-15 15:53:44+0100, Ingela Anderton Andin wrote: > Hi! > > Thank you for reporting this. It looks really strange and I must > have committed it by accident. The change has nothing to do with the > rest of the commit. You could try changing it back and see if it helps. > We have a new test case to write. Yes, reverting that line does help. It works as expected now. From n.oxyde@REDACTED Mon Mar 18 14:08:32 2013 From: n.oxyde@REDACTED (Anthony Ramine) Date: Mon, 18 Mar 2013 14:08:32 +0100 Subject: [erlang-bugs] Minor annoyance after 'make clean' In-Reply-To: References: <5114E1C0.10901@cs.ntua.gr> Message-ID: <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com> Ping? -- Anthony Ramine On 8 Feb 2013, at 12:47, Anthony Ramine wrote: > Hi, > > It can, and here is a fix.
> > git fetch https://github.com/nox/otp.git fix-ssh-html-doc > > https://github.com/nox/otp/compare/erlang:master...fix-ssh-html-doc > https://github.com/nox/otp/compare/erlang:master...fix-ssh-html-doc.patch > > Regards, > > -- > Anthony Ramine > > On 8 Feb 2013, at 12:30, Kostis Sagonas wrote: > >> Every time I issue a 'make clean' the file >> >> lib/ssh/doc/html/SSH_protocols.png >> >> which apparently is part of the code base of the master branch, gets deleted. Is this intentional? >> >> The problem is that after the 'make clean', a subsequent 'git status' command shows the following: >> >> # On branch master >> # Changed but not updated: >> # (use "git add/rm <file>..." to update what will be committed) >> # (use "git checkout -- <file>..." to discard changes in working directory) >> # >> # deleted: lib/ssh/doc/html/SSH_protocols.png >> # >> >> >> Can this be fixed? >> >> Kostis > From fredrik@REDACTED Mon Mar 18 17:04:55 2013 From: fredrik@REDACTED (Fredrik) Date: Mon, 18 Mar 2013 17:04:55 +0100 Subject: [erlang-bugs] Minor annoyance after 'make clean' In-Reply-To: <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com> References: <5114E1C0.10901@cs.ntua.gr> <22ED1658-3AD6-49BD-9D69-83A02726CF90@gmail.com> Message-ID: <51473B27.5080608@erlang.org> On 03/18/2013 02:08 PM, Anthony Ramine wrote: > Ping? > Fetched. Currently building in the 'pu' branch.
Thanks, -- BR Fredrik Gustafsson Erlang OTP Team From smith.winston.101@REDACTED Mon Mar 18 19:27:32 2013 From: smith.winston.101@REDACTED (Winston Smith) Date: Mon, 18 Mar 2013 14:27:32 -0400 Subject: [erlang-bugs] Mnesia/R15B: TYPE ASSERTION FAILED, erl_term.c line 109 (when stopping mnesia) In-Reply-To: References: Message-ID: On Mon, Apr 2, 2012 at 11:04 PM, Winston Smith wrote: > I have run into the following issue with R15B cross compiled to an > AVR32 (similar to ARM) system (no HiPE). > > > (mynode@REDACTED)6> mnesia:stop(). > TYPE ASSERTION FAILED, file beam/erl_term.c, line 109: tag_val_def: > 0x8e422b5c > Aborted > > > Interestingly, if I bring up a standalone erl, I don't get the assert, > it segfaults instead: > > > # erts-5.9/bin/erl > Eshell V5.9 (abort with ^G) > 1> mnesia:create_schema([node()]). > ok > 2> mnesia:start(). > ok > 3> mnesia:stop(). > Segmentation fault Just to follow up on this (for search engine completeness!) I cross compiled R16B for the avr32 system and tried this out again -- the SEGV issue seems to be resolved ... I'm not sure where/when it actually got fixed. Thanks, -W. From zerthurd@REDACTED Thu Mar 21 07:40:21 2013 From: zerthurd@REDACTED (Maxim Treskin) Date: Thu, 21 Mar 2013 13:40:21 +0700 Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules Message-ID: Hello At the Montenegro Erlang Hackathon ( http://lanyrd.com/2013/herceg-novi-erlang-meetup/ , there were only two people, unfortunately ) we found incorrect behaviour in Dialyzer. Our project erroneously had duplicated modules with the same name, but different content. When we checked it with dialyzer, it showed me something like this: Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam", "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]] Obviously these are not the same modules. So I had to hunt for this bug and found strange behaviour in dialyzer.
Function lists:zip/2 called with two list, where first is reversed list of modules as atom, and second is list of filepaths for modules. And this list not always contains correspond elements. Module with name some_module1 can be has filename like abc_module55.beam. This is the cause of error. This bug exists in R15B02 and R16. I wrote such patch to fix bug, but I don't know whether this is solution or not, though it works fine. --- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl 2013-01-31 12:55:53.210402846 +0700 +++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 13:20:46.794991889 +0700 @@ -255,10 +255,18 @@ CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer), case Failed =:= [] of true -> - NewFiles = lists:zip(lists:reverse(Modules), Files), + %% Modules and Files have not the same order, so it is meaningless to zip it + %% NewFiles = lists:zip(lists:reverse(Modules), Files), + ModDict = - lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end, - dict:new(), NewFiles), + lists:foldl(fun(F, Dict) -> + ModFile = lists:last(filename:split(F)), + Mod = filename:basename(ModFile, ".beam"), + dict:append(Mod, F, Dict) end, + dict:new(), Files), + %% ModDict = + %% lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end, + %% dict:new(), NewFiles), check_for_duplicate_modules(ModDict); false -> Msg = io_lib:format("Could not scan the following file(s): ~p", -- Max Treskin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xramtsov@REDACTED Thu Mar 21 10:40:26 2013 From: xramtsov@REDACTED (Evgeniy Khramtsov) Date: Thu, 21 Mar 2013 19:40:26 +1000 Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules In-Reply-To: References: Message-ID: <514AD58A.9010208@gmail.com> On 21.03.2013 16:40, Maxim Treskin wrote: > Hello > > At Montenegro Erlang Hackaton ( > http://lanyrd.com/2013/herceg-novi-erlang-meetup/ , there were only > two people, unfortunately ) we found incorrect behaviour of Dialyzer. > > Our project erroneous had a duplicated modules with the same name, but > different content. When we check it with dialyzer it show me something > like that: > > Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam", > > "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]] > > Obviously it is not the same modules. So I had to search this bug and > find strange behaviour in dialyzer. Function lists:zip/2 called with > two list, where first is reversed list of modules as atom, and second > is list of filepaths for modules. And this list not always contains > correspond elements. Module with name some_module1 can be has filename > like abc_module55.beam. This is the cause of error. > > This bug exists in R15B02 and R16. > > I wrote such patch to fix bug, but I don't know whether this is > solution or not, though it works fine. 
> > --- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl > 2013-01-31 12:55:53.210402846 +0700 > +++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 > 13:20:46.794991889 +0700 > @@ -255,10 +255,18 @@ > CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer), > case Failed =:= [] of > true -> > - NewFiles = lists:zip(lists:reverse(Modules), Files), > + %% Modules and Files have not the same order, so it is > meaningless to zip it > + %% NewFiles = lists:zip(lists:reverse(Modules), Files), > + > ModDict = > - lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end, > - dict:new(), NewFiles), > + lists:foldl(fun(F, Dict) -> > + ModFile = lists:last(filename:split(F)), > + Mod = filename:basename(ModFile, ".beam"), > + dict:append(Mod, F, Dict) end, > + dict:new(), Files), > + %% ModDict = > + %% lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, > Dict) end, > + %% dict:new(), NewFiles), > check_for_duplicate_modules(ModDict); > false -> > Msg = io_lib:format("Could not scan the following file(s): ~p", I have the same problem. Thanks for the patch. -- Regards, Evgeniy Khramtsov, ProcessOne. xmpp:xram@REDACTED From bourinov@REDACTED Thu Mar 21 11:10:00 2013 From: bourinov@REDACTED (Max Bourinov) Date: Thu, 21 Mar 2013 11:10:00 +0100 Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules In-Reply-To: <514AD58A.9010208@gmail.com> References: <514AD58A.9010208@gmail.com> Message-ID: Montenegro Erlang Hackaton was great indeed! Thank you for your patch Max! Best regards, Max On Thu, Mar 21, 2013 at 10:40 AM, Evgeniy Khramtsov wrote: > On 21.03.2013 16:40, Maxim Treskin wrote: > >> Hello >> >> At Montenegro Erlang Hackaton ( http://lanyrd.com/2013/herceg-** >> novi-erlang-meetup/ , there were only two people, unfortunately ) we found incorrect behaviour >> of Dialyzer. >> >> Our project erroneous had a duplicated modules with the same name, but >> different content. 
When we check it with dialyzer it show me something like >> that: >> >> Duplicate modules: [["/var/tmp/myproj/apps/** >> myproj/ebin/psc_operate.beam", >> "/var/tmp/myproj/deps/somedep/** >> ebin/amp_common_utils.beam"]] >> >> Obviously it is not the same modules. So I had to search this bug and >> find strange behaviour in dialyzer. Function lists:zip/2 called with two >> list, where first is reversed list of modules as atom, and second is list >> of filepaths for modules. And this list not always contains correspond >> elements. Module with name some_module1 can be has filename like >> abc_module55.beam. This is the cause of error. >> >> This bug exists in R15B02 and R16. >> >> I wrote such patch to fix bug, but I don't know whether this is solution >> or not, though it works fine. >> >> --- /opt/r16a/lib/dialyzer-2.5.4/**src/dialyzer_analysis_**callgraph.erl >> 2013-01-31 12:55:53.210402846 +0700 >> +++ dialyzer_pa/dialyzer_analysis_**callgraph.erl 2013-03-21 >> 13:20:46.794991889 +0700 >> @@ -255,10 +255,18 @@ >> CServer2 = dialyzer_codeserver:set_next_**core_label(NextLabel, >> CServer), >> case Failed =:= [] of >> true -> >> - NewFiles = lists:zip(lists:reverse(**Modules), Files), >> + %% Modules and Files have not the same order, so it is meaningless >> to zip it >> + %% NewFiles = lists:zip(lists:reverse(**Modules), Files), >> + >> ModDict = >> - lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end, >> - dict:new(), NewFiles), >> + lists:foldl(fun(F, Dict) -> >> + ModFile = lists:last(filename:split(F)), >> + Mod = filename:basename(ModFile, ".beam"), >> + dict:append(Mod, F, Dict) end, >> + dict:new(), Files), >> + %% ModDict = >> + %% lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) >> end, >> + %% dict:new(), NewFiles), >> check_for_duplicate_modules(**ModDict); >> false -> >> Msg = io_lib:format("Could not scan the following file(s): ~p", >> > > I have the same problem. Thanks for the patch. 
> > -- > Regards, > Evgeniy Khramtsov, ProcessOne. > xmpp:xram@REDACTED > > ______________________________**_________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/**listinfo/erlang-bugs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgolovan@REDACTED Sun Mar 24 07:58:58 2013 From: sgolovan@REDACTED (Sergei Golovan) Date: Sun, 24 Mar 2013 10:58:58 +0400 Subject: [erlang-bugs] Bug with named subpatterns in re module Message-ID: Hi! Chris King recently discovered a bug in re module. Appears that the matched named subpatterns are not always returned. The following command works correctly: 1> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$", [dupnames, {capture, [a, b], list}]). {match,["bar",[]]} But semantically the same one doesn't (note the swapped and ): 1> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$", [dupnames, {capture, [a, b], list}]). {match,[[],[]]} In both cases the second branch matches, but only the first command returns the required subpattern. The bug is reproducible in R16B. Cheers! -- Sergei Golovan From pan@REDACTED Thu Mar 28 12:35:54 2013 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 28 Mar 2013 12:35:54 +0100 Subject: [erlang-bugs] Bug with named subpatterns in re module In-Reply-To: References: Message-ID: <51542B1A.2010406@erlang.org> Hi! I'm unsure of the nature of this bug. What are you actually expecting as a return when you use duplicate names and named capture? Both instances of the name, "the right instance" of the name or a badarg? I.e would you like re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, {capture, [a, b], list}]). to give the same result as: re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, {capture, [a, b, c], list}]). ? Or return the second instance if that matches, but the first instance if that one matches? Or should we simply not allow it? The thing is that even with dupnames, you have a varying amount of subexpressions. 
Capturing 'all' (or rather 'all_but_first') will show you that this call returns three distinct subexpressions, of which two happen to have the same name (regardless of the names). If the part before | matches, the result is only two subexpressions, as the first two subexpressions match. No duplicate naming will change this. There is no real "select the one that matches" functionality in giving two subexpressions the same name. PCRE just picks one of the occurrences of a name when you ask for it - in your last example the occurrence you were not expecting, but that's more or less random; the first example would give unexpected results if the first part matched. PCRE has no functionality to pick all occurrences of a name, but that could of course be changed if there were some understandable semantics that should be implemented. I think a badarg exception is the way to go though... Cheers, /Patrik On 03/24/2013 07:58 AM, Sergei Golovan wrote: > Hi! > > Chris King recently discovered a bug in re module. Appears that the > matched named subpatterns are not always returned. > > The following command works correctly: > 1> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$", > [dupnames, {capture, [a, b], list}]). > {match,["bar",[]]} > > But semantically the same one doesn't (note the swapped and ): > 1> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$", > [dupnames, {capture, [a, b], list}]). > {match,[[],[]]} > > In both cases the second branch matches, but only the first command > returns the required subpattern. > > The bug is reproducible in R16B. > > Cheers! From sgolovan@REDACTED Thu Mar 28 12:59:04 2013 From: sgolovan@REDACTED (Sergei Golovan) Date: Thu, 28 Mar 2013 15:59:04 +0400 Subject: [erlang-bugs] Bug with named subpatterns in re module In-Reply-To: <51542B1A.2010406@erlang.org> References: <51542B1A.2010406@erlang.org> Message-ID: Hi! On Thu, Mar 28, 2013 at 3:35 PM, Patrik Nyblom wrote: > > I'm unsure of the nature of this bug.
What are you actually expecting as a > return when you use duplicate names and named capture? Both instances of the > name, "the right instance" of the name or a badarg? At least the results should not depend on the pattern names. When I run the following Perl script: #! /usr/bin/perl $var = 'bar'; $var =~ m/^(?foo)(?bla)$|^(?[[:word:]]+)$/; pplus(); $var =~ m/^(?foo)(?bla)$|^(?[[:word:]]+)$/; pplus(); sub pplus { foreach (keys %+) { print "$_: $+{$_}\n"; } } It prints the following: a: bar b: bar Which means that it captures the only matching pattern. Perl docs say that in case of duplicate names the leftmost matched one is captured. I would say that the less the difference in behavior in re and the original Perl regexp the better. > > I.e would you like > > > re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, > {capture, [a, b], list}]). > > to give the same result as: > > re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, > {capture, [a, b, c], list}]). > > ? Or return the second instance if that matches, but the first instance if > that one matches? Or should we simply not allow it? The thing is that even > with dupnames, you have a varying amount of subexpressions. Capturing 'all' > (or rather 'all_but_first') will show you that this call returns three > distinct subexpressions, of which two happen to have the same name > (regardless of the names). If the part before | matches, the result is only > two subexpressions, as the first two subexpressions match. No duplicate > naming will change this. There is no real "select the one that matches" > functionality in giving two subexpressions the same name. > > PCRE just picks one of the occurences of a name when you ask for it - in > your last example the occurence you were not expecting, but that's more or > less random, the first example would give unexpected results if the first > part matched. 
PCRE has no functionality to pick all occurrences of a name, > but that could of course be changed if there was some understandable > semantics that should be implemented. I think badarg exception is the way to > go though... Well, the re manpage says that dupnames is helpful when it's certain that two subpatterns with the same name can't be matched simultaneously. Fortunately, the regexp under consideration falls into this category. So, I guess that either dupnames has to be removed altogether, or something should be done with it. Cheers! -- Sergei Golovan From pan@REDACTED Thu Mar 28 17:13:25 2013 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 28 Mar 2013 17:13:25 +0100 Subject: [erlang-bugs] Bug with named subpatterns in re module In-Reply-To: References: <51542B1A.2010406@erlang.org> Message-ID: <51546C25.3090801@erlang.org> On 03/28/2013 12:59 PM, Sergei Golovan wrote: > Hi! > > On Thu, Mar 28, 2013 at 3:35 PM, Patrik Nyblom wrote: >> I'm unsure of the nature of this bug. What are you actually expecting as a >> return when you use duplicate names and named capture? Both instances of the >> name, "the right instance" of the name or a badarg? > At least the results should not depend on the pattern names. No, definitely not - the results now are more or less random, so something needs to be done. > > When I run the following Perl script: > > #! /usr/bin/perl > > $var = 'bar'; > $var =~ m/^(?foo)(?bla)$|^(?[[:word:]]+)$/; > pplus(); > $var =~ m/^(?foo)(?bla)$|^(?[[:word:]]+)$/; > pplus(); > > sub pplus { > foreach (keys %+) { > print "$_: $+{$_}\n"; > } > } > > It prints the following: > > a: bar > b: bar > > Which means that it captures the only matching pattern. Perl docs say > that in case of duplicate names the leftmost matched one is captured. > I would say that the less the difference in behavior in re and the > original Perl regexp the better. Okay, thanks for explaining!
The leftmost matching might be doable - the pcre_get_stringtable_entries can be used and we could then extract the first entry for that name that is bound. We now use pcre_get_stringnumber, which gives a random instance of that name and should not be used with dupnames. What about "all" then, it returns all bound indexes and will possibly return the duplicate name's binding twice, once as [] and once as "bar" (in your example). Should it skip a binding where the same name is bound later, or should it return them all, as it does now? 'all' kind of means "all indexes" rather than "all names". Should we add "all_names" to get the behavior that you demonstrate in your Perl program? Or maybe just let 'all' be as is and just fix the thing where you specifically list names... Hmmm - thoughts? > >> I.e would you like >> >> >> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, >> {capture, [a, b], list}]). >> >> to give the same result as: >> >> re:run("bar", "^(?foo)(?bla)$|^(?[[:word:]]+)$",[dupnames, >> {capture, [a, b, c], list}]). >> >> ? Or return the second instance if that matches, but the first instance if >> that one matches? Or should we simply not allow it? The thing is that even >> with dupnames, you have a varying amount of subexpressions. Capturing 'all' >> (or rather 'all_but_first') will show you that this call returns three >> distinct subexpressions, of which two happen to have the same name >> (regardless of the names). If the part before | matches, the result is only >> two subexpressions, as the first two subexpressions match. No duplicate >> naming will change this. There is no real "select the one that matches" >> functionality in giving two subexpressions the same name. >> >> PCRE just picks one of the occurences of a name when you ask for it - in >> your last example the occurence you were not expecting, but that's more or >> less random, the first example would give unexpected results if the first >> part matched. 
PCRE has no functionality to pick all occurrences of a name, >> but that could of course be changed if there was some understandable >> semantics that should be implemented. I think badarg exception is the way to >> go though... > Well, re manpage says that dupnames is helpful in case when it's > certain that two subpatterns with the same name can't be matched > simultaneously. Fortunately, the considered regexp falls in this > category. So, I guess that either dupnames has to be removed at all, > or something should be done with it. Funny that I wrote that, when I very well knew that the PCRE APIs I used did not work with dupnames :) Well, removing dupnames might be the easiest, but as there are Perl semantics we can imitate, I think we should give it a try! > > Cheers! Cheers, /Patrik From sgolovan@REDACTED Thu Mar 28 17:52:56 2013 From: sgolovan@REDACTED (Sergei Golovan) Date: Thu, 28 Mar 2013 20:52:56 +0400 Subject: [erlang-bugs] Bug with named subpatterns in re module In-Reply-To: <51546C25.3090801@erlang.org> References: <51542B1A.2010406@erlang.org> <51546C25.3090801@erlang.org> Message-ID: Hi! On Thu, Mar 28, 2013 at 8:13 PM, Patrik Nyblom wrote: > > Well, removing dupnames might be the easiest, but as there are perl > semantics we can imitate, I think we should give it a try! I should say that the PCRE manual describes named subpatterns using the following regexp: (?<DN>Mon|Fri|Sun)(?:day)?| (?<DN>Tue)(?:sday)?| (?<DN>Wed)(?:nesday)?| (?<DN>Thu)(?:rsday)?| (?<DN>Sat)(?:urday)? (search 'NAMED SUBPATTERNS' in http://www.pcre.org/pcre.txt). And currently 1> re:run("Monday", "(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?", [dupnames, {capture, ['DN'], list}]). {match,[[]]} doesn't work. If I leave only one branch it works fine: 2> re:run("Monday", "(?<DN>Mon|Fri|Sun)(?:day)?", [dupnames, {capture, ['DN'], list}]). {match,["Mon"]} Cheers!
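Until named capture with dupnames behaves predictably, one way to sidestep the problem is to avoid duplicate names altogether. The following is only a sketch of such a workaround, not something proposed in this thread: the module name dayname and the group names d1 to d5 are made up for illustration, and it relies on re:run/3 returning an empty list for a named group that did not take part in the match.

```erlang
-module(dayname).
-export([day_name/1]).

%% Workaround sketch: every alternative gets its own (hypothetical) group
%% name instead of sharing one name via dupnames; afterwards we keep the
%% first capture that is actually set.
day_name(Subject) ->
    RE = "(?<d1>Mon|Fri|Sun)(?:day)?|"
         "(?<d2>Tue)(?:sday)?|"
         "(?<d3>Wed)(?:nesday)?|"
         "(?<d4>Thu)(?:rsday)?|"
         "(?<d5>Sat)(?:urday)?",
    case re:run(Subject, RE, [{capture, [d1, d2, d3, d4, d5], list}]) of
        {match, Captures} ->
            %% Groups that did not participate come back as empty lists.
            case [C || C <- Captures, C =/= []] of
                [Name | _] -> {ok, Name};
                []         -> nomatch
            end;
        nomatch ->
            nomatch
    end.
```

Calling dayname:day_name("Monday") is then expected to pick the capture from the first alternative, at the cost of having to list every group name by hand.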
-- Sergei Golovan From erlangpro@REDACTED Fri Mar 29 21:34:06 2013 From: erlangpro@REDACTED (Josh Marchán) Date: Fri, 29 Mar 2013 16:34:06 -0400 Subject: [erlang-bugs] R16 breaks dots Message-ID: <20130329203406.GB1251@zushakon> It's widely known that it's useful to be able to use dots/periods/full-stops (choose your dialect) in Erlang code to maximize compatibility, specially with more modern languages like JavaScript. Unfortunately for the world of Erlang, R16 breaks something that has been tremendously useful. I am no longer able to do this: console.log("Hello from erlang", 1, 2, "More string here", TildePMe) which is leading to a lot of confusion when I regularly switch between JavaScript and Erlang. I would like to formally request that $. once again become a valid character in Erlang identifiers. Until such a time, I regret I must divest from version upgrades. I look forward to your response (and prompt bugfix). -- Josh Marchán From norton@REDACTED Sat Mar 30 08:45:13 2013 From: norton@REDACTED (Joseph Wayne Norton) Date: Sat, 30 Mar 2013 16:45:13 +0900 Subject: [erlang-bugs] R16 breaks dots In-Reply-To: <20130329203406.GB1251@zushakon> References: <20130329203406.GB1251@zushakon> Message-ID: <7D1A4A43-6783-4ACF-936F-C644BEF8839E@lovely.email.ne.jp> Josh - I'm just curious but shouldn't quoted atoms work for your needs? 'console.log'("Hello from erlang", 1, 2, "More string here", TildePMe) thanks, Joe N. On Mar 30, 2013, at 5:34 AM, Josh Marchán wrote: > It's widely known that it's useful to be able to use > dots/periods/full-stops (choose your dialect) in Erlang code to maximize > compatibility, specially with more modern languages like JavaScript. > > Unfortunately for the world of Erlang, R16 breaks something that has been > tremendously useful.
> I am no longer able to do this:
>
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
>
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
>
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> --
> Josh Marchán
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From n.oxyde@REDACTED Sat Mar 30 10:42:04 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sat, 30 Mar 2013 10:42:04 +0100
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <20130329203406.GB1251@zushakon>
References: <20130329203406.GB1251@zushakon>
Message-ID: <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>

Hello,

While I find the prospect of refusing to upgrade Erlang because it can't
be made to look like JavaScript anymore (are you seriously serious?), I do
want to know why dots aren't allowed in atoms anymore and would like to see
them back too.

It was pretty useful to be able to write unquoted fully-qualified node
names in the prompt, e.g. foo@REDACTED

Furthermore, it feels to me like their removal was a mistake, as
demonstrated by this:

1> foo.bar.
* 1: syntax error before: '.'
1> foo. bar.
foo
2> bar.
bar

What you can see here is that the blanks after a dot are still mandatory
to properly parse a '.' character as a 'dot' token, terminating an
expression in the shell (or a form in a module), this was mandatory to
distinguish dot terminators from dots in atoms.

If dots are really to not be allowed anymore in atoms, the blanks should be
made optional, to be consistent with the rest of the language where blanks
are optional before or after a symbol (with the notable exception of a
match '=' followed by a binary literal '<<...>>').

Anyway, for the original complaint of Erlang's syntax not being the same as
JavaScript's and the compatibility concerns, it should be noted that *syntax
is nothing* and that all that matters is semantics. Those of JS being at the
antipodes of Erlang's, I think it's a good thing you can't mistake one for
the other.

It should also be noted that there is nothing "more modern" about JS' syntax
when compared to that of Erlang.

Regards,

--
Anthony Ramine

On 29 March 2013 at 21:34, Josh Marchán wrote:

> It's widely known that it's useful to be able to use
> dots/periods/full-stops (choose your dialect) in Erlang code to maximize
> compatibility, especially with more modern languages like JavaScript.
>
> Unfortunately for the world of Erlang, R16 breaks something that has been
> tremendously useful. I am no longer able to do this:
>
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
>
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
>
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> --
> Josh Marchán

From mononcqc@REDACTED Sat Mar 30 14:42:02 2013
From: mononcqc@REDACTED (Fred Hebert)
Date: Sat, 30 Mar 2013 09:42:02 -0400
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <20130329203406.GB1251@zushakon>
References: <20130329203406.GB1251@zushakon>
Message-ID: <20130330134201.GA22837@ferdmbp.local>

From memory, the problem is that support for periods in atoms was there in
order to support packages, which would let you have an atom of the form
topdir.subdir.module to represent items. When packages (an experimental
feature few people used) got removed along with parametrized modules in
R16, the code that allowed full stops in atoms also got the axe.

Since then (and before packages), what you need to do to get that is wrap
things up in single quotes.

    'console.log'("Hello from Erlang").

Now for a less serious thing, I recommend you use the following in
JavaScript:

    var Log = function(args) { console.log(args) }

And the following in Erlang:

    Log = fun(Args) -> io:format(Args) end

Which means you can now use the fantastic 'Log("Hello!")' function
everywhere you go!

As Anthony said, I'm a bit surprised you're not willing to upgrade because
the languages look different (they should look different, given they *are*
different, in my opinion).

Regards,
Fred.

On 03/29, Josh Marchán wrote:
> It's widely known that it's useful to be able to use
> dots/periods/full-stops (choose your dialect) in Erlang code to maximize
> compatibility, especially with more modern languages like JavaScript.
>
> Unfortunately for the world of Erlang, R16 breaks something that has been
> tremendously useful. I am no longer able to do this:
>
> console.log("Hello from erlang", 1, 2, "More string here", TildePMe)
>
> which is leading to a lot of confusion when I regularly switch between
> JavaScript and Erlang.
>
> I would like to formally request that $. once again become a valid
> character in Erlang identifiers. Until such a time, I regret I must divest
> from version upgrades. I look forward to your response (and prompt bugfix).
> --
> Josh Marchán

From carlsson.richard@REDACTED Sat Mar 30 23:53:43 2013
From: carlsson.richard@REDACTED (Richard Carlsson)
Date: Sat, 30 Mar 2013 23:53:43 +0100
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>
References: <20130329203406.GB1251@zushakon> <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>
Message-ID: <51576CF7.6010905@gmail.com>

On 2013-03-30 10:42, Anthony Ramine wrote:
> I do want to know why dots aren't allowed in atoms anymore
> and would like to see them back too.

As Fred already mentioned, this feature was added as part of the "packages"
and was removed along with them.

> It was pretty useful to be able to write unquoted fully-qualified
> node names in the prompt, e.g. foo@REDACTED

I think that many agree on this, and maybe the OTP team can be convinced to
take this part back. It should be pretty simple to extract the relevant code
from the commit that removes packages.

> Furthermore, it feels to me like their removal was a mistake, as
> demonstrated by this:
>
> 1> foo.bar.
> * 1: syntax error before: '.'
> 1> foo. bar.
> foo
> 2> bar.
> bar
>
> What you can see here is that the blanks after a dot are still
> mandatory to properly parse a '.' character as a 'dot' token,
> terminating an expression in the shell (or a form in a module), this
> was mandatory to distinguish dot terminators from dots in atoms.
>
> If dots are really to not be allowed anymore in atoms, the blanks
> should be made optional, to be consistent with the rest of the
> language where blanks are optional before or after a symbol (with the
> notable exception of a match '=' followed by a binary literal
> '<<...>>').

This is not quite how the grammar works. First of all, the 'dot' token is
identified as a "."
followed by whitespace or a comment or EOF, and the packages addition did
not change that. However, periods that are not a dot token or part of any
other token are seen as '.' tokens. For example:

1> erl_scan:string("foo.bar. ").
{ok,[{atom,1,foo},{'.',1},{atom,1,bar},{dot,1}],1}
2> erl_scan:string("foo. bar. ").
{ok,[{atom,1,foo},{dot,1},{atom,1,bar},{dot,1}],1}

Now, the Erlang parser works on complete "forms" at a time - these are the
token sequences that are terminated by dot tokens. In the first case, you
have one form containing three tokens. In the second case, you have two
forms containing one token each. Blanks cannot be made optional after
periods, because you must be able to distinguish between token sequences
like these.

It's also the case that you can't just change the scanning of atoms to allow
periods as part of the atom token - in that case, the scanner would report a
single atom for "foo.bar" instead of three tokens 'foo' '.' 'bar', and then
the grammar would not be able to identify phrases like "Rec#foo.bar" or
"#foo.bar". To support dotted atoms, the packages feature added a grammar
rule that allowed a sequence <atom> '.' <atom> ... to be merged into a single
atom unless it was part of another rule such as '#' <atom> '.' <atom>. (I
think that Haskell had to do some similar tricks with their grammar to allow
dotted names.) This could easily be put back in there. But at no point has it
been the case in Erlang that unquoted atom tokens could contain periods.

/Richard

From n.oxyde@REDACTED Sun Mar 31 16:22:37 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sun, 31 Mar 2013 16:22:37 +0200
Subject: [erlang-bugs] Bit string generators, unsized binaries, modules and the REPL
Message-ID: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com>

Hello,

People on IRC noticed a difference between compiled modules and the REPL in
how some binary generators are handled. Compare:

$ cat unsized_bin_gen_pat.erl
-module(unsized_bin_gen_pat).
-export([t/0]).
t() -> << <> || <> <= <<1,2,3>> >>.
$ erlc unsized_bin_gen_pat.erl
$ erl
1> % compiled
1> unsized_bin_gen_pat:t().
<<1,2,3,2,3,3>>
2> % evaluated
2> << <> || <> <= <<1,2,3>> >>.
<<1,2,3>>

I don't think the compiler should be changed to behave like the REPL, nor
do I think the REPL should be changed to behave like the compiler. Instead, I
think an unsized binary tail in the pattern of a binary generator does not
make sense, and this should happen:

$ erlc unsized_bin_gen_pat.erl
unsized_bin_gen_pat.erl:3: binary fields without size are not allowed in patterns of bit string generators

This patch implements this new error and simplifies how v3_core works with
forbidden unsized tail segments in patterns of bit string generators.

git fetch https://github.com/nox/otp illegal-bitstring-gen-pattern

https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern
https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern.patch

Looking at the commit 5daa001 by Björn Gustavsson "Don't generate multiple
tail segments in binary matching", this patch will probably be rejected as
it seems the compiler behaves as wanted by the OTP team. If this is indeed
the case, erl_eval should be fixed.

Regards,

--
Anthony Ramine
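
A minimal sketch of how the two results above can arise, assuming the
generator pattern consists of one 8-bit segment followed by an unsized
binary tail. The module bin_gen_sketch, its function names and the
variables X and Y are hypothetical and not taken from the report. Plain
recursion is used instead of a comprehension, so the sketch does not depend
on how a given release treats such patterns; it only reproduces the two
reported outputs and makes no claim about the internals of v3_core or
erl_eval.

```erlang
-module(bin_gen_sketch).
-export([compiled_like/1, shell_like/1]).

%% Assumption: the unsized tail Y doubles as the remainder of the
%% generator, so every iteration emits <<X, Y/binary>> and then
%% continues with Y.
compiled_like(<<X, Y/binary>>) ->
    Rest = compiled_like(Y),
    <<X, Y/binary, Rest/binary>>;
compiled_like(<<>>) ->
    <<>>.

%% Assumption: the unsized tail Y swallows everything after X, so
%% nothing is left for a second iteration.
shell_like(<<X, Y/binary>>) ->
    <<X, Y/binary>>;
shell_like(<<>>) ->
    <<>>.
```

With the input <<1,2,3>>, compiled_like/1 yields <<1,2,3,2,3,3>> and
shell_like/1 yields <<1,2,3>>, which matches the two outputs shown in the
report.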