From n.oxyde@REDACTED Mon Apr 1 18:34:35 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Mon, 1 Apr 2013 18:34:35 +0200
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <51576CF7.6010905@gmail.com>
References: <20130329203406.GB1251@zushakon> <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com> <51576CF7.6010905@gmail.com>
Message-ID: <3E8948CA-EBD7-48EC-A9B8-6CEB8FF9C96F@gmail.com>

Hello,

I wrote a patch that allows dotted atoms everywhere in the syntax but record expressions, where they would be ambiguous. I wrote no tests as I have no idea where they should go. Furthermore, I think that maybe the pretty-printer should be patched to not quote dotted atoms outside record expressions.

git fetch https://github.com/nox/otp.git restore-dotted-atoms

https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms
https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms.patch

Regards,

--
Anthony Ramine

On 30 March 2013 at 23:53, Richard Carlsson wrote:

> I think that many agree on this, and maybe the OTP team can be convinced to take this part back. It should be pretty simple to extract the relevant code from the commit that removes packages.

From drew.varner@REDACTED Mon Apr 1 21:10:23 2013
From: drew.varner@REDACTED (Drew Varner)
Date: Mon, 1 Apr 2013 23:10:23 +0400
Subject: [erlang-bugs] Add default LDAP schemes to http_uri
Message-ID: <0428A170-8333-4566-9C0C-D0A09076329C@redops.org>

Is it reasonable to add ldap default ports to http_uri [1] since eldap is in Erlang? I have been fiddling with a module to fetch CRLs. I need to parse URIs for LDAP CRL distribution points.

[1] https://github.com/erlang/otp/blob/maint/lib/inets/src/http_lib/http_uri.erl#L76

change to:

scheme_defaults() ->
    [{http, 80}, {https, 443}, {ftp, 21}, {ssh, 22}, {sftp, 22}, {tftp, 69}, {ldap, 389}, {ldaps, 636}].
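
A minimal sketch of how the extended table could be consulted, for illustration only. The module name uri_defaults_sketch and the helper default_port/1 are hypothetical and are not part of http_uri; only the scheme_defaults/0 list comes from the proposal above.

```erlang
-module(uri_defaults_sketch).
-export([scheme_defaults/0, default_port/1]).

%% The proposed table: the existing entries plus ldap and ldaps.
scheme_defaults() ->
    [{http, 80}, {https, 443}, {ftp, 21}, {ssh, 22}, {sftp, 22},
     {tftp, 69}, {ldap, 389}, {ldaps, 636}].

%% Hypothetical helper (an assumption, not an http_uri function):
%% look up the default port for a scheme atom.
default_port(Scheme) ->
    case lists:keyfind(Scheme, 1, scheme_defaults()) of
        {Scheme, Port} -> {ok, Port};
        false -> {error, {no_default_port, Scheme}}
    end.
```

With this table, uri_defaults_sketch:default_port(ldap) gives {ok, 389}, while a scheme that is not listed gives an error tuple instead of a port.
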
Thanks, Drew From paul@REDACTED Mon Apr 1 21:42:15 2013 From: paul@REDACTED (Paul Rubin) Date: Mon, 1 Apr 2013 12:42:15 -0700 Subject: [erlang-bugs] BEAM crash from GTK Message-ID: I have an application using the Ranch tcp pool that I'm trying to test at high concurrency. The most trivial test for this just opens a bunch of idle connections and waits for the user to do something. With 1000 connections it works fine and I can start the observer application. With 10000 connections, trying to start observer crashes the BEAM: 1> observer:start(). *** buffer overflow detected ***: /usr/lib64/erlang/erts-5.9.3.1/bin/beam.smp terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x31b830a697] /lib64/libc.so.6[0x31b8308810] /lib64/libc.so.6[0x31b830a607] /lib64/libglib-2.0.so.0(g_spawn_sync+0x1cc)[0x3e230873dc] /lib64/libglib-2.0.so.0(g_spawn_command_line_sync+0x78)[0x3e23087a98] /lib64/libgio-2.0.so.0[0x3e258af8af] /lib64/libgio-2.0.so.0(g_dbus_address_get_for_bus_sync+0x2ca)[0x3e258b11ca] /lib64/libgio-2.0.so.0[0x3e258ba0be] /lib64/libgio-2.0.so.0(g_bus_get+0x54)[0x3e258ba1b4] /lib64/libgio-2.0.so.0(g_bus_watch_name+0xe8)[0x3e258c7eb8] /usr/lib64/gtk-2.0/2.10.0/immodules/im-ibus.so(+0x4370)[0x7fd0b1714370] /lib64/libgobject-2.0.so.0(g_type_class_ref+0x4d6)[0x3e2382e0b6] /lib64/libgobject-2.0.so.0(g_object_newv+0x831)[0x3e238165a1] /lib64/libgobject-2.0.so.0(g_object_new+0xec)[0x3e23816b3c] /usr/lib64/gtk-2.0/2.10.0/immodules/im-ibus.so(ibus_im_context_new+0x12)[0x7fd0b1714ed2] /lib64/libgtk-x11-2.0.so.0[0x3e2a532b26] /lib64/libgtk-x11-2.0.so.0[0x3e2a533409] /lib64/libgtk-x11-2.0.so.0[0x3e2a5336a6] /lib64/libwx_gtk2u_core-2.8.so.0(_ZN8wxWindow12PostCreationEv+0x54)[0x7fd0b3c09c14] /lib64/libwx_gtk2u_core-2.8.so.0(_ZN19wxTopLevelWindowGTK6CreateEP8wxWindowiRK8wxStringRK7wxPointRK6wxSizelS4_+0x324)[0x7fd0b3c04754] /lib64/libwx_gtk2u_core-2.8.so.0(_ZN7wxFrame6CreateEP8wxWindowiRK8wxStringRK7wxPointRK6wxSizelS4_+0x20)[0x7fd0b3c453d0] 
/usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_Z19create_dummy_windowv+0xa9)[0x7fd0b8f257c9] /usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_ZN6WxeApp6OnInitEv+0x1d6)[0x7fd0b8f27126] /lib64/libwx_baseu-2.8.so.0(_Z7wxEntryRiPPw+0x64)[0x7fd0b37418d4] /usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_Z13wxe_main_loopPv+0x3f)[0x7fd0b8f2545f] /usr/lib64/erlang/erts-5.9.3.1/bin/beam.smp[0x594fe0] /lib64/libpthread.so.0[0x31b8607d15] /lib64/libc.so.6(clone+0x6d)[0x31b82f246d] I believe this is R15B but I'm not sure how to find out for certain. The version number of the erts might be enough. BEAM also dumps out a memory map that I can send if you need it. Since there is a process (hmm, maybe 2 processes) for each connection, my guess is that GTK or its wrapper layer is overflowing some internal table when trying to draw that many boxes on the screen. Regards --Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgud@REDACTED Tue Apr 2 15:20:59 2013 From: dgud@REDACTED (Dan Gudmundsson) Date: Tue, 2 Apr 2013 15:20:59 +0200 Subject: [erlang-bugs] BEAM crash from GTK In-Reply-To: References: Message-ID: Try running observer from another (erlang) node and connect to the node you want to observe, which you should do anyway to be sure that you don't crash the server node, if something bad happens as in your case. This seems to be very early before you drawing anything, so I think this is a possible gtk bug, out of file descriptors maybe? /Dan On Mon, Apr 1, 2013 at 9:42 PM, Paul Rubin wrote: > I have an application using the Ranch tcp pool that I'm trying to test at > high concurrency. The most trivial test for this just opens a bunch of > idle connections and waits for the user to do something. With 1000 > connections it works fine and I can start the observer application. With > 10000 connections, trying to start observer crashes the BEAM: > > 1> observer:start(). 
> *** buffer overflow detected ***: > /usr/lib64/erlang/erts-5.9.3.1/bin/beam.smp terminated > ======= Backtrace: ========= > /lib64/libc.so.6(__fortify_fail+0x37)[0x31b830a697] > /lib64/libc.so.6[0x31b8308810] > /lib64/libc.so.6[0x31b830a607] > /lib64/libglib-2.0.so.0(g_spawn_sync+0x1cc)[0x3e230873dc] > /lib64/libglib-2.0.so.0(g_spawn_command_line_sync+0x78)[0x3e23087a98] > /lib64/libgio-2.0.so.0[0x3e258af8af] > /lib64/libgio-2.0.so.0(g_dbus_address_get_for_bus_sync+0x2ca)[0x3e258b11ca] > /lib64/libgio-2.0.so.0[0x3e258ba0be] > /lib64/libgio-2.0.so.0(g_bus_get+0x54)[0x3e258ba1b4] > /lib64/libgio-2.0.so.0(g_bus_watch_name+0xe8)[0x3e258c7eb8] > /usr/lib64/gtk-2.0/2.10.0/immodules/im-ibus.so(+0x4370)[0x7fd0b1714370] > /lib64/libgobject-2.0.so.0(g_type_class_ref+0x4d6)[0x3e2382e0b6] > /lib64/libgobject-2.0.so.0(g_object_newv+0x831)[0x3e238165a1] > /lib64/libgobject-2.0.so.0(g_object_new+0xec)[0x3e23816b3c] > > /usr/lib64/gtk-2.0/2.10.0/immodules/im-ibus.so(ibus_im_context_new+0x12)[0x7fd0b1714ed2] > /lib64/libgtk-x11-2.0.so.0[0x3e2a532b26] > /lib64/libgtk-x11-2.0.so.0[0x3e2a533409] > /lib64/libgtk-x11-2.0.so.0[0x3e2a5336a6] > > /lib64/libwx_gtk2u_core-2.8.so.0(_ZN8wxWindow12PostCreationEv+0x54)[0x7fd0b3c09c14] > > /lib64/libwx_gtk2u_core-2.8.so.0(_ZN19wxTopLevelWindowGTK6CreateEP8wxWindowiRK8wxStringRK7wxPointRK6wxSizelS4_+0x324)[0x7fd0b3c04754] > > /lib64/libwx_gtk2u_core-2.8.so.0(_ZN7wxFrame6CreateEP8wxWindowiRK8wxStringRK7wxPointRK6wxSizelS4_+0x20)[0x7fd0b3c453d0] > > /usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_Z19create_dummy_windowv+0xa9)[0x7fd0b8f257c9] > > /usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_ZN6WxeApp6OnInitEv+0x1d6)[0x7fd0b8f27126] > /lib64/libwx_baseu-2.8.so.0(_Z7wxEntryRiPPw+0x64)[0x7fd0b37418d4] > > /usr/lib64/erlang/lib/wx-0.99.2/priv/wxe_driver.so(_Z13wxe_main_loopPv+0x3f)[0x7fd0b8f2545f] > /usr/lib64/erlang/erts-5.9.3.1/bin/beam.smp[0x594fe0] > /lib64/libpthread.so.0[0x31b8607d15] > /lib64/libc.so.6(clone+0x6d)[0x31b82f246d] > > I 
believe this is R15B but I'm not sure how to find out for certain. The
> version number of the erts might be enough. BEAM also dumps out a memory map
> that I can send if you need it. Since there is a process (hmm, maybe 2 processes)
> for each connection, my guess is that GTK or its wrapper layer is overflowing some
> internal table when trying to draw that many boxes on the screen.
>
> Regards
>
> --Paul
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From pan@REDACTED Tue Apr 2 18:49:05 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 2 Apr 2013 18:49:05 +0200
Subject: [erlang-bugs] Bug with named subpatterns in re module
In-Reply-To:
References: <51542B1A.2010406@erlang.org> <51546C25.3090801@erlang.org>
Message-ID: <515B0C01.7020406@erlang.org>

Hi!

On 03/28/2013 05:52 PM, Sergei Golovan wrote:
> Hi!
>
> On Thu, Mar 28, 2013 at 8:13 PM, Patrik Nyblom wrote:
>> Well, removing dupnames might be the easiest, but as there are perl
>> semantics we can imitate, I think we should give it a try!
> I should say that PCRE manual describes named subpatterns using the
> following regexp:
>
> (?<DN>Mon|Fri|Sun)(?:day)?|
> (?<DN>Tue)(?:sday)?|
> (?<DN>Wed)(?:nesday)?|
> (?<DN>Thu)(?:rsday)?|
> (?<DN>Sat)(?:urday)?
>
> (search 'NAMED SUBPATTERNS' in http://www.pcre.org/pcre.txt). And currently
>
> 1> re:run("Monday",
> "(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?",
> [dupnames, {capture, ['DN'], list}]).
> {match,[[]]}
>
> doesn't work. If I leave only one branch it works fine:
> 2> re:run("Monday", "(?<DN>Mon|Fri|Sun)(?:day)?", [dupnames, {capture,
> ['DN'], list}]).
> {match,["Mon"]}

Yes, it's not really PCRE's fault, it's up to the user of the library (i.e. re) not to use the one-to-one mapping when using dupnames.
I shouldn't have allowed dupnames if I wasn't to handle them as I described in my last post, i.e. by digging out the full one-to-many mapping between names and subpattern indexes. The only thing i'm still wondering about is a good semantics for capturing 'all'. Maybe we shouldn't touch that and should concentrate on the capturing of specific names, but it feels like we should have an 'all_names' option... Also, I think I should bump the PCRE version while at it, there are some issues with Unicode that was discussed earlier on some of the lists... > > Cheers! Cheers, /Patrik From fredrik@REDACTED Wed Apr 3 17:07:00 2013 From: fredrik@REDACTED (Fredrik) Date: Wed, 3 Apr 2013 17:07:00 +0200 Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules In-Reply-To: References: Message-ID: <515C4594.1060100@erlang.org> On 03/21/2013 07:40 AM, Maxim Treskin wrote: > Hello > > At Montenegro Erlang Hackaton ( > http://lanyrd.com/2013/herceg-novi-erlang-meetup/ , there were only > two people, unfortunately ) we found incorrect behaviour of Dialyzer. > > Our project erroneous had a duplicated modules with the same name, but > different content. When we check it with dialyzer it show me something > like that: > > Duplicate modules: [["/var/tmp/myproj/apps/myproj/ebin/psc_operate.beam", > > "/var/tmp/myproj/deps/somedep/ebin/amp_common_utils.beam"]] > > Obviously it is not the same modules. So I had to search this bug and > find strange behaviour in dialyzer. Function lists:zip/2 called with > two list, where first is reversed list of modules as atom, and second > is list of filepaths for modules. And this list not always contains > correspond elements. Module with name some_module1 can be has filename > like abc_module55.beam. This is the cause of error. > > This bug exists in R15B02 and R16. > > I wrote such patch to fix bug, but I don't know whether this is > solution or not, though it works fine. 
> > --- /opt/r16a/lib/dialyzer-2.5.4/src/dialyzer_analysis_callgraph.erl > 2013-01-31 12:55:53.210402846 +0700 > +++ dialyzer_pa/dialyzer_analysis_callgraph.erl 2013-03-21 > 13:20:46.794991889 +0700 > @@ -255,10 +255,18 @@ > CServer2 = dialyzer_codeserver:set_next_core_label(NextLabel, CServer), > case Failed =:= [] of > true -> > - NewFiles = lists:zip(lists:reverse(Modules), Files), > + %% Modules and Files have not the same order, so it is > meaningless to zip it > + %% NewFiles = lists:zip(lists:reverse(Modules), Files), > + > ModDict = > - lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, Dict) end, > - dict:new(), NewFiles), > + lists:foldl(fun(F, Dict) -> > + ModFile = lists:last(filename:split(F)), > + Mod = filename:basename(ModFile, ".beam"), > + dict:append(Mod, F, Dict) end, > + dict:new(), Files), > + %% ModDict = > + %% lists:foldl(fun({Mod, F}, Dict) -> dict:append(Mod, F, > Dict) end, > + %% dict:new(), NewFiles), > check_for_duplicate_modules(ModDict); > false -> > Msg = io_lib:format("Could not scan the following file(s): ~p", > > > -- > Max Treskin > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs Hello Maxim, Follow https://github.com/erlang/otp/wiki/submitting-patches and your patch will be reviewed! -- BR Fredrik Gustafsson Erlang OTP Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From kostis@REDACTED Wed Apr 3 17:11:58 2013 From: kostis@REDACTED (Kostis Sagonas) Date: Wed, 03 Apr 2013 18:11:58 +0300 Subject: [erlang-bugs] Dialyzer bug: incorrect duplicate modules In-Reply-To: <515C4594.1060100@erlang.org> References: <515C4594.1060100@erlang.org> Message-ID: <515C46BE.9080107@cs.ntua.gr> On 04/03/13 18:07, Fredrik wrote: > > Hello Maxim, > Follow https://github.com/erlang/otp/wiki/submitting-patches and your > patch will be reviewed! 
We have already reviewed this patch and we have included it in some dialyzer branch which we will submit soon to this mailing list. So, you can ignore Maxim's patch.

Kostis

From bgustavsson@REDACTED Thu Apr 4 15:38:31 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Thu, 4 Apr 2013 15:38:31 +0200
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <3E8948CA-EBD7-48EC-A9B8-6CEB8FF9C96F@gmail.com>
References: <20130329203406.GB1251@zushakon> <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com> <51576CF7.6010905@gmail.com> <3E8948CA-EBD7-48EC-A9B8-6CEB8FF9C96F@gmail.com>
Message-ID:

On Mon, Apr 1, 2013 at 6:34 PM, Anthony Ramine wrote:

> Hello,
>
> I wrote a patch that allows dotted atoms everywhere in the syntax but
> record expressions, where they would be ambiguous. I wrote no tests as I
> have no idea where they should go. Furthermore, I think that maybe the
> pretty-printer should be patched to not quote dotted atoms outside record
> expressions.
>
> git fetch https://github.com/nox/otp.git restore-dotted-atoms
>
> https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms
> https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms.patch
>

Rejected.

The Technical Board did discuss the matter of dots in atoms and decided that the support should be removed along with packages.

/Bjorn

--
Björn Gustavsson, Erlang/OTP, Ericsson AB

From n.oxyde@REDACTED Thu Apr 4 15:45:58 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Thu, 4 Apr 2013 15:45:58 +0200
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To:
References: <20130329203406.GB1251@zushakon> <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com> <51576CF7.6010905@gmail.com> <3E8948CA-EBD7-48EC-A9B8-6CEB8FF9C96F@gmail.com>
Message-ID: <3F6CD158-AF0E-4F16-AB72-4DFE84118C0B@gmail.com>

Hello Björn,

Could we have more details about why the Technical Board decided that?
Regards,

--
Anthony Ramine

On 4 April 2013 at 15:38, Björn Gustavsson wrote:

>
> On Mon, Apr 1, 2013 at 6:34 PM, Anthony Ramine wrote:
> Hello,
>
> I wrote a patch that allows dotted atoms everywhere in the syntax but record expressions, where they would be ambiguous. I wrote no tests as I have no idea where they should go. Furthermore, I think that maybe the pretty-printer should be patched to not quote dotted atoms outside record expressions.
>
> git fetch https://github.com/nox/otp.git restore-dotted-atoms
>
> https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms
> https://github.com/nox/otp/compare/erlang:maint...restore-dotted-atoms.patch
>
>
> Rejected.
>
> The Technical Board did discuss the matter of dots in atoms and
> decided that the support should be removed along with packages.
>
> /Bjorn
>
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB

From bgustavsson@REDACTED Thu Apr 4 15:46:58 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Thu, 4 Apr 2013 15:46:58 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com>
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com>
Message-ID:

On Sun, Mar 31, 2013 at 4:22 PM, Anthony Ramine wrote:

>
> This patch implements this new error and simplifies how v3_core works with
> forbidden unsized tail segments in patterns of bit string generators.
>
> git fetch https://github.com/nox/otp illegal-bitstring-gen-pattern
>
> https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern
> https://github.com/nox/otp/compare/erlang:maint...illegal-bitstring-gen-pattern.patch
>
> Looking at the commit 5daa001 by Björn Gustavsson "Don't generate multiple
> tail segments in binary matching", this patch will probably be rejected as
> it seems the compiler behaves as wanted by the OTP team. If this is indeed
> the case, erl_eval should be fixed.
> > Not really. I noticed that the implementation was flaky and fixed it. The Technical Board will discuss the matter. -- Bj?rn Gustavsson, Erlang/OTP, Ericsson AB -------------- next part -------------- An HTML attachment was scrubbed... URL: From pan@REDACTED Thu Apr 4 16:43:31 2013 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 4 Apr 2013 16:43:31 +0200 Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future In-Reply-To: References: <513F2FD4.8010108@erlang.org> Message-ID: <515D9193.8080503@erlang.org> Hi! On 03/12/2013 04:36 PM, Garret Smith wrote: > > > On Mar 12, 2013 6:38 AM, "Patrik Nyblom" > wrote: > > > > Hi! > > > > There's a patched version of the R15B02 dll in my public dropbox, > under the name r15.beam.smp.dll: > > > > http://dl.dropbox.com/u/17212223/r15.beam.smp.dll > > R15B02 will work. I'll get started but it will take a couple days to > get everything built, deployed and watch for time jumps. > > Thank you for the binary! > Any progress? Does it work (just curious :)) Cheers, /Patrik > > > > > If you replace the R15 beam.smp.dll with this one, the werl slogan > should contain the version erts-5.9.2.0.1, if you could try that on > the real app, I would be immensely grateful! > > > > Cheers, > > /Patrik > > > > On 03/12/2013 02:09 PM, Vance Shipley wrote: > >> > >> C > >> > >> On Mar 5, 2013 6:56 AM, "Garret Smith" > wrote: > >>> > >>> I have been beating my head against a wall for weeks tracking down > spooky behaviour[sic] in one of our production systems. I finally > tracked it down to "jumps" in the times returned by erlang:now(), > causing all timers in the system to expire at once. I have witnessed > this bug on R15B01, both 64 and 32-bit versions running on Windows > Server 2008 R2, both on bare metal and VirtualBox VM. > >>> > >>> The time jump is always around 2126000 seconds, or a little over > 24 days. 
The now() time does not try to converge with os:timestamp() > as the documentation suggests, and as I confirmed it does if you just > change the system clock. > >>> > >>> Another VM running concurrently on the same machine but with > little load (diagnostic node & production node) did not time jump. > >>> > >>> Higher load seems to make the time jumps happen more often. > >>> > >>> Frequency between time jumps varies between seconds and hours, but > when a jump occurs, it is always 2126000 + (9 to 26) seconds. > >>> > >>> I never see the jump in logfile timestamps that use os:timestamp() > for tagging log messages. I had to start tracing a production node > before I caught the jump. Here are some lines from a trace, where the > timestamp in trace_ts is printed using calendar:now_to_local_time() > and then in raw tuple format: > >>> > >>> 2013-4-16 21:40:1.993399|{1366,173601,993399} > >>> 2013-4-16 21:40:1.993400|{1366,173601,993400} > >>> 2013-5-11 12:13:41.986961|{1368,299621,986961} > >>> 2013-5-11 12:13:41.986962|{1368,299621,986962} > >>> > >>> then a bit later... > >>> > >>> 2013-5-11 12:36:19.955129|{1368,300979,955129} > >>> 2013-5-11 12:36:19.955130|{1368,300979,955130} > >>> 2013-6-5 3:9:49.538830|{1370,426989,538830} > >>> 2013-6-5 3:9:49.538833|{1370,426989,538833} > >>> > >>> I captured many such jumps over the course of a day or so. > Obviously from the dates, 2 jumps happened before I started tracing. > >>> > >>> I was able to reproduce the bug, though not as efficiently as my > production system, with the following sample program: > https://gist.github.com/garret-smith/5087169 > >>> > >>> It took over an hour of runtime before the first time jump. I am > working on a better way to reproduce it at the moment, but it's hard > to test the test with a bug so intermittent. > >>> > >>> I am also testing various other VM versions. 
My first hope was > that this was limited to the 64-bit version where we first encountered > the problem, but a change to the 32-bit version has only made the > problem happen less often, not eliminated it. > >>> > >>> We never saw this bug with R14B03 which we were running previously > to R15B01. However, system load is different so I can't make a direct > comparison. I did notice a few significant updates to the Windows > time related code between R14B03 and R15: > >>> > >>> git log sys_time.c > >>> > >>> commit 46eb4359b05b220861453a869dc734480ec045a6 > >>> Author: Patrik Nyblom > > >>> Date: Tue Dec 6 19:07:16 2011 +0100 > >>> > >>> Emulate localtime, gmtime and mktime to enable negative time_t > >>> > >>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75 > >>> Author: Bjrn-Egil Dahlberg > > >>> Date: Fri Dec 2 15:25:06 2011 +0100 > >>> > >>> Teach windows sys_localtime_r > >>> > >>> > >>> I am completely stumped. What can I do next to help track down > the source of the bug? > >>> > >>> Thanks, > >>> Garret Smith > >>> > >>> _______________________________________________ > >>> erlang-bugs mailing list > >>> erlang-bugs@REDACTED > >>> http://erlang.org/mailman/listinfo/erlang-bugs > >>> > >> > >> > >> _______________________________________________ > >> erlang-bugs mailing list > >> erlang-bugs@REDACTED > >> http://erlang.org/mailman/listinfo/erlang-bugs > > > > > > > > _______________________________________________ > > erlang-bugs mailing list > > erlang-bugs@REDACTED > > http://erlang.org/mailman/listinfo/erlang-bugs > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garret.smith@REDACTED Thu Apr 4 19:24:22 2013 From: garret.smith@REDACTED (Garret Smith) Date: Thu, 4 Apr 2013 10:24:22 -0700 Subject: [erlang-bugs] R15B01 erlang:now() jumping ~24 days into the future In-Reply-To: <515D9193.8080503@erlang.org> References: <513F2FD4.8010108@erlang.org> <515D9193.8080503@erlang.org> Message-ID: Your patch fixes the problem in our duration testing, but unfortunately I haven't got it on a production site yet. Hopefully "real soon now". On Thu, Apr 4, 2013 at 7:43 AM, Patrik Nyblom wrote: > Hi! > > > On 03/12/2013 04:36 PM, Garret Smith wrote: > > > On Mar 12, 2013 6:38 AM, "Patrik Nyblom" wrote: > > > > Hi! > > > > There's a patched version of the R15B02 dll in my public dropbox, under > the name r15.beam.smp.dll: > > > > http://dl.dropbox.com/u/17212223/r15.beam.smp.dll > > R15B02 will work. I'll get started but it will take a couple days to get > everything built, deployed and watch for time jumps. > > Thank you for the binary! > > Any progress? Does it work (just curious :)) > > Cheers, > /Patrik > > > > > If you replace the R15 beam.smp.dll with this one, the werl slogan > should contain the version erts-5.9.2.0.1, if you could try that on the > real app, I would be immensely grateful! > > > > Cheers, > > /Patrik > > > > On 03/12/2013 02:09 PM, Vance Shipley wrote: > >> > >> C > >> > >> On Mar 5, 2013 6:56 AM, "Garret Smith" wrote: > >>> > >>> I have been beating my head against a wall for weeks tracking down > spooky behaviour[sic] in one of our production systems. I finally tracked > it down to "jumps" in the times returned by erlang:now(), causing all > timers in the system to expire at once. I have witnessed this bug on > R15B01, both 64 and 32-bit versions running on Windows Server 2008 R2, both > on bare metal and VirtualBox VM. > >>> > >>> The time jump is always around 2126000 seconds, or a little over 24 > days. 
The now() time does not try to converge with os:timestamp() as the > documentation suggests, and as I confirmed it does if you just change the > system clock. > >>> > >>> Another VM running concurrently on the same machine but with little > load (diagnostic node & production node) did not time jump. > >>> > >>> Higher load seems to make the time jumps happen more often. > >>> > >>> Frequency between time jumps varies between seconds and hours, but > when a jump occurs, it is always 2126000 + (9 to 26) seconds. > >>> > >>> I never see the jump in logfile timestamps that use os:timestamp() for > tagging log messages. I had to start tracing a production node before I > caught the jump. Here are some lines from a trace, where the timestamp in > trace_ts is printed using calendar:now_to_local_time() and then in raw > tuple format: > >>> > >>> 2013-4-16 21:40:1.993399|{1366,173601,993399} > >>> 2013-4-16 21:40:1.993400|{1366,173601,993400} > >>> 2013-5-11 12:13:41.986961|{1368,299621,986961} > >>> 2013-5-11 12:13:41.986962|{1368,299621,986962} > >>> > >>> then a bit later... > >>> > >>> 2013-5-11 12:36:19.955129|{1368,300979,955129} > >>> 2013-5-11 12:36:19.955130|{1368,300979,955130} > >>> 2013-6-5 3:9:49.538830|{1370,426989,538830} > >>> 2013-6-5 3:9:49.538833|{1370,426989,538833} > >>> > >>> I captured many such jumps over the course of a day or so. Obviously > from the dates, 2 jumps happened before I started tracing. > >>> > >>> I was able to reproduce the bug, though not as efficiently as my > production system, with the following sample program: > https://gist.github.com/garret-smith/5087169 > >>> > >>> It took over an hour of runtime before the first time jump. I am > working on a better way to reproduce it at the moment, but it's hard to > test the test with a bug so intermittent. > >>> > >>> I am also testing various other VM versions. 
My first hope was that > this was limited to the 64-bit version where we first encountered the > problem, but a change to the 32-bit version has only made the problem > happen less often, not eliminated it. > >>> > >>> We never saw this bug with R14B03 which we were running previously to > R15B01. However, system load is different so I can't make a direct > comparison. I did notice a few significant updates to the Windows time > related code between R14B03 and R15: > >>> > >>> git log sys_time.c > >>> > >>> commit 46eb4359b05b220861453a869dc734480ec045a6 > >>> Author: Patrik Nyblom > >>> Date: Tue Dec 6 19:07:16 2011 +0100 > >>> > >>> Emulate localtime, gmtime and mktime to enable negative time_t > >>> > >>> commit 913f05af100e98a8665bbb6168e89fbcfe4ece75 > >>> Author: Bjrn-Egil Dahlberg > >>> Date: Fri Dec 2 15:25:06 2011 +0100 > >>> > >>> Teach windows sys_localtime_r > >>> > >>> > >>> I am completely stumped. What can I do next to help track down the > source of the bug? > >>> > >>> Thanks, > >>> Garret Smith > >>> > >>> _______________________________________________ > >>> erlang-bugs mailing list > >>> erlang-bugs@REDACTED > >>> http://erlang.org/mailman/listinfo/erlang-bugs > >>> > >> > >> > >> _______________________________________________ > >> erlang-bugs mailing list > >> erlang-bugs@REDACTED > >> http://erlang.org/mailman/listinfo/erlang-bugs > > > > > > > > _______________________________________________ > > erlang-bugs mailing list > > erlang-bugs@REDACTED > > http://erlang.org/mailman/listinfo/erlang-bugs > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon@REDACTED Fri Apr 5 11:41:44 2013 From: simon@REDACTED (Simon Smithies) Date: Fri, 5 Apr 2013 22:41:44 +1300 Subject: [erlang-bugs] erlang:now() and erlang:localtime() return different results Message-ID: Hi, Erlang newbie here ... 
I have two Debian linux VPSs, both largely identical except for Erlang versions, running in Virtuozzo containers with the same provider. On the VPS running Erlang R14B04, erlang:now() and erlang:localtime() return identical results (as they do on my local machine, and some others we've tried): 1> {calendar:now_to_local_time(erlang:now()), erlang:localtime()}. {{{2013,4,5},{22,12,57}},{{2013,4,5},{22,12,57}}} On the other VPS however, running Erlang R15B01, I get different results: 1> {calendar:now_to_local_time(erlang:now()), erlang:localtime()}. {{{2013,4,5},{22,35,29}},{{2013,4,5},{22,30,29}}} As background, I'm running Zotonic on both VPSs; on the R15B01 machine Zotonic is being regularly restarted by the heart process -- every 5 minutes. Have been talking to the Zotonic people about this, and in the absence of a better explanation, we're thinking this time difference might be the cause. Anyway, logging here as the difference seems like a bug in Erlang. - Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From pan@REDACTED Fri Apr 5 12:27:27 2013 From: pan@REDACTED (Patrik Nyblom) Date: Fri, 5 Apr 2013 12:27:27 +0200 Subject: [erlang-bugs] erlang:now() and erlang:localtime() return different results In-Reply-To: References: Message-ID: <515EA70F.2020603@erlang.org> On 04/05/2013 11:41 AM, Simon Smithies wrote: > Hi, > > Erlang newbie here ... > > I have two Debian linux VPSs, both largely identical except for Erlang > versions, running in Virtuozzo containers with the same provider. > > On the VPS running Erlang R14B04, erlang:now() and erlang:localtime() > return identical results (as they do on my local machine, and some > others we've tried): > 1> {calendar:now_to_local_time(erlang:now()), erlang:localtime()}. > {{{2013,4,5},{22,12,57}},{{2013,4,5},{22,12,57}}} > > On the other VPS however, running Erlang R15B01, I get different results: > 1> {calendar:now_to_local_time(erlang:now()), erlang:localtime()}. 
> {{{2013,4,5},{22,35,29}},{{2013,4,5},{22,30,29}}} > > As background, I'm running Zotonic on both VPSs; on the R15B01 machine > Zotonic is being regularly restarted by the heart process -- every 5 > minutes. Have been talking to the Zotonic people about this, and in > the absence of a better explanation, we're thinking this time > difference might be the cause. > This has probably nothing to do with the windows bug. If the two values (now() and localtime()) differ, it's usually because the wall clock time on the system has been adjusted after Erlang was started. (by NTP or a manual time adjustment). If the system restarts, it may have a slightly skewed time at start and then it's adjusted by some means. erlang:now() has to slowly adapt to the new world if you set the time. The realtime properties requires that time in the system does not jump, but you would not want now() and localtime() to move away from each other, why now() slowly adjusts. In this case, now() will move 1% slower until it converges with the localtime(), which means that it takes 500 minutes before they are in sync. Setting the time to a correct value before starting Erlang is the solution if this is not acceptable. Try starting another node on the Zotonic, or restarting Erlang, and you should see similar times. Or try syncing the time before Erlang starts (ntpdate). You should also be able to see in files like /var/log/messages that the wall clock jumps. > Anyway, logging here as the difference seems like a bug in Erlang. > > - Simon Cheers, /Patrik > > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... 
From n.oxyde@REDACTED Wed Apr 10 00:19:31 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Wed, 10 Apr 2013 00:19:31 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
Message-ID: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>

Hello,

In some circumstances, as when inlining code, when some optimization
passes are disabled or with hand-written but semantically correct Core
Erlang or BEAM assembly, a fresh reference may be live in more than one
register:

    ...
    {allocate_zero,2,2}.
    ...
    {call_ext,0,{extfunc,erlang,make_ref,0}}. % Ref in [x0]
    ...
    {move,{x,0},{y,0}}. % Ref in [x0,y0]
    {move,{y,1},{x,0}}. % Ref in [y0]
    ...
    {move,{y,0},{x,0}}. % Ref in [x0,y0]
    {move,{x,0},{y,1}}. % Ref in [x0,y0,y1]
  {label,5}.
    {loop_rec,{f,6},{x,0}}. % Ref in [y0,y1]
    ...
    {loop_rec_end,{f,5}}.
  {label,6}.
    {wait,{f,5}}.
    ...

Pass beam_receive expects a single live register for the ref when it
encounters the loop_rec instruction and crashes with the following
reason:

    $ erlc t.S
    ...
    crash reason: {{case_clause,
                       {'EXIT',
                           {{case_clause,[{y,1},{y,0}]},
                            [{beam_receive,opt_recv,5,
                                 [{file,"beam_receive.erl"},{line,154}]},
                             ...]}}},
                   ...}

This patch teaches beam_receive how to use a set of registers instead of
a single one when tracking fresh references, thus avoiding the crash.
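To make the "Ref in [...]" annotations above concrete, here is a minimal sketch that tracks a fresh reference as a set of registers while walking a simplified instruction list. It is not the code in beam_receive or in the patch; the module ref_regs and its functions are hypothetical, and only the three instruction shapes from the listing are modelled.

```erlang
-module(ref_regs).
-export([track/1, example/0]).

%% Fold over a simplified instruction list and return the set of
%% registers that hold the fresh reference afterwards.
track(Is) ->
    lists:foldl(fun step/2, ordsets:new(), Is).

%% Simplification: the call to make_ref/0 leaves the new reference
%% in x0 and nowhere else.
step({call_ext,0,{extfunc,erlang,make_ref,0}}, _Regs) ->
    ordsets:from_list([{x,0}]);
%% A move copies the reference if the source holds it; otherwise it
%% overwrites whatever the destination held.
step({move,Src,Dst}, Regs) ->
    case ordsets:is_element(Src, Regs) of
        true  -> ordsets:add_element(Dst, Regs);
        false -> ordsets:del_element(Dst, Regs)
    end;
%% loop_rec stores the next message in its destination register.
step({loop_rec,_Fail,Dst}, Regs) ->
    ordsets:del_element(Dst, Regs);
%% Everything else is ignored in this sketch.
step(_Other, Regs) ->
    Regs.

%% The instruction sequence from the message, minus the elided parts.
example() ->
    track([{call_ext,0,{extfunc,erlang,make_ref,0}},
           {move,{x,0},{y,0}},
           {move,{y,1},{x,0}},
           {move,{y,0},{x,0}},
           {move,{x,0},{y,1}},
           {label,5},
           {loop_rec,{f,6},{x,0}}]).
```

ref_regs:example() returns [{y,0},{y,1}], the same two registers that show up in the case_clause of the crash report. A tracker that insists on exactly one register has no sensible answer at that point, which is the situation the patch addresses.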
git fetch https://github.com/nox/otp.git fix-multiple-ref-regs

https://github.com/nox/otp/compare/erlang:maint...fix-multiple-ref-regs
https://github.com/nox/otp/compare/erlang:maint...fix-multiple-ref-regs.patch

This is yet again something that I encountered while working on the file
optimization branch file-receive-optim:

https://gist.github.com/nox/2e33fe9a85e035caadda#file-t-codegen

Regards,

--
Anthony Ramine

From fredrik@REDACTED Wed Apr 10 10:02:05 2013
From: fredrik@REDACTED (Fredrik)
Date: Wed, 10 Apr 2013 10:02:05 +0200
Subject: [erlang-bugs] [erlang-patches] Use a set to store ref registers in beam_receive
In-Reply-To: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
Message-ID: <51651C7D.8080307@erlang.org>
Hello Anthony,
Fetched your branch. Currently in the review state and in 'pu' branch.

--

BR Fredrik Gustafsson
Erlang OTP Team

From bgustavsson@REDACTED Wed Apr 10 12:19:17 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Wed, 10 Apr 2013 12:19:17 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
In-Reply-To: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
Message-ID:

On Wed, Apr 10, 2013 at 12:19 AM, Anthony Ramine wrote:

> This patch teaches beam_receive how to use a set of registers instead of a
> single one when tracking fresh references, thus avoiding the crash.
>
> git fetch https://github.com/nox/otp.git fix-multiple-ref-regs
>
> https://github.com/nox/otp/compare/erlang:maint...fix-multiple-ref-regs
> https://github.com/nox/otp/compare/erlang:maint...fix-multiple-ref-regs.patch

Could you write a test case?

A copy of the receive_SUITE will be compiled with the inline option.
If the inlining is not aggressive enough to provoke the bug, you can
add the test case to inline_SUITE.

--
Björn Gustavsson, Erlang/OTP, Ericsson AB
From n.oxyde@REDACTED Wed Apr 10 12:22:38 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Wed, 10 Apr 2013 12:22:38 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
In-Reply-To:
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
Message-ID:

Hello Björn,

I'm not sure I understand what you mean. Should I add my example
function into receive_SUITE.erl as is to test that? I'm not sure that
would demonstrate anything in the long term, as my other cooking patch
move-let-into-seq makes the culprit code where a reference is live in
two Y registers disappear.

Regards,

--
Anthony Ramine

On 10 Apr 2013, at 12:19, Björn Gustavsson wrote:

> Could you write a test case?
>
> A copy of the receive_SUITE will be compiled with the inline option.
> If the inlining is not aggressive enough to provoke the bug, you can
> add the test case to inline_SUITE.
>
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB

From bgustavsson@REDACTED Thu Apr 11 16:01:31 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Thu, 11 Apr 2013 16:01:31 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
In-Reply-To:
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
Message-ID:

On Wed, Apr 10, 2013 at 12:22 PM, Anthony Ramine wrote:

> Hello Björn,
>
> I'm not sure I understand what you mean, should I add my example function
> into receive_SUITE.erl as is to test that? I'm not sure that would
> demonstrate anything in the long term as my other cooking patch
> move-let-into-seq makes the culprit code where a reference is live in two Y
> registers disappear.
>

OK. I realize that it is hard or impossible to
construct a test case that would test something
that is not already tested by existing test cases.

Your patch correctly fixes the bug, but I
have some comments and suggestions
for further simplification:

The name of the RefReg variable is now
misleading, since it contains a register set.
(Suggested new name: RefRegSet, RefRegs,
or RefSet.)

The comment for opt_ref_used/4 needs to
be updated.

In opt_recv/5, my original code looked like:

    case regs_to_list(R) of
        [{y,_}=RefReg] -> ...

The matching of {y,_} is just a cheap
assertion (only added because it was
almost free).

Since your new code sends the register
set to opt_ref_used/4, there is no
longer any need to convert the register
set to a list. Thus we can write:

    case regs_empty(R) of
        false -> ...

and remove the regs_to_list/1 function.

Finally, for clarity I would add parentheses
in is_ref_msg_comparison/3:

    is_ref_msg_comparison([R1,R2], RefReg, Regs) ->
        (regs_is_member(R2, RefReg) andalso regs_is_member(R1, Regs)) orelse
        (regs_is_member(R1, RefReg) andalso regs_is_member(R2, Regs)).

--
Björn Gustavsson, Erlang/OTP, Ericsson AB

From n.oxyde@REDACTED Thu Apr 11 20:05:53 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Thu, 11 Apr 2013 20:05:53 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
In-Reply-To:
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com>
Message-ID: <8FB19F99-1CA0-429A-A2A6-CDBA51BB7191@gmail.com>

I added a test and clarified the code following your suggestions.

I noticed that RefReg wasn't a descriptive enough name but I am ashamed
to admit I optimized for diff shortness, sorry.

Regards,

--
Anthony Ramine
From bgustavsson@REDACTED Fri Apr 12 07:20:55 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Fri, 12 Apr 2013 07:20:55 +0200
Subject: [erlang-bugs] Use a set to store ref registers in beam_receive
In-Reply-To: <8FB19F99-1CA0-429A-A2A6-CDBA51BB7191@gmail.com>
References: <312DC303-E319-48DA-A035-AEFCE3976B7D@gmail.com> <8FB19F99-1CA0-429A-A2A6-CDBA51BB7191@gmail.com>
Message-ID:

Looks good! Good work on the test case.

We will test for a few days in our daily builds before merging
it to maint.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB

From n.oxyde@REDACTED Fri Apr 12 13:01:38 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Fri, 12 Apr 2013 13:01:38 +0200
Subject: [erlang-bugs] R16 breaks dots
In-Reply-To: <3F6CD158-AF0E-4F16-AB72-4DFE84118C0B@gmail.com>
References: <20130329203406.GB1251@zushakon>
 <13603AE6-E8E6-4EAD-899F-35D9CD7D2187@gmail.com>
 <51576CF7.6010905@gmail.com>
 <3E8948CA-EBD7-48EC-A9B8-6CEB8FF9C96F@gmail.com>
 <3F6CD158-AF0E-4F16-AB72-4DFE84118C0B@gmail.com>
Message-ID: <0664E7BF-C7C1-4F3C-8E2F-64945D1F18C2@gmail.com>

Ping?

--
Anthony Ramine

On 4 Apr 2013, at 15:45, Anthony Ramine wrote:

> Hello Björn,
>
> Could we have more details about why the Technical Board decided that?
>
> Regards,
>
> On 4 Apr 2013, at 15:38, Björn Gustavsson wrote:
>
>> Rejected.
>>
>> The Technical Board did discuss the matter of dots in atoms and
>> decided that the support should be removed along with packages.
>>
>> /Bjorn

From n.oxyde@REDACTED Tue Apr 16 12:38:44 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Tue, 16 Apr 2013 12:38:44 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To:
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com>
Message-ID: <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>

Ping?

--
Anthony Ramine

On 4 Apr 2013, at 15:46, Björn Gustavsson wrote:

> Not really. I noticed that the implementation was flaky and
> fixed it.
>
> The Technical Board will discuss the matter.
From pan@REDACTED Tue Apr 16 16:17:25 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 16 Apr 2013 16:17:25 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To: <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com> <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>
Message-ID: <516D5D75.6010903@erlang.org>

Hi Anthony!

On 04/16/2013 12:38 PM, Anthony Ramine wrote:
> Ping?
>
Technical board is tomorrow afternoon. So 'pang' today, 'pong' the day
after tomorrow :)

/Patrik

From n.oxyde@REDACTED Mon Apr 22 15:02:30 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Mon, 22 Apr 2013 15:02:30 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To: <516D5D75.6010903@erlang.org>
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com> <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com> <516D5D75.6010903@erlang.org>
Message-ID:

Still panging?

--
Anthony Ramine

On 16 Apr 2013, at 16:17, Patrik Nyblom wrote:

> Hi Anthony!
>
> On 04/16/2013 12:38 PM, Anthony Ramine wrote:
>> Ping?
>>
> Technical board is tomorrow afternoon. So 'pang' today, 'pong' the day after tomorrow :)
>
> /Patrik

From bgustavsson@REDACTED Mon Apr 22 15:29:48 2013
From: bgustavsson@REDACTED (Björn Gustavsson)
Date: Mon, 22 Apr 2013 15:29:48 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To: <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com> <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>
Message-ID:

On Tue, Apr 16, 2013 at 12:38 PM, Anthony Ramine wrote:

> Ping?
>

pong.
(R17)

It still remains to review the patch in the
usual way.

Have you looked in erl_eval and eval_bits to
see whether any code there can be simplified or
removed?

From n.oxyde@REDACTED Mon Apr 22 15:40:00 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Mon, 22 Apr 2013 15:40:00 +0200
Subject: [erlang-bugs] [erlang-patches] Bit string generators, unsized binaries, modules and the REPL
In-Reply-To:
References: <649B6ECF-85AD-40BB-9CB1-C04DC348C499@gmail.com> <8F53233D-010C-499F-A4C9-073F5E9A37D1@gmail.com>
Message-ID:

Thanks for the reply.

I've looked into both when I wrote the patch and didn't notice anything
that could be simplified or removed.

Regards,

--
Anthony Ramine

On 22 Apr 2013, at 15:29, Björn Gustavsson wrote:

> On Tue, Apr 16, 2013 at 12:38 PM, Anthony Ramine wrote:
> Ping?
>
>
> pong. (R17)
>
> It still remains to review the patch in the
> usual way.
>
> Have you looked in erl_eval and eval_bits to
> see whether any code there can be simplified or
> removed?
>

From john.ryan.bard@REDACTED Tue Apr 23 01:16:44 2013
From: john.ryan.bard@REDACTED (John Bard)
Date: Mon, 22 Apr 2013 19:16:44 -0400
Subject: [erlang-bugs] disksup:get_disk_data() is returning the wrong data on my system (OS X 10.8 -- Darwin 12)
Message-ID:

disksup:get_disk_data() is returning garbage in OS X 10.8 (on my system)

[I have verified this in R15B02, R15B03 & R16B]

1> ok = application:start(sasl).
2> ok = application:start(os_mon).
3> disksup:get_disk_data().
[{"18704735",244277768,31}]
4>

The expectation is: [{"/",244277768,31}]

The problem is that the format of the df command changed in OS X 10.8
for some reason (at least it is different on my machine... I checked and
I don't appear to have any bash functions or aliases getting in the
way).

In OS X 10.7.?
(a colleague's mac mini: hdd):

$ df -k -t ufs,hfs
Filesystem   1024-blocks      Used Available Capacity  Mounted on
/dev/disk0s2   487546976 156364392 330926584    33%    /
$

The 6th column is "Mounted on".

In OS X 10.8.3 (my mac book pro: ssd):

$ df -k -t ufs,hfs
Filesystem   1024-blocks     Used Available Capacity    iused    ifree %iused  Mounted on
/dev/disk0s2   244277768 74564136 169457632    31%   18705032 42364408   31%   /
$

The 6th column is "iused" instead of "Mounted on".

If this is simply a difference in 10.8 [darwin 12] and isn't a more
complex situation than that (macs pre-installed with ssds vs. hdds? some
configuration somewhere?), a fix could be:

[I toyed around in R16B, but the code is pretty much the same in R15B03
and I assume the same in R15B02]

in os_mon's disksup.erl:

add a guard for the specific darwin in init's case statement:

    Port = case OS of
               {unix, Flavor} when Flavor==sunos4;
                                   Flavor==solaris;
                                   Flavor==freebsd;
                                   Flavor==dragonfly;
                                   Flavor==darwin;
                                   Flavor==darwin_12;
                                   Flavor==linux;
                                   Flavor==openbsd;
                                   Flavor==netbsd;
                                   Flavor==irix64;
                                   Flavor==irix ->
                   start_portprogram();
               {win32, _OSname} ->
                   not_used;
               _ ->
                   exit({unsupported_os, OS})
           end,

add a clause to the case statement in the get_os() function:

    get_os() ->
        case os:type() of
            {unix, sunos} ->
                case os:version() of
                    {5,_,_} -> {unix, solaris};
                    {4,_,_} -> {unix, sunos4};
                    V -> exit({unknown_os_version, V})
                end;
            {unix, irix64} ->
                {unix, irix};
            {unix, darwin} ->
                case os:version() of
                    {12,_,_} -> {unix, darwin_12};
                    _ -> {unix, darwin}
                end;
            OS ->
                OS
        end.

add a function that will pattern match on the new darwin_12 atom
specified in the other places and will call into a function to parse the
df command differently:

    check_disk_space({unix, darwin_12}, Port, Threshold) ->
        Result = my_cmd("/bin/df -k -t ufs,hfs", Port),
        check_disks_darwin_12(skip_to_eol(Result), Threshold);

and add that new function to parse the df command:

    %% Special cases like this annoy me...
    check_disks_darwin_12("", _Threshold) ->
        [];
    check_disks_darwin_12("\n", _Threshold) ->
        [];
    check_disks_darwin_12(Str, Threshold) ->
        case io_lib:fread("~s~d~d~d~d%~d~d~d%~s", Str) of
            {ok, [_FS, KB, _Used, _Avail, Cap, _IUsed, _IFree, _ICap, MntOn], RestStr} ->
                if
                    Cap >= Threshold ->
                        set_alarm({disk_almost_full, MntOn}, []);
                    true ->
                        clear_alarm({disk_almost_full, MntOn})
                end,
                [{MntOn, KB, Cap} | check_disks_darwin_12(RestStr, Threshold)];
            _Other ->
                check_disks_darwin_12(skip_to_eol(Str),Threshold)
        end.

Sorry if I butchered the whitespace (gmail ate the tabs, so I had to
manually fix the whitespace with consistent spaces).

--
Ryan

From lukas@REDACTED Tue Apr 23 14:54:33 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Tue, 23 Apr 2013 14:54:33 +0200
Subject: [erlang-bugs] disksup:get_disk_data() is returning the wrong data on my system (OS X 10.8 -- Darwin 12)
In-Reply-To:
References:
Message-ID: <51768489.8090108@erlang.org>

Hello,

I believe that this is already a known and fixed issue. The fix[1] will
be part of the R16B01 release or you can get the latest maint from
github. If this does not fix your issue please use git to send in your
proposed fixes as described at github.com[2].

Lukas

[1]: http://erlang.org/pipermail/erlang-patches/2013-February/003611.html
[2]: https://github.com/erlang/otp/wiki/Submitting-patches
From eric.pailleau@REDACTED Sat Apr 27 12:15:26 2013
From: eric.pailleau@REDACTED (PAILLEAU Eric)
Date: Sat, 27 Apr 2013 12:15:26 +0200
Subject: [erlang-bugs] R15B02 : wish list : missing gen_fsm state view in observer application.
In-Reply-To:
References: <507C692A.3020502@wanadoo.fr>
Message-ID: <517BA53E.6080002@wanadoo.fr>

On 16/10/2012 09:15, Dan Gudmundsson wrote:
> I agree, that would be nice..added to the observer wishlist..
> Another tab in process info window (for gen processes) would
> be good.
>
> A patch will make it happen sooner :-)
>
> /Dan

Hello,
Any updates about this feature in observer?

regards

From n.oxyde@REDACTED Sat Apr 27 13:23:16 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sat, 27 Apr 2013 13:23:16 +0200
Subject: [erlang-bugs] Fix an inconsistent state in epp
Message-ID:

Hello,

When entering an included file, epp doesn't properly set #epp.name2 like
it does on init, generating a malformed file attribute with file name ""
when it leaves the file.

git fetch https://github.com/nox/otp.git fix-epp-file-attrs

https://github.com/nox/otp/compare/erlang:maint...fix-epp-file-attrs
https://github.com/nox/otp/compare/erlang:maint...fix-epp-file-attrs.patch

Regards,

--
Anthony Ramine

From n.oxyde@REDACTED Sat Apr 27 13:36:06 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sat, 27 Apr 2013 13:36:06 +0200
Subject: [erlang-bugs] Fix some Makefile rules that didn't support silent rules
Message-ID: <255174D9-671F-40B4-8EBD-412A57C15FBD@gmail.com>

Hello,

This patch fixes some Makefile rules that didn't support silent rules.

git fetch https://github.com/nox/otp.git fix-silent-rules

https://github.com/nox/otp/compare/erlang:maint...fix-silent-rules
https://github.com/nox/otp/compare/erlang:maint...fix-silent-rules.patch

Regards,

--
Anthony Ramine

From n.oxyde@REDACTED Sat Apr 27 15:01:17 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Sat, 27 Apr 2013 15:01:17 +0200
Subject: [erlang-bugs] [erlang-patches] R16A: Small inconvenience with Unicode strings in the shell
In-Reply-To: <31E120F8-73EB-4BF3-86AE-CBBC9085E2A1@gmail.com>
References: <51098739.8020308@ninenines.eu> <5118C033.5050301@erlang.org> <31E120F8-73EB-4BF3-86AE-CBBC9085E2A1@gmail.com>
Message-ID: <17102F61-F233-4E8B-8D0D-4038571DD1A3@gmail.com>

Hello,

I removed wcwidth from the win32 driver and added an autoconf check for
the function presence, please refetch.

Regards,

--
Anthony Ramine

On 11 Feb 2013, at
11:58, Anthony Ramine wrote:

> Hi,
>
> There is no wcwidth() in Windows, I'll add some autoconf magic to conditionally
> use it.
>
> --
> Anthony Ramine
>
> On 11 Feb 2013, at 10:56, Fredrik wrote:
>
>> On 02/09/2013 02:01 PM, Anthony Ramine wrote:
>>> Hi,
>>>
>>> I've got a fix:
>>>
>>> git fetch https://github.com/nox/otp.git wide-chars
>>>
>>> https://github.com/nox/otp/compare/erlang:master...wide-chars
>>> https://github.com/nox/otp/compare/erlang:master...wide-chars.patch
>>>
>>> I can't test the win32 part though.
>>>
>>> Björn, should I add some autoconf machinery to know whether wcwidth() is
>>> available?
>>>
>>> Regards,
>>>
>> Your patch is failing on windows:
>>
>> error LNK2019: unresolved external symbol wcwidth referenced in function check_buf_size
>>
>>
>> Could you please have a look at this?
>>
>> --
>>
>> BR Fredrik Gustafsson
>> Erlang OTP Team
>>
>

From bryan@REDACTED Mon Apr 29 03:52:32 2013
From: bryan@REDACTED (Bryan Fink)
Date: Sun, 28 Apr 2013 21:52:32 -0400
Subject: [erlang-bugs] Supervisor terminate_child race
Message-ID:

Hi. I've been digging into an issue filed against Riak Pipe for the
last couple of weeks (https://github.com/basho/riak_pipe/issues/49),
and I've finally tracked it all the way to supervisor.erl.

The issue manifests itself as a supervisor complaining about its child
exiting "with reason noproc in context shutdown_error". Comments in
supervisor:monitor_child/1 warn that this might happen if a "naughty"
child unlinks from its parent. But, the child I'm working with doesn't
do that.

What is happening is that the child is choosing to exit on its own,
while some other process is asking the supervisor to terminate it. The
sequence of monitoring, unlinking, and receiving with zero timeout in
supervisor:monitor_child/1 is insufficient to guarantee catching the
child's EXIT signal. After the supervisor misses the EXIT signal, it
receives the DOWN instead, which has reason noproc.
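The monitor, unlink, receive-with-zero-timeout sequence described above can be sketched roughly as follows. This is an illustrative paraphrase of the pattern, not the OTP source of supervisor:monitor_child/1; the module exit_race_sketch and its function names are made up, and the caller is assumed to trap exits and to be linked to the child.

```erlang
-module(exit_race_sketch).
-export([monitor_then_unlink/1, demo/0]).

%% ASSUMPTION: the caller traps exits and is linked to Pid.
monitor_then_unlink(Pid) ->
    MRef = erlang:monitor(process, Pid),
    unlink(Pid),
    receive
        %% The child's 'EXIT' message is already in the mailbox:
        %% report the real reason and discard the matching 'DOWN'.
        {'EXIT', Pid, Reason} ->
            receive
                {'DOWN', MRef, process, Pid, _} -> {error, Reason}
            end
    after 0 ->
        %% No 'EXIT' message seen. From here on only the monitor can
        %% report the child's death. If the child was already gone
        %% when the monitor was set up, the 'DOWN' message carries
        %% noproc instead of the real reason, which is the case the
        %% report describes.
        {ok, MRef}
    end.

%% Deterministic demonstration of the branch where the 'EXIT' message
%% is caught. Note that it leaves trap_exit enabled in the caller.
demo() ->
    process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(foobar) end),
    timer:sleep(100),            % let the 'EXIT' message arrive first
    monitor_then_unlink(Pid).
```

exit_race_sketch:demo() returns {error, foobar}, because the sleep guarantees that the 'EXIT' message is queued before the zero-timeout receive runs. The reported problem lives in the other branch: when the child exits at about the same moment, the 'after 0' clause can fire before the 'EXIT' message is visible.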
This is not limited to 'normal' child exits. Any exit reason might be
missed, so this is worse than just log spam, and can inhibit reporting
and debugging.

I have written two tests to demonstrate the behavior:

https://gist.github.com/beerriot/28258f2a44fc482016b1

They use EQC PULSE to make the race happen more often (and
deterministically for repeated runs). The exitrace_sup.erl test uses
the supervisor module (with PULSE disabled) or the pulse_supervisor
module (with PULSE enabled) to show the code behaving in-place (these
modules share the same monitor_child/1 function). The exitrace.erl
test extracts the relevant code to show its behavior specifically. To
demonstrate how a non-normal child exit's reason can be lost, change
exitrace_sup:start_fake_child_link/0 or exitrace:child/0 to include a
call to exit(foobar).

I have not yet attempted to patch the behavior. It wasn't obvious to
me why the code hassled with monitors instead of just relying on the
existing link, so I thought I'd ask for clarification first.

Cheers,
Bryan

From bryan@REDACTED Mon Apr 29 04:28:04 2013
From: bryan@REDACTED (Bryan Fink)
Date: Sun, 28 Apr 2013 22:28:04 -0400
Subject: [erlang-bugs] Supervisor terminate_child race
In-Reply-To:
References:
Message-ID:

On Sun, Apr 28, 2013 at 9:52 PM, Bryan Fink wrote:
> Hi. I've been digging into an issue filed against Riak Pipe for the
> last couple of weeks (https://github.com/basho/riak_pipe/issues/49),
> and I've finally tracked it all the way to supervisor.erl.

I just realized that I completely failed to mention that I've been
doing my testing on R15B01. The same code exists on the head of the
OTP github repo, though, so I assume the behavior still exists in the
latest release.

I have also just added notes to the Riak Pipe issue for easily
reproducing the original issue in the wild.
-Bryan

From dangud@REDACTED Mon Apr 29 09:05:59 2013
From: dangud@REDACTED (Dan Gudmundsson)
Date: Mon, 29 Apr 2013 09:05:59 +0200
Subject: [erlang-bugs] R15B02 : wish list : missing gen_fsm state view in observer application.
In-Reply-To: <517BA53E.6080002@wanadoo.fr>
References: <507C692A.3020502@wanadoo.fr> <517BA53E.6080002@wanadoo.fr>
Message-ID:

No, I haven't got a patch yet :-)

/Dan

From fredrik@REDACTED Mon Apr 29 09:42:06 2013
From: fredrik@REDACTED (Fredrik)
Date: Mon, 29 Apr 2013 09:42:06 +0200
Subject: [erlang-bugs] [erlang-patches] Fix an inconsistent state in epp
In-Reply-To:
References:
Message-ID: <517E244E.4050307@erlang.org>

On 04/27/2013 01:23 PM, Anthony Ramine wrote:
> Hello,
>
> When entering an included file, epp doesn't properly set #epp.name2 like it does on init, generating a malformed file attribute with file name "" when it leaves the file.
>
> git fetch https://github.com/nox/otp.git fix-epp-file-attrs
>
> https://github.com/nox/otp/compare/erlang:maint...fix-epp-file-attrs
> https://github.com/nox/otp/compare/erlang:maint...fix-epp-file-attrs.patch
>
> Regards,
>
Hello,
I've fetched your patch; it is now located in the 'pu' branch.
A review process has started.
Thanks,

--

BR Fredrik Gustafsson
Erlang OTP Team

From fredrik@REDACTED Mon Apr 29 09:44:19 2013
From: fredrik@REDACTED (Fredrik)
Date: Mon, 29 Apr 2013 09:44:19 +0200
Subject: [erlang-bugs] [erlang-patches] Fix some Makefile rules that didn't support silent rules
In-Reply-To: <255174D9-671F-40B4-8EBD-412A57C15FBD@gmail.com>
References: <255174D9-671F-40B4-8EBD-412A57C15FBD@gmail.com>
Message-ID: <517E24D3.1020607@erlang.org>

On 04/27/2013 01:36 PM, Anthony Ramine wrote:
> Hello,
>
> This patch fixes some Makefile rules that didn't support silent rules.
> > git fetch https://github.com/nox/otp.git fix-silent-rules > > https://github.com/nox/otp/compare/erlang:maint...fix-silent-rules > https://github.com/nox/otp/compare/erlang:maint...fix-silent-rules.patch > > Regards, > Hello Anthony, I've fetched your patch, it is now located in the 'pu' branch. Thanks, -- BR Fredrik Gustafsson Erlang OTP Team From watson.timothy@REDACTED Mon Apr 29 12:08:14 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Mon, 29 Apr 2013 11:08:14 +0100 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: Would it be sufficient to add a clause explicitly handling `{'DOWN', MRef, process, Child, noproc}' to the receive block? As I understand it, monitor/2 works synchronously and the 'DOWN' message will be enqueued straight away if the target process is unknown. Or is that assumption unreliable - I'd like to know if it is - and there's more to it? Unfortunately I don't have an EQC/PULSE license to be able to test this. It'd be really nice if PULSE was available in the free/mini QC for selected open source projects. Cheers, Tim 29 Apr 2013, at 02:52, Bryan Fink wrote: > Hi. I've been digging into an issue filed against Riak Pipe for the > last couple of weeks (https://github.com/basho/riak_pipe/issues/49), > and I've finally tracked it all the way to supervisor.erl. > > The issue manifests itself as a supervisor complaining about its child > exiting "with reason noproc in context shutdown_error". Comments in > supervisor:monitor_child/1 warn that this might happen if a "naughty" > child unlinks from its parent. But, the child I'm working with doesn't > do that. > > What is happening is that the child is choosing to exit on its own, > while some other process is asking the supervisor to terminate it. The > sequence of monitoring, unlinking, and receiving with zero timeout in > supervisor:monitor_child/1 is insufficient to guarantee catching the > child's EXIT signal. 
After the supervisor misses the EXIT signal, it > receives the DOWN instead, which has reason noproc. > > This is not limited to 'normal' child exits. Any exit reason might be > missed, so this is worse than just log spam, and can inhibit reporting > and debugging. > > I have written two tests to demonstrate the behavior: > > https://gist.github.com/beerriot/28258f2a44fc482016b1 > > They use EQC PULSE to make the race happen more often (and > deterministically for repeated runs). The exitrace_sup.erl test uses > the supervisor module (with PULSE disabled) or the pulse_supervisor > module (with PULSE enabled) to show the code behaving in-place (these > modules share the same monitor_child/1 function). The exitrace.erl > test extracts the relevant code to show its behavior specifically. To > demonstrate how a non-normal child exit's reason can be lost, change > exitrace_sup:start_fake_child_link/0 or exitrace:child/0 to include a > call to exit(foobar). > > I have not yet attempted to patch the behavior. It wasn't obvious to > me why the code hassled with monitors instead of just relying on the > existing link, so I thought I'd ask for clarification first. > > Cheers, > Bryan > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From bryan@REDACTED Mon Apr 29 15:14:38 2013 From: bryan@REDACTED (Bryan Fink) Date: Mon, 29 Apr 2013 09:14:38 -0400 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: Hi, Tim. Thanks for responding. On Mon, Apr 29, 2013 at 6:08 AM, Tim Watson wrote: > Would it be sufficient to add a clause explicitly handling `{'DOWN', MRef, process, Child, noproc}' to the receive block? As I understand it, monitor/2 works synchronously and the 'DOWN' message will be enqueued straight away if the target process is unknown. Or is that assumption unreliable - I'd like to know if it is - and there's more to it? 
I don't think adding a DOWN-noproc clause to the receive block would be sufficient. It would silence the error, but it would also end up hiding legitimate errors about children that do actually exit with reason noproc. I believe I also now understand that monitoring (instead of depending solely on the link) is necessary precisely because of those "naughty" children. Child processes can't disable a monitor, as they can a link, so depending on DOWN instead of EXIT prevents a child from causing the supervisor to hang (as warned about in the doc of shutdown/2). > Unfortunately I don't have an EQC/PULSE license to be able to test this. It'd be really nice if PULSE was available in the free/mini QC for selected open source projects. Agreed. PULSE is a fantastic tool that has been invaluable to me the past couple of weeks. -Bryan From bryan@REDACTED Mon Apr 29 15:24:21 2013 From: bryan@REDACTED (Bryan Fink) Date: Mon, 29 Apr 2013 09:24:21 -0400 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: On Mon, Apr 29, 2013 at 9:14 AM, Bryan Fink wrote: > On Mon, Apr 29, 2013 at 6:08 AM, Tim Watson wrote: >> Unfortunately I don't have an EQC/PULSE license to be able to test this. It'd be really nice if PULSE was available in the free/mini QC for selected open source projects. > > Agreed. PULSE is a fantastic tool that has been invaluable to me the > past couple of weeks. I should have also mentioned that the reproduction method described in https://github.com/basho/riak_pipe/issues/49 using a live Riak node and basho_bench does not require PULSE. It's not as pure, and the error is just log output instead of a nice test-halting flag, but it does hit the issue more often than the simple test I submitted earlier does (at least on the machine I've been using). 
-Bryan

From fritchie@REDACTED Mon Apr 29 22:01:43 2013
From: fritchie@REDACTED (Scott Lystig Fritchie)
Date: Mon, 29 Apr 2013 15:01:43 -0500
Subject: [erlang-bugs] Schedulers getting "stuck", part II
Message-ID: <209.1367265703@snookles.snookles.com>

Hi, all. I'd originally intended to cross-post last week's message about stuck/collapsed schedulers to both erlang-questions and erlang-bugs ... but forgot to do it. So, here it is.

Incorporated by reference:

http://erlang.org/pipermail/erlang-questions/2013-April/073490.html

Since that message, I've found that R16B on my 8 core MacBook Pro laptop can get its schedulers stuck in about one case in four with a much less time-consuming recipe than the original recipe ... but it requires human intervention to stop & restart if collapse doesn't happen. Note that it's an 8 core box, and I'm dropping the number of online schedulers down to 5. Using 6 cores doesn't appear to be successful ... or I'm not patient enough to run it enough to see it happen.

/usr/local/erlang/R16B.64bit/bin/erl +scl false -pz ebin -sname foo -eval 'N = 5, io:format("OS pid ~s\n\n", [os:getpid()]), timer:sleep(8*1000), io:format("go\n"), erlang:system_flag(schedulers_online, N), os:cmd("say yo is " ++ integer_to_list(N)), timer:sleep(12*1000), timer:tc(erlang, apply, [fun () -> XX = lists:sort(element(1,wait:run(4*100, 1024*1024, 1100, 5))), {hd(XX), lists:last(XX)} end, []]).'

I run "iostat 1" in another window ... if it is reporting %user time in the low 60s, all 5 (out of 8) schedulers are working the way that they're supposed to. If you see 12-25 percent instead, you've got only one or two active schedulers.
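
As a cross-check on the iostat reading, scheduler activity can also be sampled from inside the VM. The sketch below is not part of the recipe in this thread: the module name sched_util and the function sample/1 are made up for illustration, and it assumes a release that provides erlang:statistics(scheduler_wall_time).

```erlang
-module(sched_util).
-export([sample/1]).

%% Hypothetical helper (not from the thread).
%% Returns [{SchedulerId, Utilization}] measured over IntervalMs
%% milliseconds, where Utilization is roughly between 0.0 and 1.0.
sample(IntervalMs) ->
    %% The statistic is only collected while this flag is enabled.
    erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(IntervalMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    erlang:system_flag(scheduler_wall_time, false),
    %% Each entry is {Id, ActiveTime, TotalTime}; compare the two samples.
    [{Id, safe_div(A1 - A0, W1 - W0)}
     || {{Id, A0, W0}, {Id, A1, W1}} <- lists:zip(T0, T1)].

%% Avoid dividing by zero if a scheduler's total time did not advance.
safe_div(_, 0) -> 0.0;
safe_div(N, D) -> N / D.
```

Calling sched_util:sample(1000) while the test is running should, if the schedulers have collapsed, show only one or two entries with a utilization well above zero.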
-Scott From fredrik@REDACTED Tue Apr 30 10:12:57 2013 From: fredrik@REDACTED (Fredrik) Date: Tue, 30 Apr 2013 10:12:57 +0200 Subject: [erlang-bugs] [erlang-patches] R16A: Small inconvenience with Unicode strings in the shell In-Reply-To: <17102F61-F233-4E8B-8D0D-4038571DD1A3@gmail.com> References: <51098739.8020308@ninenines.eu> <5118C033.5050301@erlang.org> <31E120F8-73EB-4BF3-86AE-CBBC9085E2A1@gmail.com> <17102F61-F233-4E8B-8D0D-4038571DD1A3@gmail.com> Message-ID: <517F7D09.2030306@erlang.org> On 04/27/2013 03:01 PM, Anthony Ramine wrote: > Hello, > > I removed wcwidth from the win32 driver and added an autoconf check for the function presence, please refetch. > > Regards, > Re-fetched. Thanks, -- BR Fredrik Gustafsson Erlang OTP Team From erlangsiri@REDACTED Tue Apr 30 15:03:02 2013 From: erlangsiri@REDACTED (Siri Hansen) Date: Tue, 30 Apr 2013 15:03:02 +0200 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: Hi Bryan! Thanks for reporting this. As far as I can understand, it must be the zero timeout that is the problem. I assume that the EXIT signal arrives too late. Can you confirm that? You are very welcome to contribute with a patch for this :) Regards /siri@REDACTED 2013/4/29 Bryan Fink > On Mon, Apr 29, 2013 at 9:14 AM, Bryan Fink wrote: > > On Mon, Apr 29, 2013 at 6:08 AM, Tim Watson > wrote: > >> Unfortunately I don't have an EQC/PULSE license to be able to test > this. It'd be really nice if PULSE was available in the free/mini QC for > selected open source projects. > > > > Agreed. PULSE is a fantastic tool that has been invaluable to me the > > past couple of weeks. > > I should have also mentioned that the reproduction method described in > https://github.com/basho/riak_pipe/issues/49 using a live Riak node > and basho_bench does not require PULSE. 
It's not as pure, and the
> error is just log output instead of a nice test-halting flag, but it
> does hit the issue more often than the simple test I submitted earlier
> does (at least on the machine I've been using).
>
> -Bryan

From pan@REDACTED Tue Apr 30 16:09:28 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Tue, 30 Apr 2013 16:09:28 +0200
Subject: [erlang-bugs] Schedulers getting "stuck", part II
In-Reply-To: <209.1367265703@snookles.snookles.com>
References: <209.1367265703@snookles.snookles.com>
Message-ID: <517FD098.4040706@erlang.org>

Hi Scott!

On 04/29/2013 10:01 PM, Scott Lystig Fritchie wrote:
> Hi, all. I'd originally intended to cross-post last week's message
> about stuck/collapsed schedulers to both erlang-questions and
> erlang-bugs ... but forgot to do it. So, here it is.
>
> Incorporated by reference:
>
> http://erlang.org/pipermail/erlang-questions/2013-April/073490.html
>
> Since that message, I've found that R16B on my 8 core MacBook Pro laptop
> can get its schedulers stuck in about one case in four with a much less
> timeconsuming recipe than the original recipe ... but it requires human
> intervention to stop & restart if collapse doesn't happen. Note that
> it's an 8 core box, and I'm dropping the number of online schedulers
> down to 5. Using 6 cores doesn't appear to be successful ... or I'm not
> patient enough to run it enough to see it happen.

Hmmm, dropping schedulers...? There seems to be a perfectly new and fresh bug in R16B when dropping schedulers. One that we've fixed in the maint branch. Could you please please please try the tip of the maint branch (i.e. what's to be R16B01)?
The R16B plain "drop schedulers bug" ought to be unrelated to the misbehaving schedulers you've seen in other cases, so I just want to be sure we are not hunting a ghost with this test case, that it can really show the misbehaving schedulers that you also see in R15... > /usr/local/erlang/R16B.64bit/bin/erl +scl false -pz ebin -sname foo -eval 'N = 5, io:format("OS pid ~s\n\n", [os:getpid()]), timer:sleep(8*1000), io:format("go\n"), erlang:system_flag(schedulers_online, N), os:cmd("say yo is " ++ integer_to_list(N)), timer:sleep(12*1000), timer:tc(erlang, apply, [fun () -> XX = lists:sort(element(1,wait:run(4*100, 1024*1024, 1100, 5))), {hd(XX), lists:last(XX)} end, []]).' > > I run "iostat 1" in another window ... if it is reporting %user time in > the low 60s, all 5 (out of 8) schedulers are working the way that > they're supposed to. If you see 12-25 percent instead, you've got only > one or two active schedulers. > > -Scott Cheers, Patrik > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From bryan@REDACTED Tue Apr 30 17:22:33 2013 From: bryan@REDACTED (Bryan Fink) Date: Tue, 30 Apr 2013 11:22:33 -0400 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: On Tue, Apr 30, 2013 at 9:03 AM, Siri Hansen wrote: > Hi Bryan! Hi, Siri. This has been a fun one to track down. > Thanks for reporting this. As far as I can understand, it must be the zero > timeout that is the problem. I assume that the EXIT signal arrives too late. > Can you confirm that? That does seem to be exactly it, yes. > You are very welcome to contribute with a patch for this :) I've spent some time fiddling with a couple of hacks, but they have not yet been clever enough. ;) It seems that there is no guarantee about when the EXIT signal will arrive. It might even come some amount of time after the DOWN message. 
Does this sound right to you, or is there something I might be overlooking? -Bryan From watson.timothy@REDACTED Tue Apr 30 18:09:43 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Tue, 30 Apr 2013 17:09:43 +0100 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: References: Message-ID: <3161FF70-B6D0-4565-8664-2FCB9F96E08D@gmail.com> Bryan, On 30 Apr 2013, at 16:22, Bryan Fink wrote: >> Thanks for reporting this. As far as I can understand, it must be the zero >> timeout that is the problem. I assume that the EXIT signal arrives too late. >> Can you confirm that? > > That does seem to be exactly it, yes. > But twiddling the timing there is just as racy, as you've noticed, right? >> You are very welcome to contribute with a patch for this :) > > I've spent some time fiddling with a couple of hacks, but they have > not yet been clever enough. ;) It seems that there is no guarantee > about when the EXIT signal will arrive. It might even come some amount > of time after the DOWN message. > Isn't the point that the EXIT signal might /never/ come, if the child un-links, or might come *after* the 'DOWN' if the race you've located occurs? Surely you've got to be able to handle either case? We ran into something similar with our supervisor2 fork a while back, whilst terminating (multiple) simple children: http://hg.rabbitmq.com/rabbitmq-server/rev/812d71d0716c . That code is somewhat different though, not only because it was terminating multiple children (during shutdown) but also because it explicitly unlinks from the child *after* creating the monitor, and /still/ allowed for an EXIT signal to have made its way into the mailbox unexpectedly. > Does this sound right to you, or is there something I might be overlooking? > I'm very interested to see how this works out, as I've spent a while merging the upstream changes in R16B with Rabbit's supervisor2 module, and will need to integrate this fix into our codebase at some point too. 
Cheers Tim From bryan@REDACTED Tue Apr 30 19:34:10 2013 From: bryan@REDACTED (Bryan Fink) Date: Tue, 30 Apr 2013 13:34:10 -0400 Subject: [erlang-bugs] Supervisor terminate_child race In-Reply-To: <3161FF70-B6D0-4565-8664-2FCB9F96E08D@gmail.com> References: <3161FF70-B6D0-4565-8664-2FCB9F96E08D@gmail.com> Message-ID: On Tue, Apr 30, 2013 at 12:09 PM, Tim Watson wrote: > > On 30 Apr 2013, at 16:22, Bryan Fink wrote: > >>> Thanks for reporting this. As far as I can understand, it must be the zero >>> timeout that is the problem. I assume that the EXIT signal arrives too late. >>> Can you confirm that? >> >> That does seem to be exactly it, yes. >> > > But twiddling the timing there is just as racy, as you've noticed, right? Correct. The length of the timeout is irrelevant. The EXIT signal is not guaranteed to arrive within any specific amount of time. > >>> You are very welcome to contribute with a patch for this :) >> >> I've spent some time fiddling with a couple of hacks, but they have >> not yet been clever enough. ;) It seems that there is no guarantee >> about when the EXIT signal will arrive. It might even come some amount >> of time after the DOWN message. >> > > Isn't the point that the EXIT signal might /never/ come, if the child un-links, or might come *after* the 'DOWN' if the race you've located occurs? Surely you've got to be able to handle either case? Yes, the point of the monitor is to handle the case where the EXIT never comes (because the child unlinks). It is not the case, however, that the EXIT always arrives after the DOWN in the race I'm seeing. They might both be delayed. Handling either order is important, but the problem with this race is that only the EXIT message contains the actual exit reason when this happens. The 'noproc' in the DOWN is just saying that there was no process to monitor. 
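
The two cases described above can be reproduced in isolation. The following is a minimal sketch, not the supervisor code and not the PULSE tests from the gist; the module and function names are invented for illustration. late_monitor/0 sets the monitor only after the child is known to be dead, so the 'EXIT' carries the real reason while the 'DOWN' only says noproc. naughty_child/0 shows a child that unlinks before dying, so that only the monitor reports anything.

```erlang
-module(exit_reason_demo).
-export([late_monitor/0, naughty_child/0]).

%% Monitor a linked child only after it is known to be dead.
%% The 'EXIT' message carries the real reason (foobar); the late
%% 'DOWN' can only report that there was no process to monitor.
late_monitor() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(foobar) end),
    ExitReason = receive {'EXIT', Pid, R} -> R end,
    MRef = erlang:monitor(process, Pid),
    DownReason = receive {'DOWN', MRef, process, Pid, D} -> D end,
    process_flag(trap_exit, Old),
    {ExitReason, DownReason}.

%% A "naughty" child that unlinks from its parent before dying.
%% No 'EXIT' is delivered, but a monitor that was set while the
%% child was still alive reports the real reason.
naughty_child() ->
    Old = process_flag(trap_exit, true),
    Parent = self(),
    Pid = spawn_link(fun() ->
                             unlink(Parent),
                             %% Stay alive until the parent has monitored us.
                             receive go -> exit(foobar) end
                     end),
    MRef = erlang:monitor(process, Pid),
    Pid ! go,
    DownReason = receive {'DOWN', MRef, process, Pid, D} -> D end,
    GotExit = receive {'EXIT', Pid, _} -> true after 0 -> false end,
    process_flag(trap_exit, Old),
    {DownReason, GotExit}.
```

Here late_monitor() should return {foobar, noproc} and naughty_child() should return {foobar, false}. The race in this thread is the first case with the 'EXIT' still in flight when the supervisor looks for it.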
> We ran into something similar with our supervisor2 fork a while back, whilst terminating (multiple) simple children: http://hg.rabbitmq.com/rabbitmq-server/rev/812d71d0716c . That code is somewhat different though, not only because it was terminating multiple children (during shutdown) but also because it explicitly unlinks from the child *after* creating the monitor, and /still/ allowed for an EXIT signal to have made its way into the mailbox unexpectedly. The monitor_child/1 function also unlinks from the child after creating the monitor. That patch looks a little bit like the fixes I was trying. Basically it's checking for an EXIT message after receiving the DOWN, just in case one is in the mailbox, yes? The problem is that it might still miss an EXIT, because it might still not have arrived yet, even though it will later. -Bryan From fritchie@REDACTED Tue Apr 30 22:34:09 2013 From: fritchie@REDACTED (Scott Lystig Fritchie) Date: Tue, 30 Apr 2013 15:34:09 -0500 Subject: [erlang-bugs] Schedulers getting "stuck", part II In-Reply-To: Message of "Tue, 30 Apr 2013 16:09:28 +0200." <517FD098.4040706@erlang.org> Message-ID: <78245.1367354049@snookles.snookles.com> Patrik Nyblom wrote: pn> Hmmm, dropping schedulers...? There seems to be a perfectly new and pn> fresh bug in R16B when dropping schedulers. One that we've fixed in pn> the maint branch. Could you please please please try the tip of the pn> maint branch (i.e. what's to be R16B01)? Today's maint branch works well, 20 out of 20 runs show all 5 schedulers in use when I use this (which uses the "nifwait" source repo, see earlier in this thread for where to find it). 
foreach i (`seq 1 20`)
./bin/erl -noshell -noinput +scl false -pz ~/b/src/nifwait/ebin -sname foo -eval 'N = 5, io:format("OS pid ~s\n\n", [os:getpid()]), timer:sleep(8*1000), io:format("go\n"), erlang:system_flag(schedulers_online, N), timer:sleep(12*1000), timer:tc(erlang, apply, [fun () -> XX = lists:sort(element(1,wait:run(4*100, 1024*1024, 1100, 5))), {hd(XX), lists:last(XX)} end, []]).' & sleep 45 ; kill %.
end

... and then look at the %user CPU time with "iostat" or "vmstat".

This 20 out of 20 iterations @ 5 cores never happens with R16B. In fact, using the same loop above, using R16B, I found 0 out of 6 iterations @ 5 cores before I gave up waiting for a good 5 core balance.

-Scott

From watson.timothy@REDACTED Tue Apr 30 23:13:00 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Tue, 30 Apr 2013 22:13:00 +0100
Subject: [erlang-bugs] Supervisor terminate_child race
In-Reply-To:
References: <3161FF70-B6D0-4565-8664-2FCB9F96E08D@gmail.com>
Message-ID: <83357CE5-7BFB-4857-82ED-33AC842ACBD8@gmail.com>

Hi Bryan,

On 30 Apr 2013, at 18:34, Bryan Fink wrote:
>>
>> But twiddling the timing there is just as racy, as you've noticed, right?
>
> Correct. The length of the timeout is irrelevant. The EXIT signal is
> not guaranteed to arrive within any specific amount of time.
>

Indeed. Almost a halting problem, this, isn't it? :)

>>
>> Isn't the point that the EXIT signal might /never/ come, if the child un-links, or might come *after* the 'DOWN' if the race you've located occurs? Surely you've got to be able to handle either case?
>
> Yes, the point of the monitor is to handle the case where the EXIT
> never comes (because the child unlinks). It is not the case, however,
> that the EXIT always arrives after the DOWN in the race I'm seeing.
> They might both be delayed.
>

Waiting without a timeout for the 'DOWN' is acceptable, because you've got a guarantee (via the runtime) that it *will* arrive, no matter what state the target process was in when you created the monitor. Waiting some arbitrary time for the 'EXIT' is a real problem though, because you could wait forever.

> Handling either order is important, but the problem with this race is
> that only the EXIT message contains the actual exit reason when this
> happens. The 'noproc' in the DOWN is just saying that there was no
> process to monitor.

Indeed. But it could equally be true that the 'EXIT' signal was never dispatched, because the child process unlinked before it died. You can't wait forever for the 'EXIT' after you've seen a 'DOWN' with 'noproc' as the reason, so now you've got to choose how long to wait, but whatever timing works for one particular case isn't going to solve the general problem.

>
>> We ran into something similar with our supervisor2 fork a while back, whilst terminating (multiple) simple children: http://hg.rabbitmq.com/rabbitmq-server/rev/812d71d0716c . That code is somewhat different though, not only because it was terminating multiple children (during shutdown) but also because it explicitly unlinks from the child *after* creating the monitor, and /still/ allowed for an EXIT signal to have made its way into the mailbox unexpectedly.
>
> The monitor_child/1 function also unlinks from the child after
> creating the monitor. That patch looks a little bit like the fixes I
> was trying. Basically it's checking for an EXIT message after
> receiving the DOWN, just in case one is in the mailbox, yes?

That's correct.

> The problem is that it might still miss an EXIT, because it might still
> not have arrived yet, even though it will later.
>

Yes, that's definitely true, and we were aware of that problem. However, since we know we cannot wait for the 'EXIT' forever, and whatever arbitrary timeout we choose is just someone else's race condition, we decided that if the EXIT signal wasn't delivered expediently to the process' mailbox, losing the real exit reason was something we could live with in the worst case. Since we've started merging the R15/R16 changes in though, that code has disappeared, so we're in the same boat as you guys. :)

Cheers,
Tim
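
The compromise described above (wait for the 'DOWN', then look once for an 'EXIT' that is already queued, without waiting for it) can be sketched as follows. This is an illustration only: it is neither the RabbitMQ supervisor2 patch nor the OTP supervisor code, and the module exit_peek and its functions are hypothetical names. As noted in the thread, it still loses the real reason whenever the 'EXIT' has not yet reached the mailbox.

```erlang
-module(exit_peek).
-export([await_exit/1, demo/0]).

%% Hypothetical helper: wait for Pid to die and return the best exit
%% reason available. Assumes the caller is linked to Pid and is
%% trapping exits.
await_exit(Pid) ->
    MRef = erlang:monitor(process, Pid),
    unlink(Pid),
    receive
        {'DOWN', MRef, process, Pid, noproc} ->
            %% The process was already gone when we monitored it. The
            %% real reason, if any, is in an 'EXIT' that may already be
            %% queued. Look once, but never block waiting for it.
            receive
                {'EXIT', Pid, Reason} -> Reason
            after 0 ->
                noproc
            end;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The monitor saw the real reason. Drop a stray 'EXIT' if
            %% one arrived before the unlink took effect.
            receive {'EXIT', Pid, _} -> ok after 0 -> ok end,
            Reason
    end.

%% Forces the "EXIT already queued, DOWN says noproc" ordering.
demo() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(foobar) end),
    %% Take the 'EXIT' out of the mailbox and put it straight back, so
    %% the child is known to be dead and the 'EXIT' is known to be queued.
    receive {'EXIT', Pid, _} = Msg -> self() ! Msg end,
    Result = await_exit(Pid),
    process_flag(trap_exit, Old),
    Result.
```

With the ordering forced this way, exit_peek:demo() should return foobar rather than noproc. In the real race nothing forces that ordering, which is why the zero-timeout peek is only a best effort.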