From richardc@REDACTED Fri Jan 2 17:47:39 2009 From: richardc@REDACTED (Richard Carlsson) Date: Fri, 02 Jan 2009 17:47:39 +0100 Subject: [erlang-bugs] EUnit discards all output In-Reply-To: <3E7B459FEA684B97BC8F5091EC883EEE@JTablet2007> References: <3E7B459FEA684B97BC8F5091EC883EEE@JTablet2007> Message-ID: <495E452B.7060709@it.uu.se> John Hughes wrote: > From the eunit Users' Guide: > > *EUnit captures standard output* > > If your test code writes to the standard output, you may be surprised to > see that the text does not appear on the console when the tests are > running. This is because EUnit captures all standard output from test > functions (this also includes setup and cleanup functions, but not > generator functions), so that it can be included in the test report if > errors occur. > > OK, it says the output CAN be included in the test report if errors > occur, not that it WILL be--but I nevertheless expected the latter to > happen. When I run EUnit, however, ALL output is discarded, even output > from failing tests. Is that really the intention? Maybe I'm just doing > something wrong here--but I have not found any documented way to turn ON > reporting of output from failed tests. Ah, it seems I didn't remember to actually present that information. I.e., the presentation layer has always received the data, but didn't bother to ever print it. I just checked in a fix in the repository at https://svn.process-one.net/contribs/trunk/eunit, that prints the output (truncated if it gets too long) if there is any. In general, EUnit bug reports can be filed at https://support.process-one.net/browse/EUNIT. /Richard -- "Having users is like optimization: the wise course is to delay it." -- Paul Graham From richardc@REDACTED Fri Jan 2 23:43:41 2009 From: richardc@REDACTED (Richard Carlsson) Date: Fri, 02 Jan 2009 23:43:41 +0100 Subject: [erlang-bugs] EUnit treats a process that kills itself as a successful test In-Reply-To: References: Message-ID: <495E989D.5020507@it.uu.se> John Hughes wrote: > Here's my code: > > -module(eunit_example). > -include_lib("eunit/include/eunit.hrl"). > > exit_test() -> > exit(self(),die). > > In the shell, I run: > > 1> c(eunit_example). > {ok,eunit_example} > 2> eunit_example:test(). > Test successful. > ok > > Is that really the intention? > > Likewise, this test passes: > > spawn_test() -> > spawn_link(erlang,exit,[dying]), > timer:sleep(1). > > (The sleep is there to allow time for the child process to die, and the > exit signal to be propagated). > > Presumably the process running the test is trapping exits--but is that > really appropriate? As in this last example, crashes in child processes > won't cause the test to fail. At some point, I had the idea that it would be good to make the test processes default to trapping exits, to make it easier to write some kinds of process-spawning tests. (There was a comment to this effect in the code, but it was never documented.) This was probably misguided, so I have now checked in a change that makes the test processes non-trapping, which will probably cause less astonishment. (Trapping can still be enabled by the user where necessary). One thing to remember, though, is that unless you wrap every single test in a separate {spawn,Test}, a test process will run several tests, one at a time. (In the simplest case only one test process is ever spawned, and runs all the tests.) 
Hence, if the process dies, this will not simply "fail" a single test, but cause all the tests that were to be executed by that process to instead be cancelled/skipped. Hence, if a test (or group of tests) is known to be at risk of receiving an exit signal, it is best to wrap it in {spawn, Test}, so that the effect is isolated. /Richard -- "Having users is like optimization: the wise course is to delay it." -- Paul Graham From yubao.liu@REDACTED Sun Jan 4 04:15:46 2009 From: yubao.liu@REDACTED (Liu Yubao) Date: Sun, 04 Jan 2009 11:15:46 +0800 Subject: [erlang-bugs] openssl s_client hangs when accessing https service in inets application In-Reply-To: <495AE082.1050502@gmail.com> References: <495AE082.1050502@gmail.com> Message-ID: <496029E2.6030107@gmail.com> Hi, The documentation and code of inets application are not consistent, the corresponding option in {proplist_file, path()} to "SocketType" option in {file, path()} is "com_type", not "socket_type". Liu Yubao wrote: > Hi, > > The https services in inets application doesn't work, I guess > I got something wrong. Below is the steps to recur: > > a. use gen-cert.sh to generate server.pem; > (All scripts and configuration are provided at > http://jff.googlecode.com/files/inets-https-test.tar > ) > > b. execute runerl.sh and input these clauses in the erlang shell: > application:start(ssl). > application:start(inets). > > c. execute `openssl s_client -connect localhost:8443 -debug -msg`, > you can see openssl hangs after sending a CLIENT-HELLO message, > the TCP connection is established successfully but https server > doesn't response to the CLIENT-HELLO message. > > > I tested "ssl:listen" in erlang shell and succeed to communication between > openssl and erlang shell: > > application:start(ssl). > {ok, S} = ssl:listen(8443, [{certfile, "server.pem"}, {active, false}]). > {ok, S2} = ssl:accept(S). > # execute in another bash: openssl s_client -connect localhost:8443 > ssl:send(S2, <<"hello world\n">>). > # "openssl s_client" can receive this greeting. > > > I tested against the latest erlang 5.6.5 under Windows XP and 5.6.3 under > Debian Lenny. > > I'm looking forward your help! > > > Best regards, > > Liu Yubao > From jack@REDACTED Mon Jan 5 17:42:38 2009 From: jack@REDACTED (Jack Moffitt) Date: Mon, 5 Jan 2009 09:42:38 -0700 Subject: [erlang-bugs] small documentation typo in http module Message-ID: <9b58f4550901050842v47f57b96vb6cc80408da7a66d@mail.gmail.com> On http://www.erlang.org/doc/man/http.html ipv6Mode (under set_options) should be capitalized like the rest of the types. I misread this as an atom due to the lower casing and then spent a little while trying to figure out why the option had no effect. jack. From sedinin@REDACTED Tue Jan 6 09:20:00 2009 From: sedinin@REDACTED (Andrey Sedinin) Date: Tue, 6 Jan 2009 10:20:00 +0200 Subject: [erlang-bugs] Possible bug in xmerl_xsd (validating XML using XSD schema file). Message-ID: Hi, i guess it is a bug: Validate XML using schema: {ok, State } = xmerl_xsd:process_schema("test.xsd"), {Entity ,_} = xmerl_scan:file("test.xml"), xmerl_xsd:validate(Entity, State). Schema: XML: I think it should validate. Possible values: Valid Invalid but last one give an error: {error,[{[],xmerl_xsd, {empty_content_not_allowed,[{enumeration,"Valid"}, {enumeration,"Invalid"}, {enumeration,[]}]}}]} May be i wrong? Also posted here: http://www.erlang.org/pipermail/erlang-questions/2008-December/040744.html I use R12B-5 on Mac OS X 10.5.6. -- Sedinin -- ??????? 
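A minimal sketch of the validation flow from the xmerl_xsd report above, with the result matched explicitly. The file names "test.xsd" and "test.xml" are the ones from the report; the wrapper function and its name are added only for illustration:

validate_file(XsdFile, XmlFile) ->
    %% Compile the schema once, then validate a scanned document against it.
    {ok, State} = xmerl_xsd:process_schema(XsdFile),
    {Entity, _Rest} = xmerl_scan:file(XmlFile),
    case xmerl_xsd:validate(Entity, State) of
        {error, Errors} ->
            %% e.g. the empty_content_not_allowed error quoted above
            {invalid, Errors};
        Validated ->
            %% any other return is the successful validation result
            Validated
    end.

Called as validate_file("test.xsd", "test.xml") from any module that includes no other dependencies.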
From c.romain@REDACTED Thu Jan 8 02:09:22 2009 From: c.romain@REDACTED (cyril Romain) Date: Thu, 08 Jan 2009 02:09:22 +0100 Subject: [erlang-bugs] code:load_abs/1 fails for packaged modules In-Reply-To: <495800C5.6050709@laposte.net> References: <495800C5.6050709@laposte.net> Message-ID: <49655242.4020308@laposte.net> cyril Romain wrote: > _FixSuggestions_: > I think in code_server.erl the load_abs/3 function should be fix so that it: > - Successively calls try_load_module with mymodule, to.mymodule, > path.to.mymodule, stopping on sucess. Not so elegant though... > - Calls try_load_module with mymodule (it actually does). But if the > module name in object code does match mymodule, try_load_module with the > module name found in object code. So that there is at most 2 calls of > try_load_module. Problem: the object code (and the module name) is read > by a C function (in beam_load.c) and it seems not straightforward to let > Erlang know about the module name read in that object code. > - Reading the file once, and use the module name defined within; > avoiding multiple call to try_load_module. Better solution, but is it > possible ? > Here is a patch following 1st suggestion: http://www.erlang.org/pipermail/erlang-patches/2009-January/000359.html From ingela@REDACTED Thu Jan 8 09:44:45 2009 From: ingela@REDACTED (Ingela Anderton Andin) Date: Thu, 08 Jan 2009 09:44:45 +0100 Subject: [erlang-bugs] ssh:connect() documentation In-Reply-To: <46167e6a0812221647y739015ay72d10469e5d62204@mail.gmail.com> References: <46167e6a0812221647y739015ay72d10469e5d62204@mail.gmail.com> Message-ID: <4965BCFD.50705@erix.ericsson.se> Hi! Thank you for reporting this, in the latest code (not yet released) however the option name corresponds to the documentation. Regards Ingela Erlang/OTP - Ericssson Anton Krasovsky wrote: > Documentation for ssh:connect() says: > > {connect_timeout, Milliseconds | infinity} > Sets the default timeout when trying to connect to. > > however the actual option is 'timeout'. > > anton > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > > From bertil.karlsson@REDACTED Thu Jan 8 10:29:23 2009 From: bertil.karlsson@REDACTED (Bertil Karlsson) Date: Thu, 08 Jan 2009 10:29:23 +0100 Subject: [erlang-bugs] Possible bug in xmerl_xsd (validating XML using XSD schema file). In-Reply-To: References: Message-ID: <4965C773.2040203@ericsson.com> Hi, this is a bug that will be fixed as soon as possible. /Bertil Andrey Sedinin wrote: > Hi, > > i guess it is a bug: > > Validate XML using schema: > > {ok, State } = xmerl_xsd:process_schema("test.xsd"), > {Entity ,_} = xmerl_scan:file("test.xml"), > xmerl_xsd:validate(Entity, State). > > Schema: > > > > elementFormDefault="qualified" attributeFormDefault="unqualified"> > > > > > > > > > > > > > > > > > > XML: > > > > > > I think it should validate. Possible values: > > Valid > Invalid > > > but last one give an error: > > {error,[{[],xmerl_xsd, > {empty_content_not_allowed,[{enumeration,"Valid"}, > {enumeration,"Invalid"}, > {enumeration,[]}]}}]} > > > May be i wrong? > Also posted here: http://www.erlang.org/pipermail/erlang-questions/2008-December/040744.html > > I use R12B-5 on Mac OS X 10.5.6. > > > -- > Sedinin > > -- > ??????? 
> > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs From geoff.cant@REDACTED Thu Jan 8 15:36:16 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Thu, 08 Jan 2009 15:36:16 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server Message-ID: Hi all, Mats Cronqvist suggested I take this one up on erlang-bugs. What follows is a rough transcript of a debugging session in which we suspect that the reason an ejabberd node cannot dump mnesia logs is due to the disk_log_server process being impossibly stuck in gen_server:loop/6. It would be good if someone could confirm for me that my reasoning is correct (or at least plausible) that the disk_log_server is stuck, that this is the reason why mnesia can't dump logs and that the disk_log_server is stuck in a seemingly impossible way. The client on whose cluster this occurred has seen this problem before, so we may get another chance at live debugging sometime in the near future. I would greatly appreciate any suggestions as to additional debugging techniques I could try if this problem recurs. Thank you, --Geoff Cant The erlang version information is "Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [hipe] [kernel-poll:false]" - the stock Debian erlang-hipe-base from lenny on amd64 hardware. (in the transcript nodenames and file paths have been slightly but consistently rewritten to obscure some private network information) We tried mnesia:dump_log() which hung, so we tried to figure out why. mnesia_controller:get_workers(2000) => {workers,[],[],<0.22676.260>} process_info(<0.22676.260>) => [{current_function,{gen,wait_resp_mon,3}}, {initial_call,{mnesia_controller,dump_and_reply,2}}, {status,waiting}, {message_queue_len,0}, {messages,[]}, {links,[<0.116.0>]}, {dictionary,[]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.57.0>}, {total_heap_size,233}, {heap_size,233}, {stack_size,21}, {reductions,4311}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] Backtrace <0.22676.260>: Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) arity = 0 0x00007f60f844c108 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) y(0) infinity y(1) #Ref<0.0.992.227032> y(2) 'ejabberd@REDACTED' 0x00007f60f844c128 Return addr 0x00007f61c049f1d0 (mnesia_log:save_decision_tab/1 + 248) y(0) infinity y(1) {close_log,decision_tab} y(2) <0.62.0> y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) 0x00007f60f844c150 Return addr 0x00007f61c03c6ec8 (mnesia_dumper:perform_dump/2 + 1648) y(0) "/fake/path/ejabberd/DECISION_TAB.TMP" y(1) [] 0x00007f60f844c168 Return addr 0x00007f61c056bdd0 (mnesia_controller:dump_and_reply/2 + 152) y(0) [] y(1) [] y(2) [] y(3) 15 y(4) [] y(5) [] 0x00007f60f844c1a0 Return addr 0x000000000084bd18 () y(0) <0.116.0> Here the log dumping process appears to be waiting on gen_server:call(mnesia_monitor, {close_log,decision_tab}). 
process_info(<0.62.0>) => [{registered_name,mnesia_monitor}, {current_function,{disk_log,monitor_request,2}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,34}, {messages,[{nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'fakenode1-16-26-06@REDACTED'}, {nodedown,'fakenode1-16-26-06@REDACTED'}, {nodeup,'fakenode1-16-27-20@REDACTED'}, {nodedown,'fakenode1-16-27-20@REDACTED'}, {nodeup,'fakenode1-16-29-25@REDACTED'}, {nodedown,'fakenode1-16-29-25@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'fakenode2-16-36-53@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,...}, {...}|...]}, {links,[<6749.62.0>,<6753.62.0>,<0.111.0>,<0.22677.260>, <6752.104.0>,<6747.62.0>,<6748.62.0>,<0.61.0>,<6751.62.0>, <6750.62.0>,<0.52.0>]}, {dictionary,[{'$ancestors',[mnesia_kernel_sup,mnesia_sup, <0.58.0>]}, {'$initial_call',{gen,init_it, [gen_server,<0.61.0>,<0.61.0>, {local,mnesia_monitor}, mnesia_monitor, [<0.61.0>], [{timeout,infinity}]]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.57.0>}, {total_heap_size,377}, {heap_size,377}, {stack_size,20}, {reductions,2326000}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] We didn't take a backtrace of mnesia_monitor, but {current_function,{disk_log,monitor_request,2}} led us to think that mnesia_monitor was trying to close the decision_tab log file, so we tried to find out which process that was. At this point, disk_log:info(decision_tab) hung, so we tried disk_log_server:get_log_pids(decision_tab) which gave us {local,<0.22681.260>}. Backtrace <0.22681.260>: Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) arity = 0 0x00007f60f75818a0 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) y(0) infinity y(1) #Ref<0.0.992.227035> y(2) 'ejabberd@REDACTED' 0x00007f60f75818c0 Return addr 0x00007f61c03b1610 (disk_log:do_exit/4 + 440) y(0) infinity y(1) {close,<0.22681.260>} y(2) disk_log_server y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) 0x00007f60f75818e8 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) y(0) normal y(1) [] y(2) <0.62.0> y(3) ok 0x00007f60f7581910 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) y(1) disk_log y(2) init y(3) [<0.70.0>,<0.71.0>] The disk_log process for 'decision_tab' was waiting for a reply from the disk_log_server to gen_server:call(disk_log_server, {close, self()}). 
Backtrace disk_log_server: Program counter: 0x00007f61c4365af8 (gen_server:loop/6 + 288) CP: 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) arity = 0 0x00007f60fb043f78 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) y(0) [] y(1) infinity y(2) disk_log_server y(3) {state,[]} y(4) disk_log_server y(5) <0.30.0> 0x00007f60fb043fb0 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) y(1) gen y(2) init_it y(3) [gen_server,<0.30.0>,<0.30.0>,{local,disk_log_server},disk_log_server,[],[]] process_info(whereis(disk_log_server)) => [{registered_name,disk_log_server}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,1}, {messages,[{'$gen_call',{<0.22681.260>,#Ref<0.0.992.227035>}, {close,<0.22681.260>}}]}, {links,[<0.111.0>,<0.22677.260>,<0.22681.260>,<0.30.0>]}, {dictionary,[{<0.111.0>,latest_log}, {<0.22677.260>,previous_log}, {'$ancestors',[kernel_safe_sup,kernel_sup,<0.8.0>]}, {<0.22681.260>,decision_tab}, {'$initial_call',{gen,init_it, [gen_server,<0.30.0>,<0.30.0>, {local,disk_log_server}, disk_log_server,[],[]]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.7.0>}, {total_heap_size,246}, {heap_size,233}, {stack_size,12}, {reductions,2366165}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] Which appears to be doing something impossible - blocked in the receive statement in gen_server:loop/6 with a valid message in its queue. We used process_info to check the reductions a couple of times, but they stayed the same at 2366165 over a period of at least a minute. This line of investigation is all we have as the server has now been restarted. From raimo+erlang-bugs@REDACTED Thu Jan 8 15:55:42 2009 From: raimo+erlang-bugs@REDACTED (Raimo Niskanen) Date: Thu, 8 Jan 2009 15:55:42 +0100 Subject: [erlang-bugs] A bug in file:pread or not? In-Reply-To: References: Message-ID: <20090108145542.GA15705@erix.ericsson.se> On Mon, Dec 29, 2008 at 01:31:19AM +0100, Christian wrote: > Sending a LocNum to file:pread/2 where the Size is zero returns eof > rather than an empty binary. > > 2> {ok, File} = file:open("transpose.erl", [binary, read, raw]). > {ok,{file_descriptor,prim_file,{#Port<0.93>,7}}} > 3> file:pread(File, []). > {ok,[]} > 4> file:pread(File, [{10,10}]). > {ok,[<<"ranspose).">>]} > 5> file:pread(File, [{10,1}]). > {ok,[<<"r">>]} > 6> file:pread(File, [{10,0}]). > {ok,[eof]} > > If I do: > > 8> file:pread(File, [{10, 10}, {10, 1}, {10,0}]). > {ok,[<<"ranspose).">>,<<"r">>,eof]} This seems to be a bug. file:read was corrected for this in some special case, but file:pread was forgotten then. We will most probably fix it in a bugfix release. It is now inconsistent since file:position followed by file:read does not give the same as file:pread. > > Then I see this syscalls being performed using 'strace': > > pread64(7, "ranspose).", 10, 10) = 10 > pread64(7, "r", 1, 10) = 1 > pread64(7, "", 0, 10) = 0 > > So it looks like the syscall tell you that zero bytes were read. It is > just reported as having tried to read outside the file. 
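A sketch of the two call sequences Raimo says are now inconsistent, using the same "transpose.erl" file as in the report. No results are shown here; the point is only that the two ways of reading zero bytes at offset 10 can be compared directly:

{ok, File} = file:open("transpose.erl", [binary, read, raw]).
{ok, _} = file:position(File, 10).
R1 = file:read(File, 0).            %% zero-byte read at the current position
R2 = file:pread(File, [{10, 0}]).   %% zero-byte pread at the same offset ({ok,[eof]} above)

According to the report, R1 and R2 do not agree, which is the inconsistency referred to above.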
> _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs -- / Raimo Niskanen, Erlang/OTP, Ericsson AB From dgud@REDACTED Thu Jan 8 15:55:58 2009 From: dgud@REDACTED (Dan Gudmundsson) Date: Thu, 08 Jan 2009 15:55:58 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: Message-ID: <496613FE.8040003@erix.ericsson.se> I have seen it on erl-q (most of us devs read that list to), your reasoning seems valid and currently I don't have any more ideas. I have asked the emulator guys to take a look. /Dan "Mnesia" G Geoff Cant wrote: > Hi all, Mats Cronqvist suggested I take this one up on erlang-bugs. What > follows is a rough transcript of a debugging session in which we suspect > that the reason an ejabberd node cannot dump mnesia logs is due to the > disk_log_server process being impossibly stuck in gen_server:loop/6. > > It would be good if someone could confirm for me that my reasoning is > correct (or at least plausible) that the disk_log_server is stuck, that > this is the reason why mnesia can't dump logs and that the > disk_log_server is stuck in a seemingly impossible way. > > The client on whose cluster this occurred has seen this problem before, > so we may get another chance at live debugging sometime in the near > future. > > I would greatly appreciate any suggestions as to additional debugging > techniques I could try if this problem recurs. > > Thank you, > --Geoff Cant > > > The erlang version information is "Erlang (BEAM) emulator version 5.6.3 > [source] [64-bit] [smp:8] [async-threads:0] [hipe] [kernel-poll:false]" > - the stock Debian erlang-hipe-base from lenny on amd64 hardware. > > (in the transcript nodenames and file paths have been slightly but > consistently rewritten to obscure some private network information) > > We tried mnesia:dump_log() which hung, so we tried to figure out why. 
> > mnesia_controller:get_workers(2000) => {workers,[],[],<0.22676.260>} > > process_info(<0.22676.260>) => > [{current_function,{gen,wait_resp_mon,3}}, > {initial_call,{mnesia_controller,dump_and_reply,2}}, > {status,waiting}, > {message_queue_len,0}, > {messages,[]}, > {links,[<0.116.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.57.0>}, > {total_heap_size,233}, > {heap_size,233}, > {stack_size,21}, > {reductions,4311}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > Backtrace <0.22676.260>: > Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) > CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) > arity = 0 > > 0x00007f60f844c108 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) > y(0) infinity > y(1) #Ref<0.0.992.227032> > y(2) 'ejabberd@REDACTED' > > 0x00007f60f844c128 Return addr 0x00007f61c049f1d0 (mnesia_log:save_decision_tab/1 + 248) > y(0) infinity > y(1) {close_log,decision_tab} > y(2) <0.62.0> > y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) > > 0x00007f60f844c150 Return addr 0x00007f61c03c6ec8 (mnesia_dumper:perform_dump/2 + 1648) > y(0) "/fake/path/ejabberd/DECISION_TAB.TMP" > y(1) [] > > 0x00007f60f844c168 Return addr 0x00007f61c056bdd0 (mnesia_controller:dump_and_reply/2 + 152) > y(0) [] > y(1) [] > y(2) [] > y(3) 15 > y(4) [] > y(5) [] > > 0x00007f60f844c1a0 Return addr 0x000000000084bd18 () > y(0) <0.116.0> > > Here the log dumping process appears to be waiting on > gen_server:call(mnesia_monitor, {close_log,decision_tab}). > > process_info(<0.62.0>) => > [{registered_name,mnesia_monitor}, > {current_function,{disk_log,monitor_request,2}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,34}, > {messages,[{nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'fakenode1-16-26-06@REDACTED'}, > {nodedown,'fakenode1-16-26-06@REDACTED'}, > {nodeup,'fakenode1-16-27-20@REDACTED'}, > {nodedown,'fakenode1-16-27-20@REDACTED'}, > {nodeup,'fakenode1-16-29-25@REDACTED'}, > {nodedown,'fakenode1-16-29-25@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'fakenode2-16-36-53@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,...}, > {...}|...]}, > {links,[<6749.62.0>,<6753.62.0>,<0.111.0>,<0.22677.260>, > <6752.104.0>,<6747.62.0>,<6748.62.0>,<0.61.0>,<6751.62.0>, > <6750.62.0>,<0.52.0>]}, > {dictionary,[{'$ancestors',[mnesia_kernel_sup,mnesia_sup, > <0.58.0>]}, > {'$initial_call',{gen,init_it, > [gen_server,<0.61.0>,<0.61.0>, > {local,mnesia_monitor}, > mnesia_monitor, > [<0.61.0>], > [{timeout,infinity}]]}}]}, > {trap_exit,true}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.57.0>}, > {total_heap_size,377}, > {heap_size,377}, > {stack_size,20}, > {reductions,2326000}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > We didn't take a backtrace of mnesia_monitor, but > {current_function,{disk_log,monitor_request,2}} led us to think that > mnesia_monitor was trying to close the decision_tab log file, so we > tried to find out which process that was. At this point, > disk_log:info(decision_tab) hung, so we tried > disk_log_server:get_log_pids(decision_tab) which gave us > {local,<0.22681.260>}. 
> > Backtrace <0.22681.260>: > Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) > CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) > arity = 0 > > 0x00007f60f75818a0 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) > y(0) infinity > y(1) #Ref<0.0.992.227035> > y(2) 'ejabberd@REDACTED' > > 0x00007f60f75818c0 Return addr 0x00007f61c03b1610 (disk_log:do_exit/4 + 440) > y(0) infinity > y(1) {close,<0.22681.260>} > y(2) disk_log_server > y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) > > 0x00007f60f75818e8 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > y(0) normal > y(1) [] > y(2) <0.62.0> > y(3) ok > > 0x00007f60f7581910 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) > y(1) disk_log > y(2) init > y(3) [<0.70.0>,<0.71.0>] > > The disk_log process for 'decision_tab' was waiting for a reply from the > disk_log_server to gen_server:call(disk_log_server, {close, self()}). > > Backtrace disk_log_server: > Program counter: 0x00007f61c4365af8 (gen_server:loop/6 + 288) > CP: 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > arity = 0 > > 0x00007f60fb043f78 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > y(0) [] > y(1) infinity > y(2) disk_log_server > y(3) {state,[]} > y(4) disk_log_server > y(5) <0.30.0> > > 0x00007f60fb043fb0 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) > y(1) gen > y(2) init_it > y(3) > [gen_server,<0.30.0>,<0.30.0>,{local,disk_log_server},disk_log_server,[],[]] > > process_info(whereis(disk_log_server)) => > [{registered_name,disk_log_server}, > {current_function,{gen_server,loop,6}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,1}, > {messages,[{'$gen_call',{<0.22681.260>,#Ref<0.0.992.227035>}, > {close,<0.22681.260>}}]}, > {links,[<0.111.0>,<0.22677.260>,<0.22681.260>,<0.30.0>]}, > {dictionary,[{<0.111.0>,latest_log}, > {<0.22677.260>,previous_log}, > {'$ancestors',[kernel_safe_sup,kernel_sup,<0.8.0>]}, > {<0.22681.260>,decision_tab}, > {'$initial_call',{gen,init_it, > [gen_server,<0.30.0>,<0.30.0>, > {local,disk_log_server}, > disk_log_server,[],[]]}}]}, > {trap_exit,true}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.7.0>}, > {total_heap_size,246}, > {heap_size,233}, > {stack_size,12}, > {reductions,2366165}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > Which appears to be doing something impossible - blocked in the receive > statement in gen_server:loop/6 with a valid message in its queue. We > used process_info to check the reductions a couple of times, but they > stayed the same at 2366165 over a period of at least a minute. > > This line of investigation is all we have as the server has now been > restarted. > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From geoff.cant@REDACTED Thu Jan 8 18:44:59 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Thu, 08 Jan 2009 18:44:59 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: (Geoff Cant's message of "Thu, 08 Jan 2009 15:36:16 +0100") References: Message-ID: Hi all, we just took another look at the cluster and discovered another stuck gen_server. This time we sent it a bogus message of '$ignore_me' - the process then woke up, processed the first message in its queue (an ejabberd internal message) and exited (as expected from the ejabberd code). 
It appears that this bug causes processes to sometimes not get scheduled in when they receive a message. It seems to strike randomly and subsequent messages cause the process to be scheduled properly. Most of the time this doesn't cause major problems as the affected process will receive another message in the course of normal events and will now run normally. However, sometimes this strikes the wrong process in just the wrong way (the disk_log_server case) and we get visible error behaviour. This problem has been discovered on two different machines in the same ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. We're going to try replicating this with a tsung test of the same emulator package (http://packages.debian.org/lenny/erlang-base-hipe 1:12.b.3-dfsg-4) and then see if the same problem exists with a source compile of R12B-5. Thanks, --Geoff Cant The debugging session transcript follows. Running [ Pid || Pid <- erlang:processes(), element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, element(2, erlang:process_info(Pid, status)) =:= waiting, length(element(2, erlang:process_info(Pid, message_queue))) > 0]. gave us the process <0.19313.279>: (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). [{current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,1}, {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, {dictionary,[{'$ancestors',[ejabberd_receiver_sup, ejabberd_sup,<0.37.0>]}, {'$initial_call',{gen,init_it, [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], []]}}]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.36.0>}, {total_heap_size,987}, {heap_size,987}, {stack_size,12}, {reductions,922822}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] This process stayed at {reductions,922822} for over a minute. It was sitting on a backtrace of: Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) arity = 0 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) y(0) [] y(1) infinity y(2) ejabberd_receiver y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} y(4) <0.19313.279> y(5) <0.235.0> 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) y(1) gen y(2) init_it y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] As this process ignores messages it doesn't understand, we sent it a bogus message: pid(0,19313,279) ! '$ignore_me'. The process then logged: =ERROR REPORT==== 2009-01-08 17:33:37 === E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event before exiting. This is the expected behaviour on receiving a message like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue while it was stuck before we sent the '$ignore_me' message. So, it appears that this bug causes processes to sometimes not get scheduled in when they receive a message. 
It seems to strike randomly and subsequent messages cause the process to be scheduled properly. From masse@REDACTED Thu Jan 8 21:47:58 2009 From: masse@REDACTED (mats cronqvist) Date: Thu, 08 Jan 2009 21:47:58 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: (Geoff Cant's message of "Thu\, 08 Jan 2009 18\:44\:59 +0100") References: Message-ID: <87wsd5ikxt.fsf@dixie.cronqvi.st> Geoff Cant writes: > Hi all, we just took another look at the cluster and discovered another > stuck gen_server. This time we sent it a bogus message of '$ignore_me' - > the process then woke up, processed the first message in its queue (an > ejabberd internal message) and exited (as expected from the ejabberd > code). > > It appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. Ouch. You can't trust anyone these days. At least the message wasn't dropped or out of order. mats From rickard.s.green@REDACTED Mon Jan 12 13:46:22 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Mon, 12 Jan 2009 13:46:22 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: Message-ID: <496B3B9E.3070407@ericsson.com> Hi Geoff, I've looked at this and found a bug that may have caused this. When a process garbage collect another process and the process being garbage collected also receives a message during the garbage collect, the process being garbage collected can end up in the state that you described. This kind of garbage collect only happen when someone calls the garbage_collect/1 BIF or when code is purged. In the case with the disk_log server being stuck I think we can rule out the purge, i.e., if it is this bug that caused your problem another process must have garbage collected the disk_log server via the garbage_collect/1 BIF. Do you have any code that may have garbage collected the disk_log server via the garbage_collect/1 BIF? The garbage collect may also have been done explicitly in the shell. Regards, Rickard Green, Erlang/OTP, Ericsson AB. Geoff Cant wrote: > Hi all, we just took another look at the cluster and discovered another > stuck gen_server. This time we sent it a bogus message of '$ignore_me' - > the process then woke up, processed the first message in its queue (an > ejabberd internal message) and exited (as expected from the ejabberd > code). > > It appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. > > Most of the time this doesn't cause major problems as the affected > process will receive another message in the course of normal events and > will now run normally. However, sometimes this strikes the wrong process > in just the wrong way (the disk_log_server case) and we get visible > error behaviour. > > This problem has been discovered on two different machines in the same > ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. > > We're going to try replicating this with a tsung test of the same > emulator package (http://packages.debian.org/lenny/erlang-base-hipe > 1:12.b.3-dfsg-4) and then see if the same problem exists with a source > compile of R12B-5. > > Thanks, > --Geoff Cant > > The debugging session transcript follows. 
> > Running > [ Pid > || Pid <- erlang:processes(), > element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, > element(2, erlang:process_info(Pid, status)) =:= waiting, > length(element(2, erlang:process_info(Pid, message_queue))) > 0]. > > gave us the process <0.19313.279>: > > (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). > [{current_function,{gen_server,loop,6}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,1}, > {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, > {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, > {dictionary,[{'$ancestors',[ejabberd_receiver_sup, > ejabberd_sup,<0.37.0>]}, > {'$initial_call',{gen,init_it, > [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, > [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], > []]}}]}, > {trap_exit,false}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.36.0>}, > {total_heap_size,987}, > {heap_size,987}, > {stack_size,12}, > {reductions,922822}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > This process stayed at {reductions,922822} for over a minute. > > It was sitting on a backtrace of: > Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) > CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) > arity = 0 > > 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) > y(0) [] > y(1) infinity > y(2) ejabberd_receiver > y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} > y(4) <0.19313.279> > y(5) <0.235.0> > > 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) > y(1) gen > y(2) init_it > y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] > > As this process ignores messages it doesn't understand, we sent it a > bogus message: > > pid(0,19313,279) ! '$ignore_me'. > > The process then logged: > =ERROR REPORT==== 2009-01-08 17:33:37 === > E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event > > before exiting. This is the expected behaviour on receiving a message > like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue > while it was stuck before we sent the '$ignore_me' message. > > So, it appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From geoff.cant@REDACTED Tue Jan 13 15:07:30 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Tue, 13 Jan 2009 15:07:30 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496B3B9E.3070407@ericsson.com> (Rickard Green's message of "Mon, 12 Jan 2009 13:46:22 +0100") References: <496B3B9E.3070407@ericsson.com> Message-ID: Hi Rickard, thank you very much - this sounds correct to me. 
The customer cluster is still running a cron job that effectively does lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every ten minutes. This script was introduced as a stop-gap measure when running a heavily loaded ejabberd cluster on the 32bit VM where an out of memory condition would take down the node and then the entire cluster due to some problems with cross-node monitor storms. The cluster now runs on 64bit VMs so we'll revisit the memory consumption problem and avoid using erlang:garbage_collect/1. We'll disable the script and see if the problem recurs. Once again, thank you very much - I'm always very impressed by the level of support the OTP team gives the erlang community. Cheers, --Geoff Rickard Green writes: > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you > described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., > if it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 > BIF. Do you have any code that may have garbage collected the disk_log > server via the garbage_collect/1 BIF? The garbage collect may also > have been done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. From rickard.s.green@REDACTED Tue Jan 13 15:25:41 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Tue, 13 Jan 2009 15:25:41 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: <496B3B9E.3070407@ericsson.com> Message-ID: <496CA465.8080302@ericsson.com> I'll prepare a source patch fixing the problem. I wont be able to post it until tomorrow, though. Regards, Rickard Green, Erlang/OTP, Ericsson AB. Geoff Cant wrote: > Hi Rickard, thank you very much - this sounds correct to me. The > customer cluster is still running a cron job that effectively does > lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every > ten minutes. > > This script was introduced as a stop-gap measure when running a heavily > loaded ejabberd cluster on the 32bit VM where an out of memory condition > would take down the node and then the entire cluster due to some > problems with cross-node monitor storms. The cluster now runs on 64bit > VMs so we'll revisit the memory consumption problem and avoid using > erlang:garbage_collect/1. > > We'll disable the script and see if the problem recurs. > > Once again, thank you very much - I'm always very impressed by the level > of support the OTP team gives the erlang community. > > Cheers, > --Geoff > > > Rickard Green writes: > >> Hi Geoff, >> >> I've looked at this and found a bug that may have caused this. When a >> process garbage collect another process and the process being garbage >> collected also receives a message during the garbage collect, the >> process being garbage collected can end up in the state that you >> described. >> >> This kind of garbage collect only happen when someone calls the >> garbage_collect/1 BIF or when code is purged. 
In the case with the >> disk_log server being stuck I think we can rule out the purge, i.e., >> if it is this bug that caused your problem another process must have >> garbage collected the disk_log server via the garbage_collect/1 >> BIF. Do you have any code that may have garbage collected the disk_log >> server via the garbage_collect/1 BIF? The garbage collect may also >> have been done explicitly in the shell. >> >> Regards, >> Rickard Green, Erlang/OTP, Ericsson AB. > > From rickard.s.green@REDACTED Wed Jan 14 10:59:19 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Wed, 14 Jan 2009 10:59:19 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496CA465.8080302@ericsson.com> References: <496B3B9E.3070407@ericsson.com> <496CA465.8080302@ericsson.com> Message-ID: <496DB777.5090400@ericsson.com> A source patch can now be downloaded: http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.patch http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.readme Regards, Rickard Green, Erlang/OTP, Ericsson AB. Rickard Green wrote: > I'll prepare a source patch fixing the problem. I wont be able to post > it until tomorrow, though. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi Rickard, thank you very much - this sounds correct to me. The >> customer cluster is still running a cron job that effectively does >> lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every >> ten minutes. >> >> This script was introduced as a stop-gap measure when running a heavily >> loaded ejabberd cluster on the 32bit VM where an out of memory condition >> would take down the node and then the entire cluster due to some >> problems with cross-node monitor storms. The cluster now runs on 64bit >> VMs so we'll revisit the memory consumption problem and avoid using >> erlang:garbage_collect/1. >> >> We'll disable the script and see if the problem recurs. >> >> Once again, thank you very much - I'm always very impressed by the level >> of support the OTP team gives the erlang community. >> >> Cheers, >> --Geoff >> >> >> Rickard Green writes: >> >>> Hi Geoff, >>> >>> I've looked at this and found a bug that may have caused this. When a >>> process garbage collect another process and the process being garbage >>> collected also receives a message during the garbage collect, the >>> process being garbage collected can end up in the state that you >>> described. >>> >>> This kind of garbage collect only happen when someone calls the >>> garbage_collect/1 BIF or when code is purged. In the case with the >>> disk_log server being stuck I think we can rule out the purge, i.e., >>> if it is this bug that caused your problem another process must have >>> garbage collected the disk_log server via the garbage_collect/1 >>> BIF. Do you have any code that may have garbage collected the disk_log >>> server via the garbage_collect/1 BIF? The garbage collect may also >>> have been done explicitly in the shell. >>> >>> Regards, >>> Rickard Green, Erlang/OTP, Ericsson AB. >> >> > From ad.sergey@REDACTED Wed Jan 14 22:38:39 2009 From: ad.sergey@REDACTED (Sergey S) Date: Wed, 14 Jan 2009 13:38:39 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code Message-ID: Hello. While I was playing with +native option, I run into a bug in HIPE which leads to segmentation fault. To reproduce the bug just compile the code below using HIPE and run crash:start/0. 
Your will see the following: Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.6.5 (abort with ^G) 1> crash:start(). # This message will be printed only once when compiled with +native Segmentation fault Here is the code (don't look for intention of this example, it has not got that): %--------------------------------------------------- -module(crash). -export([start/0]). start() -> spawn(fun() -> init() end). init() -> repeat(10, fun() -> void end), receive after infinity -> ok end. repeat(0, _) -> ok; repeat(N, Fun) -> io:format("# This message will be printed only once when compiled with +native~n"), Fun(), repeat(N - 1, Fun). % <------ It never will be called if you use HIPE %--------------------------------------------------- The same code compiled without +native flag works well to me. I'm using Erlang R12B5. When I saw that segfault, I tried to replace "receive" statement with "timer:sleep(999999)" call, and it helped! -- Sergey From mikpe@REDACTED Thu Jan 15 09:53:57 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Thu, 15 Jan 2009 09:53:57 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: Message-ID: <18798.63909.14341.336262@harpo.it.uu.se> Sergey S writes: > Hello. > > While I was playing with +native option, I run into a bug in HIPE > which leads to segmentation fault. > > To reproduce the bug just compile the code below using HIPE and run > crash:start/0. Your will see the following: > > Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] > [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.6.5 (abort with ^G) > 1> crash:start(). > # This message will be printed only once when compiled with +native > Segmentation fault > > Here is the code (don't look for intention of this example, it has not > got that): > > %--------------------------------------------------- > -module(crash). > -export([start/0]). > > start() -> > spawn(fun() -> init() end). > > init() -> > repeat(10, fun() -> void end), > receive after infinity -> ok end. > > repeat(0, _) -> > ok; > repeat(N, Fun) -> > io:format("# This message will be printed only once when compiled > with +native~n"), > Fun(), > repeat(N - 1, Fun). % <------ It never will be called if you use HIPE > %--------------------------------------------------- > > The same code compiled without +native flag works well to me. I'm > using Erlang R12B5. Please give us some information about your system: 1. Which CPU type? Is it 32- or 64-bit? 2. Which C compiler and version? 3. Which OS / distribution / version? From ad.sergey@REDACTED Thu Jan 15 11:11:03 2009 From: ad.sergey@REDACTED (Sergey S) Date: Thu, 15 Jan 2009 02:11:03 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18798.63909.14341.336262@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> Message-ID: Hello. I reproduced this bug on two separate computers running the same software. > Please give us some information about your system: > 1. Which CPU type? Is it 32- or 64-bit? 32-bit (i686) > 2. Which C compiler and version? GCC 4.3.2 > 3. Which OS / distribution / version? Up-to-date Archinux i686. -- Sergey. 
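For reference, a sketch of one way to reproduce the report from the shell, assuming an emulator built with HiPE support; the module name crash is the one from the report and no output is shown:

1> c(crash, [native]).   %% compile crash.erl with the native-code (HiPE) compiler
2> crash:start().        %% spawns the process that triggers the crash

The same module compiled without the native option is reported to work correctly.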
From mikpe@REDACTED Thu Jan 15 11:35:05 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Thu, 15 Jan 2009 11:35:05 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: <18798.63909.14341.336262@harpo.it.uu.se> Message-ID: <18799.4441.149848.485106@harpo.it.uu.se> Sergey S writes: > Hello. > > I reproduced this bug on two separate computers running the same software. > > > Please give us some information about your system: > > 1. Which CPU type? Is it 32- or 64-bit? > 32-bit (i686) > > > 2. Which C compiler and version? > GCC 4.3.2 > > > 3. Which OS / distribution / version? > Up-to-date Archinux i686. Ok. I'll take a look at this issue tomorrow. /Mikael From ville@REDACTED Wed Jan 14 16:01:16 2009 From: ville@REDACTED (Ville Koivula) Date: Wed, 14 Jan 2009 17:01:16 +0200 Subject: [erlang-bugs] Bug in filename:dirname? Message-ID: Hi, Why is filename:dirname working differently than UNIX equivalent? koivula@REDACTED:~ % dirname "/foo" / koivula@REDACTED:~ % dirname "/foo/" / vs. 1> filename:dirname("/foo"). "/" 2> filename:dirname("/foo/"). "/foo" Best regards, Ville Koivula From ulf@REDACTED Fri Jan 16 11:18:49 2009 From: ulf@REDACTED (Ulf Wiger) Date: Fri, 16 Jan 2009 11:18:49 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496B3B9E.3070407@ericsson.com> References: <496B3B9E.3070407@ericsson.com> Message-ID: <8209f740901160218v5197dc94nd1510d176a545151@mail.gmail.com> Hi Rickard, Which versions of OTP seem to have this bug? BR, Ulf W 2009/1/12 Rickard Green : > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., if > it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 BIF. Do > you have any code that may have garbage collected the disk_log server > via the garbage_collect/1 BIF? The garbage collect may also have been > done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi all, we just took another look at the cluster and discovered another >> stuck gen_server. This time we sent it a bogus message of '$ignore_me' - >> the process then woke up, processed the first message in its queue (an >> ejabberd internal message) and exited (as expected from the ejabberd >> code). >> >> It appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. >> >> Most of the time this doesn't cause major problems as the affected >> process will receive another message in the course of normal events and >> will now run normally. However, sometimes this strikes the wrong process >> in just the wrong way (the disk_log_server case) and we get visible >> error behaviour. >> >> This problem has been discovered on two different machines in the same >> ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. 
>> >> We're going to try replicating this with a tsung test of the same >> emulator package (http://packages.debian.org/lenny/erlang-base-hipe >> 1:12.b.3-dfsg-4) and then see if the same problem exists with a source >> compile of R12B-5. >> >> Thanks, >> --Geoff Cant >> >> The debugging session transcript follows. >> >> Running >> [ Pid >> || Pid <- erlang:processes(), >> element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, >> element(2, erlang:process_info(Pid, status)) =:= waiting, >> length(element(2, erlang:process_info(Pid, message_queue))) > 0]. >> >> gave us the process <0.19313.279>: >> >> (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). >> [{current_function,{gen_server,loop,6}}, >> {initial_call,{proc_lib,init_p,5}}, >> {status,waiting}, >> {message_queue_len,1}, >> {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, >> {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, >> {dictionary,[{'$ancestors',[ejabberd_receiver_sup, >> ejabberd_sup,<0.37.0>]}, >> {'$initial_call',{gen,init_it, >> [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, >> [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], >> []]}}]}, >> {trap_exit,false}, >> {error_handler,error_handler}, >> {priority,normal}, >> {group_leader,<0.36.0>}, >> {total_heap_size,987}, >> {heap_size,987}, >> {stack_size,12}, >> {reductions,922822}, >> {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, >> {suspending,[]}] >> >> This process stayed at {reductions,922822} for over a minute. >> >> It was sitting on a backtrace of: >> Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) >> CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) >> arity = 0 >> >> 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) >> y(0) [] >> y(1) infinity >> y(2) ejabberd_receiver >> y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} >> y(4) <0.19313.279> >> y(5) <0.235.0> >> >> 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () >> y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) >> y(1) gen >> y(2) init_it >> y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] >> >> As this process ignores messages it doesn't understand, we sent it a >> bogus message: >> >> pid(0,19313,279) ! '$ignore_me'. >> >> The process then logged: >> =ERROR REPORT==== 2009-01-08 17:33:37 === >> E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event >> >> before exiting. This is the expected behaviour on receiving a message >> like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue >> while it was stuck before we sent the '$ignore_me' message. >> >> So, it appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. 
>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://www.erlang.org/mailman/listinfo/erlang-bugs >> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From rickard.s.green@REDACTED Fri Jan 16 12:11:51 2009 From: rickard.s.green@REDACTED (Rickard Green S) Date: Fri, 16 Jan 2009 12:11:51 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server References: <496B3B9E.3070407@ericsson.com> <8209f740901160218v5197dc94nd1510d176a545151@mail.gmail.com> Message-ID: <1E5CB28D9F205A4CA167F2173F2C693401260CAA@esealmw115.eemea.ericsson.se> Unfortunately all versions of the smp emulator. R11 as well as R12. Regards, Rickard Rickard Green, Erlang/OTP, Ericsson AB. ________________________________ Fr?n: ulf.wiger@REDACTED genom Ulf Wiger Skickat: fr 2009-01-16 11:18 Till: Rickard Green S Kopia: Geoff Cant; erlang-bugs@REDACTED ?mne: Re: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server Hi Rickard, Which versions of OTP seem to have this bug? BR, Ulf W 2009/1/12 Rickard Green : > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., if > it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 BIF. Do > you have any code that may have garbage collected the disk_log server > via the garbage_collect/1 BIF? The garbage collect may also have been > done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi all, we just took another look at the cluster and discovered another >> stuck gen_server. This time we sent it a bogus message of '$ignore_me' - >> the process then woke up, processed the first message in its queue (an >> ejabberd internal message) and exited (as expected from the ejabberd >> code). >> >> It appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. >> >> Most of the time this doesn't cause major problems as the affected >> process will receive another message in the course of normal events and >> will now run normally. However, sometimes this strikes the wrong process >> in just the wrong way (the disk_log_server case) and we get visible >> error behaviour. >> >> This problem has been discovered on two different machines in the same >> ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. >> >> We're going to try replicating this with a tsung test of the same >> emulator package (http://packages.debian.org/lenny/erlang-base-hipe >> 1:12.b.3-dfsg-4) and then see if the same problem exists with a source >> compile of R12B-5. >> >> Thanks, >> --Geoff Cant >> >> The debugging session transcript follows. 
>> [debugging session transcript snipped; it is identical to the transcript quoted in full in the previous message]
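For reference on the explanation quoted above, erlang:garbage_collect/1 is the BIF through which one process can force a garbage collection of another. A minimal sketch of such a call, assuming the disk_log server is registered under its standard name disk_log_server (the wrapper function name is illustrative only):

%% Sketch only: the kind of call Rickard refers to, in which some other
%% process forces a garbage collection of the disk_log server. Locating
%% whatever code performs a call like this is what the question above asks.
force_gc_of_disk_log_server() ->
    case whereis(disk_log_server) of
        undefined ->
            undefined;
        Pid when is_pid(Pid) ->
            erlang:garbage_collect(Pid)   % forces a collection of Pid
    end.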
>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://www.erlang.org/mailman/listinfo/erlang-bugs >> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From mikpe@REDACTED Fri Jan 16 18:43:45 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Fri, 16 Jan 2009 18:43:45 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18799.4441.149848.485106@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> <18799.4441.149848.485106@harpo.it.uu.se> Message-ID: <18800.51025.11729.439954@harpo.it.uu.se> Mikael Pettersson writes: > Sergey S writes: > > Hello. > > > > I reproduced this bug on two separate computers running the same software. > > > > > Please give us some information about your system: > > > 1. Which CPU type? Is it 32- or 64-bit? > > 32-bit (i686) > > > > > 2. Which C compiler and version? > > GCC 4.3.2 > > > > > 3. Which OS / distribution / version? > > Up-to-date Archinux i686. > > Ok. I'll take a look at this issue tomorrow. I've been able to reproduce the bug, and it's memory corruption caused by an invalid optimisation performed by the compiler. I've notified the rest of the HiPE team about the issue and hopefully someone will know how to fix it (unfortunately it's in a part of the compiler I'm not familiar with). The combination of having a 'receive after infinity' after a heap allocation (the fun expression) is what's triggering the bug, so if you can put them in separate functions or move this one function to a non-natively compiled module you should be able to work around the bug for the time being. From ad.sergey@REDACTED Sat Jan 17 00:23:39 2009 From: ad.sergey@REDACTED (Sergey S) Date: Fri, 16 Jan 2009 15:23:39 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18800.51025.11729.439954@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> <18799.4441.149848.485106@harpo.it.uu.se> <18800.51025.11729.439954@harpo.it.uu.se> Message-ID: Hello. > Mikael Pettersson writes: > > I've notified the rest of the HiPE team about the issue and > hopefully someone will know how to fix it (unfortunately it's > in a part of the compiler I'm not familiar with). Thanks for that! I believe the people who are writing HiPE will fix that! -- Sergey From pguyot@REDACTED Mon Jan 19 23:11:41 2009 From: pguyot@REDACTED (Paul Guyot) Date: Mon, 19 Jan 2009 23:11:41 +0100 Subject: [erlang-bugs] Allow C nodes to be visible Message-ID: Hello, C nodes have limited functionalities. Some capabilities can be implemented using the current ei_connect interface, but these capabilities are not advertised when the node is connected to other nodes. In particular, C nodes are always hidden, and this prevents the inclusion of a global name service or of process groups, among other things. The attached patch is a minimal change to allow visible C nodes. The ei_connect and ei_accept are provided with a new variant that takes a flags parameter that determines which capabilities are to be exposed upon connection to a particular node. 
The new functions are: int ei_connect_tmo_flags(ei_cnode* ec, char *nodename, unsigned ms, unsigned flags); int ei_xconnect_tmo_flags(ei_cnode* ec, Erl_IpAddr adr, char *alivename, unsigned ms, unsigned flags); int ei_accept_tmo_flags(ei_cnode* ec, int lfd, ErlConnect *conp, unsigned ms, unsigned flags); All other functions have exactly the same behavior. In particular, the default flags are passed as they used to be whenever the original functions are used. To make a C node visible, the caller just needs to pass the default set of flags or-ed with DFLAG_PUBLISHED. I realize the proposed patch is minimalistic and the interface is very rough. There are two reasons. First, providing flags to ei_connect* and ei_accept* functions is experimental. It has many consequences, and a simple "visible" boolean would probably make the API user believe that making a C node visible is as simple as passing true, which of course it isn't. Second, a full implementation of the required protocols, including the global protocol, in C, would imply a large development we do not plan to undertake (our implementation is not in C) and a high maintenance cost. As a result, it seemed that a minimalistic, backward compatible and coherent change, is the easier to include upstream. Yet, we believe that such a patch would be useful for parallel efforts that consists in bridging erlang with other languages (e.g. .NET). Regards, Paul -------------- next part -------------- A non-text attachment was scrubbed... Name: patch-connect-flags.diff Type: application/octet-stream Size: 5671 bytes Desc: not available URL: -------------- next part -------------- From ad.sergey@REDACTED Tue Jan 20 00:41:43 2009 From: ad.sergey@REDACTED (Sergey S) Date: Mon, 19 Jan 2009 15:41:43 -0800 Subject: [erlang-bugs] A small omission in OTP Design Principles example 6.2.1 Message-ID: Hello. I don't think it will be the most important comment, but... I think that example illustrating the approach to create OTP compatible processes by using proc_lib and sys modules should contain system_code_change/4. "erl -man sys" says "The Module must export system_continue/3, system_terminate/4, and system_code_change/4 (see below)." -- Sergey From mikpe@REDACTED Wed Jan 21 21:33:39 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Wed, 21 Jan 2009 21:33:39 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: Message-ID: <18807.34467.894973.40219@harpo.it.uu.se> Sergey S writes: > Hello. > > While I was playing with +native option, I run into a bug in HIPE > which leads to segmentation fault. > > To reproduce the bug just compile the code below using HIPE and run > crash:start/0. Your will see the following: > > Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] > [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.6.5 (abort with ^G) > 1> crash:start(). > # This message will be printed only once when compiled with +native > Segmentation fault > > Here is the code (don't look for intention of this example, it has not > got that): > > %--------------------------------------------------- > -module(crash). > -export([start/0]). > > start() -> > spawn(fun() -> init() end). > > init() -> > repeat(10, fun() -> void end), > receive after infinity -> ok end. > > repeat(0, _) -> > ok; > repeat(N, Fun) -> > io:format("# This message will be printed only once when compiled > with +native~n"), > Fun(), > repeat(N - 1, Fun). 
% <------ It never will be called if you use HIPE > %--------------------------------------------------- > > The same code compiled without +native flag works well to me. I'm > using Erlang R12B5. Thanks for reporting this bug. Here's the patch fixing this bug for R12B-5. There was an omission in the liveness information for the native compiler's RTL intermediate representation, which in this very specific case caused it to lose the fact that the heap pointer register was live into the recursive function calls, which in turn caused the 'fun() -> void end' function closure object to be corrupted. /Mikael Pettersson The HiPE group --- otp_src_R12B-5/lib/hipe/amd64/hipe_amd64_registers.erl.~1~ 2007-11-26 19:59:44.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/amd64/hipe_amd64_registers.erl 2009-01-21 14:54:23.000000000 +0100 @@ -268,8 +268,7 @@ tailcall_clobbered() -> % tailcall crap | fp_call_clobbered()]. live_at_return() -> - [{?RAX,tagged} - ,{?RSP,untagged} + [{?RSP,untagged} ,{?PROC_POINTER,untagged} ,{?FCALLS,untagged} ,{?HEAP_LIMIT,untagged} --- otp_src_R12B-5/lib/hipe/rtl/hipe_rtl.erl.~1~ 2008-11-04 11:51:39.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/rtl/hipe_rtl.erl 2009-01-21 14:54:36.000000000 +0100 @@ -882,15 +882,17 @@ args(I) -> #alub{} -> [alub_src1(I), alub_src2(I)]; #branch{} -> [branch_src1(I), branch_src2(I)]; #call{} -> + Args = call_arglist(I) ++ hipe_rtl_arch:call_used(), case call_is_known(I) of - false -> [call_fun(I)|call_arglist(I)]; - true -> call_arglist(I) + false -> [call_fun(I) | Args]; + true -> Args end; #comment{} -> []; #enter{} -> + Args = enter_arglist(I) ++ hipe_rtl_arch:tailcall_used(), case enter_is_known(I) of - false -> hipe_rtl_arch:add_ra_reg([enter_fun(I)|enter_arglist(I)]); - true -> hipe_rtl_arch:add_ra_reg(enter_arglist(I)) + false -> [enter_fun(I) | Args]; + true -> Args end; #fconv{} -> [fconv_src(I)]; #fixnumop{} -> [fixnumop_src(I)]; @@ -910,7 +912,7 @@ args(I) -> #move{} -> [move_src(I)]; #multimove{} -> multimove_srclist(I); #phi{} -> phi_args(I); - #return{} -> hipe_rtl_arch:add_ra_reg(return_varlist(I)); + #return{} -> return_varlist(I) ++ hipe_rtl_arch:return_used(); #store{} -> [store_base(I), store_offset(I), store_src(I)]; #switch{} -> [switch_src(I)] end. @@ -924,7 +926,7 @@ defines(Instr) -> #alu{} -> [alu_dst(Instr)]; #alub{} -> [alub_dst(Instr)]; #branch{} -> []; - #call{} -> call_dstlist(Instr); + #call{} -> call_dstlist(Instr) ++ hipe_rtl_arch:call_defined(); #comment{} -> []; #enter{} -> []; #fconv{} -> [fconv_dst(Instr)]; @@ -990,7 +992,7 @@ subst_uses(Subst, I) -> end; #comment{} -> I; - #enter{} -> %% XXX: Check why ra_reg is added in uses() but not updated here + #enter{} -> case enter_is_known(I) of false -> I0 = enter_fun_update(I, subst1(Subst, enter_fun(I))), --- otp_src_R12B-5/lib/hipe/rtl/hipe_rtl_arch.erl.~1~ 2008-06-10 14:47:41.000000000 +0200 +++ otp_src_R12B-5/lib/hipe/rtl/hipe_rtl_arch.erl 2009-01-21 14:56:26.000000000 +0100 @@ -22,9 +22,12 @@ heap_pointer/0, heap_limit/0, fcalls/0, - add_ra_reg/1, reg_name/1, is_precoloured/1, + call_defined/0, + call_used/0, + tailcall_used/0, + return_used/0, live_at_return/0, endianess/0, load_big_2/4, @@ -164,22 +167,6 @@ fcalls_from_pcb() -> Reg = hipe_rtl:mk_new_reg(), {pcb_load(Reg, ?P_FCALLS), Reg, pcb_store(?P_FCALLS, Reg)}. --spec(add_ra_reg/1 :: ([X]) -> [X]). 
- -add_ra_reg(Rest) -> - case get(hipe_target_arch) of - ultrasparc -> - [hipe_rtl:mk_reg(hipe_sparc_registers:return_address()) | Rest]; - powerpc -> - Rest; % do not include LR: it's not a normal register - arm -> - [hipe_rtl:mk_reg(hipe_arm_registers:lr()) | Rest]; - x86 -> - Rest; - amd64 -> - Rest - end. - reg_name(Reg) -> case get(hipe_target_arch) of ultrasparc -> @@ -225,6 +212,18 @@ is_precolored_regnum(RegNum) -> hipe_amd64_registers:is_precoloured(RegNum) end. +call_defined() -> + call_used(). + +call_used() -> + live_at_return(). + +tailcall_used() -> + call_used(). + +return_used() -> + tailcall_used(). + live_at_return() -> case get(hipe_target_arch) of ultrasparc -> --- otp_src_R12B-5/lib/hipe/x86/hipe_x86_registers.erl.~1~ 2007-11-26 19:58:49.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/x86/hipe_x86_registers.erl 2009-01-21 14:53:33.000000000 +0100 @@ -224,14 +224,8 @@ all_x87_pseudos() -> {4,double}, {5,double}, {6,double}]. live_at_return() -> - [{?EAX,tagged} - %% XXX: should the following (fixed) regs be included or not? - ,{?ESP,untagged} + [{?ESP,untagged} ,{?PROC_POINTER,untagged} - %% Lets try not! - %% If these are included they will interfere with other - %% temps during regalloc, but regs FCALLS and HEAP_LIMIT - %% don't even exist at regalloc. ,{?FCALLS,untagged} ,{?HEAP_LIMIT,untagged} | ?LIST_HP_LIVE_AT_RETURN From ad.sergey@REDACTED Wed Jan 21 23:05:18 2009 From: ad.sergey@REDACTED (Sergey S) Date: Wed, 21 Jan 2009 14:05:18 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18807.34467.894973.40219@harpo.it.uu.se> References: <18807.34467.894973.40219@harpo.it.uu.se> Message-ID: Hello. > Here's the patch fixing this bug for R12B-5. Thanks for the patch! -- Sergey From jason@REDACTED Sat Jan 24 18:59:53 2009 From: jason@REDACTED (Jason Davies) Date: Sat, 24 Jan 2009 17:59:53 +0000 Subject: [erlang-bugs] Bug in http:request(): No port set in automatically-added "Host:" header Message-ID: <88408FFC-FD95-4189-8922-02320F087E4D@jasondavies.com> There is a bug in inets http:request(): it automatically adds a "Host:" header to comply with HTTP/1.1 but it doesn't add the port number. This causes 301/302 redirects to fail on servers where the redirect URL is generated using the "Host:" request header. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23 Thanks, -- Jason Davies www.jasondavies.com From lfredlund@REDACTED Mon Jan 26 13:43:17 2009 From: lfredlund@REDACTED (=?ISO-8859-1?Q?Lars-=C5ke_Fredlund?=) Date: Mon, 26 Jan 2009 13:43:17 +0100 Subject: [erlang-bugs] core_lint:module/1 problem Message-ID: <497DAFE5.9010003@fi.upm.es> Version: otp R12B-5 (not patched) Problem: Applying core_lint:module/1 to a core erlang module generated by compile:file(FileSpec,[to_core,binary] (without problems) results in the error message: *** Core Erlang ERROR in module schedule: illegal guard expression in reschedule/1 Source code and core erlang code for function attached. (apparently the checks for correct guards are too strict for try... guards). /Lars-Ake Fredlund -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: schedule.core URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: schedule.erl Type: text/x-erlang Size: 1557 bytes Desc: not available URL: From rvirding@REDACTED Mon Jan 26 13:58:04 2009 From: rvirding@REDACTED (Robert Virding) Date: Mon, 26 Jan 2009 13:58:04 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <497DAFE5.9010003@fi.upm.es> References: <497DAFE5.9010003@fi.upm.es> Message-ID: <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> 2009/1/26 Lars-?ke Fredlund > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard expression in > reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for try... > guards). > > /Lars-Ake Fredlund > Without having looked at the actual code I can say that some of the core support modules, core_lint and core_parse for example, don't always follow the latest core development. This is because they are not actually used by the compiler, it *knows* it's generated code is correct. Another problem I had with LFE is that some of the core optimisation passes assumed that the core module was generated in the same way as the from the erlang compiler, in some cases they couldn't handle general core. This has now been fixed. Basically, both of these are due to core erlang not really being a language in its own right but a pass in the compiler. Whether it should be like that is another question. Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From lfredlund@REDACTED Mon Jan 26 14:38:02 2009 From: lfredlund@REDACTED (=?ISO-8859-1?Q?Lars-=C5ke_Fredlund?=) Date: Mon, 26 Jan 2009 14:38:02 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> References: <497DAFE5.9010003@fi.upm.es> <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> Message-ID: <497DBCBA.5080106@fi.upm.es> Robert Virding wrote: > 2009/1/26 Lars-?ke Fredlund > > > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard > expression in reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for > try... guards). > > /Lars-Ake Fredlund > > > Without having looked at the actual code I can say that some of the > core support modules, core_lint and core_parse for example, don't > always follow the latest core development. This is because they are > not actually used by the compiler, it *knows* it's generated code is > correct. > > Another problem I had with LFE is that some of the core optimisation > passes assumed that the core module was generated in the same way as > the from the erlang compiler, in some cases they couldn't handle > general core. This has now been fixed. > Yes, I don't check core_lint normally, but in the interest of improving things for the future I submitted the bug report. Robert, any experience on how the Core Erlang code should be written so that the optimisation passes optimise well? (or does things work ok now, without any need to adapt the code structure to the optimisers?) 
/Lars-Ake Fredlund From bgustavsson@REDACTED Tue Jan 27 14:40:07 2009 From: bgustavsson@REDACTED (Bjorn Gustavsson) Date: Tue, 27 Jan 2009 14:40:07 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <497DAFE5.9010003@fi.upm.es> References: <497DAFE5.9010003@fi.upm.es> Message-ID: <6672d0160901270540r290b569by843ede5f14ccd53@mail.gmail.com> 2009/1/26 Lars-?ke Fredlund : > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard expression in > reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for try... guards). Your source code is not complete, so I can't run it through the compiler. However, for the R13 release I have extended the test suites to also test the core_lint pass and I have fixed all bugs exposed in core_lint as a result of that. /Bj?rn -- Bj?rn Gustavsson, Erlang/OTP, Ericsson AB From steven.charles.davis@REDACTED Tue Jan 27 15:04:32 2009 From: steven.charles.davis@REDACTED (Steve Davis) Date: Tue, 27 Jan 2009 08:04:32 -0600 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 Message-ID: <497F1470.3000805@gmail.com> Hi, During my learning process I was trying to convert floats to binary, I inadvisedly tried the following: to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). I do understand that this BIF has been removed, and it did throw an expected "erlang:float_to_binary/2 not defined" error BUT within 20 seconds of running that code my PC "bluescreened" (for the first time in over 3 years). I am not at all certain it's reproducible but I don't want to risk trying it again on my machine... but I do strongly suspect that the underlying c code for this bif remains, and this caused the observed result. System details: Erlang R12B-5/erts 5.6.5 Windows XP SP3 Acer Ferrari 3400 (a laptop) Regards, Steve From mikpe@REDACTED Tue Jan 27 17:29:54 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Tue, 27 Jan 2009 17:29:54 +0100 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <497F1470.3000805@gmail.com> References: <497F1470.3000805@gmail.com> Message-ID: <18815.13954.191620.129693@harpo.it.uu.se> Steve Davis writes: > Hi, > > During my learning process I was trying to convert floats to binary, I > inadvisedly tried the following: > > to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). > > I do understand that this BIF has been removed, and it did throw an > expected "erlang:float_to_binary/2 not defined" error BUT within 20 > seconds of running that code my PC "bluescreened" (for the first time in > over 3 years). > > I am not at all certain it's reproducible but I don't want to risk > trying it again on my machine... but I do strongly suspect that the > underlying c code for this bif remains, and this caused the observed result. > > System details: > Erlang R12B-5/erts 5.6.5 > Windows XP SP3 > Acer Ferrari 3400 (a laptop) I am unable to reproduce anything but the benign "not defined" error with R12B-5 on Solaris 9, MacOSX 10.3, and Windows XP 64 Professional. There is no float_to_binary of any kind in R12B-5, the only reference to it is a documentation note that it has been removed. 
A Windows bluescreen can happen due to any number of reasons, mostly hardware or kernel/driver bugs, but a bug in the Erlang VM should not be able to trigger it (that would in itself be a kernel bug). IOW, I think this is pure coincidence. From steven.charles.davis@REDACTED Tue Jan 27 19:04:26 2009 From: steven.charles.davis@REDACTED (Steve Davis) Date: Tue, 27 Jan 2009 12:04:26 -0600 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <18815.13954.191620.129693@harpo.it.uu.se> References: <497F1470.3000805@gmail.com> <18815.13954.191620.129693@harpo.it.uu.se> Message-ID: <497F4CAA.7010503@gmail.com> Hi Mikael, It does rather sound like a coincidence, then - perhaps something else is going on with my machine. I'm sorry to have wasted your time unnecessarily. BR, /s Mikael Pettersson wrote: > Steve Davis writes: > > Hi, > > > > During my learning process I was trying to convert floats to binary, I > > inadvisedly tried the following: > > > > to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). > > > > I do understand that this BIF has been removed, and it did throw an > > expected "erlang:float_to_binary/2 not defined" error BUT within 20 > > seconds of running that code my PC "bluescreened" (for the first time in > > over 3 years). > > > > I am not at all certain it's reproducible but I don't want to risk > > trying it again on my machine... but I do strongly suspect that the > > underlying c code for this bif remains, and this caused the observed result. > > > > System details: > > Erlang R12B-5/erts 5.6.5 > > Windows XP SP3 > > Acer Ferrari 3400 (a laptop) > > I am unable to reproduce anything but the benign "not defined" error > with R12B-5 on Solaris 9, MacOSX 10.3, and Windows XP 64 Professional. > > There is no float_to_binary of any kind in R12B-5, the only reference > to it is a documentation note that it has been removed. > > A Windows bluescreen can happen due to any number of reasons, mostly > hardware or kernel/driver bugs, but a bug in the Erlang VM should not > be able to trigger it (that would in itself be a kernel bug). > > IOW, I think this is pure coincidence. > From mikpe@REDACTED Tue Jan 27 20:40:52 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Tue, 27 Jan 2009 20:40:52 +0100 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <497F4CAA.7010503@gmail.com> References: <497F1470.3000805@gmail.com> <18815.13954.191620.129693@harpo.it.uu.se> <497F4CAA.7010503@gmail.com> Message-ID: <18815.25412.565359.953125@harpo.it.uu.se> Steve Davis writes: > It does rather sound like a coincidence, then - perhaps something else > is going on with my machine. I'm sorry to have wasted your time > unnecessarily. Don't worry about it. It's better to have a false bug report than to miss a report on an actual bug. /Mikael From adam@REDACTED Thu Jan 29 10:23:39 2009 From: adam@REDACTED (Adam Lindberg) Date: Thu, 29 Jan 2009 09:23:39 +0000 (GMT) Subject: [erlang-bugs] [link] Possible bug in io:fread Message-ID: <31544426.1501233221019877.JavaMail.root@zimbra> Not my finding, I'm only re-posting it here. See: http://stackoverflow.com/questions/473327/unexpected-behavior-of-iofread-in-erlang Cheers, Adam From jwecker@REDACTED Fri Jan 30 19:21:57 2009 From: jwecker@REDACTED (Joseph Wecker) Date: Fri, 30 Jan 2009 11:21:57 -0700 (MST) Subject: [erlang-bugs] inet_gethost (small) memory leak Message-ID: I was running valgrind on my erlang program to find some memory leaks in a port program. 
As expected, the erlang vm itself was very tight (compared to some python that ended up getting profiled, my C program, and even sed- throwing up memory leak errors all over the place). For what it's worth though, there was one very small memory leak that Erlang itself was generating. This may be a known problem already, or it may be too small to care- but it's the only one I saw: ==31273== 21 bytes in 1 blocks are definitely lost in loss record 1 of 2 ==31273== at 0x4025D2E: malloc (vg_replace_malloc.c:207) ==31273== by 0x4025EAF: realloc (vg_replace_malloc.c:429) ==31273== by 0x80497EE: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x804AF06: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x804BB34: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x4084684: (below main) (in /lib/tls/i686/cmov/libc-2.8.90.so) ==31275== ==31275== 21 bytes in 1 blocks are definitely lost in loss record 1 of 2 ==31275== at 0x4025D2E: malloc (vg_replace_malloc.c:207) ==31275== by 0x4025EAF: realloc (vg_replace_malloc.c:429) ==31275== by 0x80497EE: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x804AF06: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x804BB34: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x4084684: (below main) (in /lib/tls/i686/cmov/libc-2.8.90.so) etc. Wouldn't be surprised if it was coming from a system call- there may be nothing you can do about it, but I thought I'd bring it up as I couldn't find reference to this anywhere. -Joseph