From egil@REDACTED Thu Aug 1 19:59:19 2013
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Thu, 1 Aug 2013 19:59:19 +0200
Subject: [erlang-bugs] R16B01's monitor delivery is broken?
In-Reply-To:
References: <15464.1374612273@snookles.snookles.com>
Message-ID: <51FAA1F7.8000704@erlang.org>

We have confirmed that this problem indeed exists, and we think we understand what is happening.

The problem has a low probability of occurring, though it is evidently reproducible, and it is pretty serious when it occurs.

I won't go into too much detail, but the uniqueness of the process identifier can be compromised, i.e. an identifier will not necessarily be unique. In essence, a process might get the identifier of an already terminated process (or of an already living one, though I haven't confirmed that); the mapping is then overwritten, and when this identifier is inspected the process will look dead. Signals or messages are then either not sent at all, since the process is "dead", or sent to an unsuspecting (wrong) process. The mapping of ids to process pointers has become inconsistent. It's a bit more complicated than that, but in a nutshell that's what's happening.

What is needed for this to occur? A wrap-around of the entire "free-list ring" of identifiers (whose size is the maximum number of processes) while one thread is in the middle of doing an atomic read, some shifting and masking, and then a write to create an identifier. *Highly unlikely*, but definitely a race: while one thread is doing a read, shift/mask, and write to memory, the other threads have to create and terminate 262144 processes (or whatever the limit is set to; 262144 is the default).

If the thread is scheduled out by the OS, or a hyperthread switch occurs because of a memory stall (we're dealing with memory barriers here, after all, so it might be a thing) between the read and the write, the likelihood of an incident increases. Lowering the maximum process limit in the system also increases the likelihood.
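The race described above can be made concrete with a deliberately simplified, hypothetical model. It is not the erl_ptab.c algorithm: the module name, the single-slot free list, the arbitrary dead id 42, and the "+ process limit" step (standing in for the real shift/mask) are all assumptions of this sketch. A terminating thread recycles the slot with a non-atomic read, compute, write sequence, and a spawning thread takes whatever id the slot currently holds; replaying the two possible interleavings shows how a read that lands between the terminator's read and write hands out the dead process's id again. The sketch uses maps, so it needs OTP 17 or later.

```erlang
-module(ptab_race_model).
-export([run/1]).

%% Hypothetical, heavily simplified model of the id-recycling race.
%% NOT the real erl_ptab.c algorithm: one free-list slot, one terminating
%% thread doing read -> compute -> write, one spawning thread reading the slot.

-define(MAX_PROCS, 262144).  % default process limit in R16B01
-define(DEAD_ID, 42).        % id of the process being terminated (arbitrary)

%% Replay a schedule (a list of steps), starting from a slot that still
%% holds the terminated process's id. Returns the final state map.
run(Schedule) ->
    State0 = #{slot => ?DEAD_ID, t_read => undefined, spawned => []},
    lists:foldl(fun step/2, State0, Schedule).

%% Terminating thread, step 1: read the slot.
step(t_read, S = #{slot := V}) ->
    S#{t_read := V};
%% Terminating thread, step 2: write the next-generation id.
%% "+ MAX_PROCS" stands in for the real shift/mask computation.
step(t_write, S = #{t_read := V}) ->
    S#{slot := V + ?MAX_PROCS};
%% Spawning thread: hand out whatever id is currently in the slot.
step(spawn, S = #{slot := V, spawned := Ids}) ->
    S#{spawned := [V | Ids]}.
```

Replaying `[t_read, t_write, spawn]` gives the new process the id 262186 (a fresh generation of the slot), while `[t_read, spawn, t_write]` gives it 42, the id of the process that just terminated: the kind of duplicate identifier described above.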
We think we have a solution for this, and initial tests show no evidence of the uniqueness problem after the fix. I think we will have a fix out in maint next week.

Using R16B01 together with the "+P legacy" flag is a workaround for this issue. The legacy option uses the old allocation scheme and does not suffer from this problem.

Thank you, Scott, and the rest of you at Basho for reporting this.

Regards,
Björn-Egil

On 2013-07-23 23:19, Björn-Egil Dahlberg wrote:
> True, that seems suspicious.
>
> The vacation for Rickard is going great, I think. Last I heard from him, he was diving around Öland (literally "island-land") in south-eastern Sweden. It will be a few weeks before he's back.
>
> In the meantime it is fairly lonely here at OTP; today there were two of us at the office, and there is a lot of stuff to do. I will have a quick look at it and verify it, but will probably let Rickard deal with it when he comes back.
>
> Thanks for a great summary and drill-down of the problem!
>
> Regards,
> Björn-Egil
>
>
> 2013/7/23 Scott Lystig Fritchie
>
> Hi, everyone. Hope your summer vacations are going well. I have some bad news for Rickard, at least.
>
>     SHA: e794251f8e54d6697e1bcc360471fd76b20c7748
>     Author: Rickard Green
>     Date: Thu May 30 2013 07:56:31 GMT-0500 (CDT)
>     Subject: Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint
>     Parent: 22685099ace9802016bf6203c525702084717d72
>     Parent: 5c039a1fb4979314912dc3af6626d8d7a1c73993
>     Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint
>
>     * rickard/ptab-id-alloc/OTP-11077:
>       Introduce a better id allocation algorithm for PTabs
>
> This commit appears to break monitor delivery? And it may or may not be causing processes to die for reasons that we cannot see or understand.
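The symptom reported here is a monitor whose 'DOWN' message never arrives, so the waiting process hangs instead of failing. A hypothetical diagnostic helper (not part of test6.erl or OTP; the module name and return values are made up for illustration) can turn such a hang into an observable result by bounding the wait with a timeout and then checking whether the monitored process still exists. `erlang:monitor/2`, `erlang:demonitor/2` with the `flush` option, and `erlang:is_process_alive/1` are standard BIFs; `is_process_alive/1` only accepts local pids.

```erlang
-module(monitor_watchdog).
-export([await_down/2]).

%% Hypothetical diagnostic helper: monitor Pid (a local pid) and wait at
%% most TimeoutMs for the matching 'DOWN' message instead of hanging forever.
await_down(Pid, TimeoutMs) ->
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Normal case; Reason is noproc if Pid was already dead.
            {down, Reason}
    after TimeoutMs ->
        Alive = erlang:is_process_alive(Pid),
        %% Drop the monitor and any 'DOWN' that slipped in after the timeout.
        erlang:demonitor(Ref, [flush]),
        %% {no_down_message, false} means "gone, but no 'DOWN' seen". It is
        %% only suspicious with a generous timeout, since the process may
        %% also have exited just after the timeout fired.
        {no_down_message, Alive}
    end.
```

A process that exits yields `{down, Reason}`, and one that is still running yields `{no_down_message, true}` once the timeout expires; repeated `{no_down_message, false}` results with long timeouts would match the lost-monitor behaviour described in this thread.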
>
> Run with R15B03-1, the example code in test6.erl is merely slow:
>
> https://gist.github.com/jtuple/aa4830a0ff0a94f69484/raw/02adc518e225f263a7e25d339ec7200ef2dda491/test6.erl
>
> On my 4-core/8-HT-core MacBook Pro, R15B03-1 cannot go above 200% CPU utilization, and the execution time is correspondingly slooow. But it appears to work correctly.
>
>     erl -eval '[begin io:format("Iteration ~p at ~p\n", [X,time()]), test6:go() end || X <- lists:seq(1, 240)].'
>
> When run with R16B, it's *much* faster. CPU utilization above 750% confirms that it's going faster. And it appears to work correctly.
>
> However, when run with R16B01, we see non-deterministic hangs on both OS X and various Linux platforms. CPU consumption by the "beam.smp" process drops to 0, and the next cycle of the list comprehension never starts.
>
> Thanks to the magic of Git, it's pretty clear that the commit above is broken. The commit before it appears to work well (i.e., does not hang).
>
>     SHA: 22685099ace9802016bf6203c525702084717d72
>     Author: Anders Svensson
>     Date: Wed May 29 2013 11:46:10 GMT-0500 (CDT)
>     Subject: Merge branch 'anders/diameter/watchdog_function_clause/OTP-11115' into maint
>
> Using R16B01 together with the "+P legacy" flag does not hang. But this problem has given us at Basho enough ... caution ... that we will be much more cautious about moving our app packaging from R15B* to R16B*.
>
> Several seconds after CPU consumption drops to 0%, I trigger the creation of an "erl_crash.dump" file using erlang:halt("bummer"). If I look at that file, the process "Spawned as: test6:do_work2/0" says that there are active unidirectional links (i.e., monitors), but there is one process on that list that does not have a corresponding "=proc:" entry in the dump ... which strongly suggests to me that the process is dead.
> Using DTrace, I've been able to establish that the dead process was indeed alive at one time and had been scheduled & descheduled at least once. So there are really two mysteries:
>
> 1. Why is one of the test6:indirect_proxy/1 processes dying unexpectedly? (The monitor doesn't fire, SASL isn't logging any errors, etc.)
>
> 2. Why isn't a monitor message being delivered?
>
> Many thanks to Joe Blomstedt, Evan Vigil-McClanahan, Andrew Thompson, Steve Vinoski, and Sean Cribbs for their sleuthing work.
>
> -Scott
>
> --- snip --- snip --- snip --- snip --- snip ---
>
> R15B03 lock count analysis, FWIW:
>
>     lock          id   #tries  #collisions  collisions [%]  time [us]  duration [%]
>     -----        ---  -------  -----------  --------------  ---------  ------------
>     proc_tab       1  1280032      1266133         98.9142   60642804      557.0583
>     run_queue      8  3617608        12874          0.3559     261722        2.4042
>     sys_tracers    1  1280042         6445          0.5035      19365        0.1779
>     pix_lock     256  4480284         1213          0.0271       9777        0.0898
>     timeofday      1   709955         1187          0.1672       3216        0.0295
>     [......]
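The crash-dump check described above (a pid listed in some process's `Link list:` that has no `=proc:` section of its own) can be automated. The following is a hypothetical sketch, not an OTP tool (OTP's crashdump_viewer is the supported way to browse dumps); the module and function names are made up, and it assumes each `=proc:` header and each `Link list:` entry sits on a single line, as in an unwrapped erl_crash.dump, and that only three-part pids such as `<0.32497.154>` are of interest.

```erlang
-module(dump_check).
-export([missing_procs/1]).

%% Hypothetical helper: given the text of an erl_crash.dump, return the pids
%% that occur in some "Link list:" line but have no "=proc:<...>" section,
%% i.e. linked/monitored processes that no longer existed at dump time.
missing_procs(DumpText) ->
    Lines = string:tokens(DumpText, "\n"),
    %% Pids that have their own process section in the dump.
    Dumped = [Pid || "=proc:" ++ Pid <- Lines],
    %% Every pid mentioned in any link/monitor list.
    Linked = lists:append([pids_in(Rest) || "Link list:" ++ Rest <- Lines]),
    lists:usort(Linked) -- Dumped.

%% Extract three-part pids like "<0.48.0>". References such as
%% "#Ref<0.0.19.96773>" have four parts and are therefore not matched.
pids_in(Text) ->
    case re:run(Text, "<[0-9]+\\.[0-9]+\\.[0-9]+>",
                [global, {capture, all, list}]) of
        {match, Matches} -> [Pid || [Pid] <- Matches];
        nomatch -> []
    end.
```

Run against the dump quoted below, such a check would be expected to report the monitored pid that lacks a `=proc:` entry, which is the evidence Scott used to conclude that the process was dead.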
>
> --- snip --- snip --- snip --- snip --- snip ---
>
>     =proc:<0.29950.154>
>     State: Waiting
>     Spawned as: test6:do_work2/0
>     Spawned by: <0.48.0>
>     Started: Tue Jul 23 04:50:54 2013
>     Message queue length: 0
>     Number of heap fragments: 0
>     Heap fragment data: 0
>     Link list: [{from,<0.48.0>,#Ref<0.0.19.96773>}, {to,<0.32497.154>,#Ref<0.0.19.96797>}, {to,<0.1184.155>,#Ref<0.0.19.96796>}, {to,<0.31361.154>,#Ref<0.0.19.96799>}, {to,<0.32019.154>,#Ref<0.0.19.96801>}, {to,<0.32501.154>,#Ref<0.0.19.96800>}, {to,<0.1352.155>,#Ref<0.0.19.96803>}, {to,<0.32415.154>,#Ref<0.0.19.96805>}, {to,<0.504.155>,#Ref<0.0.19.96804>}, {to,<0.87.155>,#Ref<0.0.19.96802>}, {to,<0.776.155>,#Ref<0.0.19.96798>}]
>     Reductions: 45
>     Stack+heap: 233
>     OldHeap: 0
>     Heap unused: 155
>     OldHeap unused: 0
>     Memory: 3472
>     Program counter: 0x000000001e1504d0 (test6:do_work2/0 + 184)
>     CP: 0x0000000000000000 (invalid)
>     arity = 0
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From tony@REDACTED Thu Aug 1 20:06:12 2013
From: tony@REDACTED (Tony Rogvall)
Date: Thu, 1 Aug 2013 20:06:12 +0200
Subject: [erlang-bugs] A funny bug
Message-ID: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>

I was inspecting the new, long-awaited fix of 'receive' in erl_eval.

I could not help wondering what would happen if:

    5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000).

    Bus error: 10

I know you should not use prim_eval in this manner, but I tend to ignore recommendations. But this construct should probably not crash the VM.

/Tony

From egil@REDACTED Thu Aug 1 20:22:12 2013
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Thu, 1 Aug 2013 20:22:12 +0200
Subject: [erlang-bugs] R16B01's monitor delivery is broken?
In-Reply-To: <51FAA1F7.8000704@erlang.org>
References: <15464.1374612273@snookles.snookles.com> <51FAA1F7.8000704@erlang.org>
Message-ID: <51FAA754.6090600@erlang.org>

On 2013-08-01 19:59, Björn-Egil Dahlberg wrote:
> What is needed for this to occur? A wrap-around of the entire "free-list ring" of identifiers (whose size is the maximum number of processes) while one thread is in the middle of doing an atomic read, some shifting and masking, and then a write to create an identifier. *Highly unlikely*, but definitely a race: while one thread is doing a read, shift/mask, and write to memory, the other threads have to create and terminate 262144 processes (or whatever the limit is set to; 262144 is the default).

I think I tried to simplify this explanation too much.
The race occurs when a process is deleted and writes to the free list, and a new process, created 262144 "generations/spawns" after the deleted process, reads from the free list in between the terminating process's read, shift/mask, and write. Anyway, details... it's a race.

From n.oxyde@REDACTED Thu Aug 1 22:29:19 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Thu, 1 Aug 2013 22:29:19 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
Message-ID: <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>

Hello,

It's not that you should not use prim_eval in this particular manner; you should not use prim_eval at all.
This is probably not the only primitive that can make the VM segfault. That being said, when I implemented that patch the function didn't call a given closure but an hard-coded remote function in prim_eval; maybe we should put that back? Regards, -- Anthony Ramine Le 1 ao?t 2013 ? 20:06, Tony Rogvall a ?crit : > I was inspecting the new, and long awaited for, fix of 'receive' in erl_eval. > > I could not help myself to wonder what would happened if: > > 5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). > > Bus error: 10 > > I know you should not use prim_eval in this manner, but I tend to ignore recommendations. > But this construct should probably not crash the VM. > > /Tony > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From n.oxyde@REDACTED Fri Aug 2 12:54:59 2013 From: n.oxyde@REDACTED (Anthony Ramine) Date: Fri, 2 Aug 2013 12:54:59 +0200 Subject: [erlang-bugs] R16B01's monitor delivery is broken? In-Reply-To: <51FAA754.6090600@erlang.org> References: <15464.1374612273@snookles.snookles.com> <51FAA1F7.8000704@erlang.org> <51FAA754.6090600@erlang.org> Message-ID: <78E4CAE9-38E9-4050-97BF-395A63BD7406@gmail.com> Hello, Talking about explanations and whatnot, next time such an important algorithm is modified, could we have more than a one-liner commit message? Thank you. Here is a nice idea: when you review your own commits from the OTP team, consider a complete stranger wrote it and require from yourself the same kind of explanatory commit messages you would from such a third party entity. I hate useless commit messages. Regards, -- Anthony Ramine Le 1 ao?t 2013 ? 20:22, Bj?rn-Egil Dahlberg a ?crit : > On 2013-08-01 19:59, Bj?rn-Egil Dahlberg wrote: >> >> We have confirmed that this problem indeed exists and we think we understand what is happening. 
>> >> The problem has a low probability of occurring, though obviously reproducible, and pretty serious if it occurs. >> >> I won't go into too much details, but the uniqueness of the process identifier can be compromised, i.e. it will not be unique. In essence a process might get an identifier of an already terminated process (or an already living one though I haven't confirmed that), the mapping is then overwritten, and by inspecting this identifier it will look dead. Signals or messages will not be sent since it is "dead" or sent to an unsuspecting (wrong) process. The mappings of id's and process pointers has become inconsistent. It's a bit more complicated than that but in a nutshell that's what's happening. >> >> What is needed for this to occur? A wrapping of the entire "free-list-ring" of identifiers (size of max processes) while one thread is in progress of doing an atomic read, some shift and masking, and then a write for creating an identifier. *Highly unlikely* but definitely a race. I.e. while one thread is doing a read, shift/mask, and write to memory the other threads has to create and terminate 262144 processes (or whatever the limit is set to, but that is the default) > I think I tried to simplify this explanation too much. The race occurs when the process is deleted and writes to the free-list and a new process is created which is 262144 "generations/spawns" after the deleted process and reads from the free-list in between the terminating process read-shift/mask-write. Anyway details .. it's a race. > >> >> If the thread is scheduled out by the OS, or a hyperthread switch occurs because of a mem-stall (we're dealing with membarriers here after all so it might be a thing) between the read and write the likelihood of an incident increases. Also, by lowering max-process-limit in the system the likelihood increases. >> >> We think we have a solution for this and initial tests show no evidence of uniqueness problem after the fix. 
I think we will have a fix out in maint next week. >> >> Using R16B01 together with the "+P legacy" is a workaround for this issue. The legacy option uses the old way and does not suffer from this problem. >> >> Thank you Scott, and to the rest of you at Basho for reporting this. >> >> Regards, >> Bj?rn-Egil >> >> >> On 2013-07-23 23:19, Bj?rn-Egil Dahlberg wrote: >>> True, that seems suspicious. >>> >>> The vacation for Rickard is going great I think. Last I heard from him, he was diving round ?land (literally "island-land") in south-eastern sweden. It will be a few weeks before he's back. >>> >>> In the meanwhile it is fairly lonely here at OTP, today we were two persons at the office, and there is a lot of stuff to do. I will have a quick look at it and verify but will probably let Rickard deal with it when he comes back. >>> >>> Thanks for a great summary and drill down of the problem! >>> >>> Regards, >>> Bj?rn-Egil >>> >>> >>> 2013/7/23 Scott Lystig Fritchie >>> Hi, everyone. Hope your summer vacations are going well. I have some >>> bad news for Rickard, at least. >>> >>> SHA: e794251f8e54d6697e1bcc360471fd76b20c7748 >>> Author: Rickard Green >>> Date: Thu May 30 2013 07:56:31 GMT-0500 (CDT) >>> Subject: Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint >>> Parent: 22685099ace9802016bf6203c525702084717d72 >>> Parent: 5c039a1fb4979314912dc3af6626d8d7a1c73993 >>> Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint >>> >>> * rickard/ptab-id-alloc/OTP-11077: >>> Introduce a better id allocation algorithm for PTabs >>> >>> This commit appears to break monitor delivery? And it may or may not be >>> causing processes to die for reasons that we cannot see or understand. 
>>> >>> Run with R15B03-1, the example code in test6.erl is merely slow: >>> >>> https://gist.github.com/jtuple/aa4830a0ff0a94f69484/raw/02adc518e225f263a7e25d339ec7200ef2dda491/test6.erl >>> >>> On my 4 core/8 HT core MacBook Pro, R15B03-1 cannot go above 200% CPU >>> utilization, and the execution time is correspondingly slooow. But it >>> appears to work correctly. >>> >>> erl -eval '[begin io:format("Iteration ~p at ~p\n", [X,time()]), test6:go() end || X <- lists:seq(1, 240)].' >>> >>> When run with R16B, it's *much* faster. CPU utilization above 750% >>> confirms that it's going faster. And it appears to work correctly. >>> >>> However, when run with R16B01, we see non-deterministic hangs on both OS >>> X and various Linux platforms. CPU consumption by the "beam.smp" >>> process drops to 0, and the next cycle of the list comprehension never >>> starts. >>> >>> Thanks to the magic of Git, it's pretty clear that the commit above is >>> broken. The commit before it appears to work well (i.e., does not >>> hang). >>> >>> SHA: 22685099ace9802016bf6203c525702084717d72 >>> Author: Anders Svensson >>> Date: Wed May 29 2013 11:46:10 GMT-0500 (CDT) >>> Subject: Merge branch 'anders/diameter/watchdog_function_clause/OTP-11115' into maint >>> >>> Using R16B01 together with the "+P legacy" flag does not hang. But this >>> problem has given us at Basho enough ... caution ... that we will be >>> much more cautious about moving our app packaging from R15B* to R16B*. >>> >>> Several seconds after CPU consumption drops to 0%, then I trigger the >>> creation of a "erl_crash.dump" file using erlang:halt("bummer"). If I >>> look at that file, then the process "Spawned as: test6:do_work2/0" says >>> that there are active unidirectional links (i.e., monitors), but there >>> is one process on that list that does not have a corresponding >>> "=proc:" entry in the dump ... which strongly suggests >>> to me that the process is dead. 
Using DTrace, I've been able to >>> establish that the dead process is indeed alive at one time and has been >>> scheduled & descheduled at least once. So there are really two >>> mysteries: >>> >>> 1. Why is one of the test6:indirect_proxy/1 processes dying >>> unexpectedly? (The monitor doesn't fire, SASL isn't logging any errors, >>> etc.) >>> >>> 2. Why isn't a monitor message being delivered? >>> >>> Many thanks to Joe Blomstedt, Evan Vigil-McClanahan, Andrew Thompson, >>> Steve Vinoski, and Sean Cribbs for their sleuthing work. >>> >>> -Scott >>> >>> --- snip --- snip --- snip --- snip --- snip --- >>> >>> R15B03 lock count analysis, FWIW: >>> >>> lock id #tries #collisions collisions [%] time [us] duration [%] >>> ----- --- ------- ------------ --------------- ---------- ------------- >>> proc_tab 1 1280032 1266133 98.9142 60642804 557.0583 >>> run_queue 8 3617608 12874 0.3559 261722 2.4042 >>> sys_tracers 1 1280042 6445 0.5035 19365 0.1779 >>> pix_lock 256 4480284 1213 0.0271 9777 0.0898 >>> timeofday 1 709955 1187 0.1672 3216 0.0295 >>> [......] 
>>> >>> --- snip --- snip --- snip --- snip --- snip --- >>> >>> =proc:<0.29950.154> >>> State: Waiting >>> Spawned as: test6:do_work2/0 >>> Spawned by: <0.48.0> >>> Started: Tue Jul 23 04:50:54 2013 >>> Message queue length: 0 >>> Number of heap fragments: 0 >>> Heap fragment data: 0 >>> Link list: [{from,<0.48.0>,#Ref<0.0.19.96773>}, {to,<0.32497.154>,#Ref<0.0.19.96797>}, {to,<0.1184.155>,#Ref<0.0.19.96796>}, {to,<0.31361.154>,#Ref<0.0.19.96799>}, {to,<0.32019.154>,#Ref<0.0.19.96801>}, {to,<0.32501.154>,#Ref<0.0.19.96800>}, {to,<0.1352.155>,#Ref<0.0.19.96803>}, {to,<0.32415.154>,#Ref<0.0.19.96805>}, {to,<0.504.155>,#Ref<0.0.19.96804>}, {to,<0.87.155>,#Ref<0.0.19.96802>}, {to,<0.776.155>,#Ref<0.0.19.96798>}] >>> Reductions: 45 >>> Stack+heap: 233 >>> OldHeap: 0 >>> Heap unused: 155 >>> OldHeap unused: 0 >>> Memory: 3472 >>> Program counter: 0x000000001e1504d0 (test6:do_work2/0 + 184) >>> CP: 0x0000000000000000 (invalid) >>> arity = 0 >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >>> >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >> >> >> >> _______________________________________________ >> erlang-bugs mailing list >> >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From tony@REDACTED Fri Aug 2 14:03:56 2013 From: tony@REDACTED (Tony Rogvall) Date: Fri, 2 Aug 2013 14:03:56 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> Message-ID: <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> On 1 aug 2013, at 
22:29, Anthony Ramine wrote: > Hello, > > It's not that you should not use prim_eval in this particular manner, you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault. Well I am, indirectly, using prim_eval:'receive'/2 when I am executing "receive Y -> Y end" from the shell. I think that this should still be allowed. (need a smily here? I guess, ok here it is ;-) > > That being said, when I implemented that patch the function didn't call a given closure but an hard-coded remote function in prim_eval; maybe we should put that back? I like prim_eval:'receive'/2 the way it is right now. But, for example, a badarg when trying to do receive within the function closure would be nice. I guess OTP team can figure out a lightweight way of doing this? Regards /Tony > > Regards, > > -- > Anthony Ramine > > Le 1 ao?t 2013 ? 20:06, Tony Rogvall a ?crit : > >> I was inspecting the new, and long awaited for, fix of 'receive' in erl_eval. >> >> I could not help myself to wonder what would happened if: >> >> 5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). >> >> Bus error: 10 >> >> I know you should not use prim_eval in this manner, but I tend to ignore recommendations. >> But this construct should probably not crash the VM. >> >> /Tony >> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > "Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix" -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From n.oxyde@REDACTED Fri Aug 2 15:20:47 2013 From: n.oxyde@REDACTED (Anthony Ramine) Date: Fri, 2 Aug 2013 15:20:47 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> Message-ID: Replied inline. -- Anthony Ramine Le 2 ao?t 2013 ? 14:03, Tony Rogvall a ?crit : > > On 1 aug 2013, at 22:29, Anthony Ramine wrote: > >> Hello, >> >> It's not that you should not use prim_eval in this particular manner, you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault. > > Well I am, indirectly, using prim_eval:'receive'/2 when I am executing "receive Y -> Y end" from the shell. > I think that this should still be allowed. (need a smily here? I guess, ok here it is ;-) There is a reason why undocumented stuff is undocumented, smiley or not. >> That being said, when I implemented that patch the function didn't call a given closure but an hard-coded remote function in prim_eval; maybe we should put that back? > > I like prim_eval:'receive'/2 the way it is right now. But, for example, a badarg when trying to do receive within the function closure would be nice. > I guess OTP team can figure out a lightweight way of doing this? There is no such a way, apart from making an extra check in the VM and thus slowing down any receive code. > Regards > > /Tony > >> >> Regards, >> >> -- >> Anthony Ramine >> >> Le 1 ao?t 2013 ? 20:06, Tony Rogvall a ?crit : >> >>> I was inspecting the new, and long awaited for, fix of 'receive' in erl_eval. >>> >>> I could not help myself to wonder what would happened if: >>> >>> 5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). >>> >>> Bus error: 10 >>> >>> I know you should not use prim_eval in this manner, but I tend to ignore recommendations. 
>>> But this construct should probably not crash the VM. >>> >>> /Tony >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >> > > "Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix" > > > From tony@REDACTED Fri Aug 2 15:39:55 2013 From: tony@REDACTED (Tony Rogvall) Date: Fri, 2 Aug 2013 15:39:55 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> Message-ID: <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> On 2 aug 2013, at 15:20, Anthony Ramine wrote: > Replied inline. > > -- > Anthony Ramine > > On 2 Aug 2013, at 14:03, Tony Rogvall wrote: > >> >> On 1 aug 2013, at 22:29, Anthony Ramine wrote: >> >>> Hello, >>> >>> It's not that you should not use prim_eval in this particular manner, you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault. >> >> Well I am, indirectly, using prim_eval:'receive'/2 when I am executing "receive Y -> Y end" from the shell. >> I think that this should still be allowed. (need a smiley here? I guess, ok here it is ;-) > > There is a reason why undocumented stuff is undocumented, smiley or not. Laziness? > >>> That being said, when I implemented that patch the function didn't call a given closure but a hard-coded remote function in prim_eval; maybe we should put that back? >> >> I like prim_eval:'receive'/2 the way it is right now. But, for example, a badarg when trying to do receive within the function closure would be nice.
>> I guess OTP team can figure out a lightweight way of doing this? > > There is no such way, apart from making an extra check in the VM and thus slowing down any receive code. Well, a recursive interpreted call could be fixed without slowing down "any receive code" /Tony >> Regards >> >> /Tony >> >>> >>> Regards, >>> >>> -- >>> Anthony Ramine >>> >>> On 1 Aug 2013, at 20:06, Tony Rogvall wrote: >>> >>>> I was inspecting the new, and long awaited for, fix of 'receive' in erl_eval. >>>> >>>> I could not help myself to wonder what would happen if: >>>> >>>> 5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). >>>> >>>> Bus error: 10 >>>> >>>> I know you should not use prim_eval in this manner, but I tend to ignore recommendations. >>>> But this construct should probably not crash the VM. >>>> >>>> /Tony >>>> >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >> >> "Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix" >> >> >> > "Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix"
From n.oxyde@REDACTED Fri Aug 2 15:55:05 2013 From: n.oxyde@REDACTED (Anthony Ramine) Date: Fri, 2 Aug 2013 15:55:05 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> Message-ID: <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com> The fact that it is an interpreted call is irrelevant; the same snippet of code, compiled, triggers a segfault. If you want to fix it, I don't see any other way than making the loop_rec instruction itself fail if there is already a receive loop in progress. The function is not documented, not because of laziness (I always document things that should be documented) but because it just shouldn't be used outside of erl_eval. Anyone interested in the details of prim_eval should just look at the commit message I wrote when I introduced it. -- Anthony Ramine On 2 Aug 2013, at 15:39, Tony Rogvall wrote: > Well, a recursive interpreted call could be fixed without slowing down "any receive code" From ulf@REDACTED Fri Aug 2 16:57:10 2013 From: ulf@REDACTED (Ulf Wiger) Date: Fri, 2 Aug 2013 16:57:10 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com> Message-ID: <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com> On 2 Aug 2013, at 15:55, Anthony Ramine wrote: > The function is not documented, not because of laziness (I always document things that should be documented) but because it just shouldn't be used outside of erl_eval.
Anyone interested in the details of prim_eval should just look at the commit message I wrote when I introduced it. Well, many of us have felt that it's an unfortunate shortcoming of Erlang that receive clauses cannot be parameterized. I created plain_fsm back in 2004 as an explicit workaround for this (using a parse_transform). https://github.com/uwiger/plain_fsm/blob/master/doc/plain_fsm.md So arguably, a way to parameterize receive *should* be available, and *should* be documented. I'm not saying that prim_eval:'receive'/2 is that very thing that should be documented, but it comes close enough that Erlang wizards like Tony should not only be excused for playing around with it, but should be *expected* to. ;-) I aim to play with it myself, once I have some spare time. BR, Ulf Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc. http://feuerlabs.com From eric.pailleau@REDACTED Sun Aug 4 00:03:07 2013 From: eric.pailleau@REDACTED (PAILLEAU Eric) Date: Sun, 04 Aug 2013 00:03:07 +0200 Subject: [erlang-bugs] R17A - Bug : unwanted semi-colon in generated erlang module after yecc compilation. Message-ID: <51FD7E1B.8020102@wanadoo.fr> 2> yecc:file("test",[]). test.yrl: Warning: conflicts: 111 shift/reduce, 0 reduce/reduce {ok,"test.erl"} 3> c:c("test"). test.erl:1167: syntax error before: ';' Yes, the generated test.erl has a semicolon before yeccerror (compilation succeeds once it is removed...). --8<------------------------------------------------- snip... snip ; yeccpars2_24(_, _, _, _, T, _, _) -> yeccerror(T). snip... snip --8<------------------------------------------------- In lib/parsetools/src/yecc.erl the problem comes from the delim/2 function called in output_state_actions_fini/2: --8<------------------------------------------------- snip... snip output_state_actions_fini(State, St0) -> %% Backward compatible. St10 = delim(St0, false), St = fwrite(St10, <<"yeccpars2_~w(_, _, _, _, T, _, _) ->\n">>, [State]), fwrite(St, <<" yeccerror(T).\n\n">>, []). snip...
snip delim(St, true) -> St; delim(St, false) -> fwrite(St, <<";\n">>, []). snip... snip --8<------------------------------------------------- Maybe the delim/2 function should get 'true' as its second argument, but the overall code is a bit hard to understand and I suppose the author would be a better bugfixer... Furthermore, I am going on vacation and won't have time to look at this ;>) . Comments indicate changes to yeccerror() in yecc.erl since 1.4, parsetools-2.0.4. Maybe this introduced the bug. Alas, so far I can't say whether this bug is a consequence of my parser or not. I get the same error with R16B01. Best regards. From peppe@REDACTED Tue Aug 6 17:47:19 2013 From: peppe@REDACTED (Peter Andersson) Date: Tue, 6 Aug 2013 17:47:19 +0200 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> Message-ID: <52011A87.5080203@erlang.org> Hi Tim, A call to ct:log/2 when Common Test is not running (i.e. before or after the execution of the ct_run program or the ct:run_test/1 function) is ignored, both in R15 and R16. For calls to ct:log/2 when Common Test is running, this is the change from R15 to R16: In R15: If ct:log/2 is called on a test case process, or one that has inherited the group leader from a test case process (typically one spawned from a test case), the string is printed to the test case log. This is true if the test case in question is still running. Otherwise the log file is no longer open. And if the latter, Common Test will either print the string to the current test case log instead, if one (and only one!) test case is executing, otherwise print the string to the CT Framework log. This means that in R15, printouts by means of ct:log/2 from processes unknown to Common Test end up either in a test case log or in the CT Framework log.
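The routing rules Peter describes (R15 above, R16 in the next paragraph) can be condensed into a small decision function. The Erlang sketch below is a hypothetical model written only to summarize this thread, not code taken from Common Test: the module name ct_log_route, the caller categories and the result atoms are invented for illustration, and it assumes that the R15 "one and only one executing test case" fallback applies equally to processes unknown to Common Test.

```erlang
%% Hypothetical model of the ct:log/2 routing rules described in this
%% thread. This is NOT Common Test's implementation.
-module(ct_log_route).
-export([destination/2]).

%% Release :: r15 | r16
%% Caller  :: not_running                          (Common Test not running)
%%          | {test_case, running}                 (test case process, or one
%%                                                  that inherited its group leader)
%%          | {test_case, finished, NumExecuting}  (its log file is closed)
%%          | {unknown_process, NumExecuting}      (process unknown to Common Test)
destination(_Release, not_running) ->
    %% Ignored in both R15 and R16.
    ignored;
destination(_Release, {test_case, running}) ->
    test_case_log;
destination(r15, {test_case, finished, NumExecuting}) ->
    r15_fallback(NumExecuting);
destination(r15, {unknown_process, NumExecuting}) ->
    %% Assumption: same fallback as for a finished test case.
    r15_fallback(NumExecuting);
destination(r16, {test_case, finished, _NumExecuting}) ->
    %% Printed to the "unexpected i/o" log via test_server_io.
    unexpected_io_log;
destination(r16, {unknown_process, _NumExecuting}) ->
    unexpected_io_log.

%% R15: if exactly one test case is executing, use its log;
%% otherwise use the CT Framework log.
r15_fallback(1) -> current_test_case_log;
r15_fallback(_NumExecuting) -> framework_log.
```

For example, destination(r15, {unknown_process, 3}) returns framework_log, whereas destination(r16, {unknown_process, 3}) returns unexpected_io_log.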
In R16: If ct:log/2 is called on a test case process, or one that has inherited the group leader from a test case process, the string is printed to the test case log (as in R15). If the test case in question is finished and the log file has been closed, the string is printed to the "unexpected i/o" log (via the test_server_io process) instead. Printouts by means of ct:log/2 from processes unknown to Common Test also always end up in the unexpected i/o log. Printouts from CT hook functions are "safe" in the sense that they execute sequentially pre/post test suite/group/case execution. It's not possible that a call to ct:log/2 from a hook function "comes in too soon or late", i.e. gets handled before/after Common Test has started/finished executing. In general, if one knows that Common Test is running (which is the time between the CT hook init and terminate calls, or between the start_logging and stop_logging event messages), it is safe to call ct:log/2 or ct:pal/2 from anywhere and find the data in either the test case logs or the unexpected i/o log (depending on the group leader setting). Printouts that happen before or after the execution of a test suite, group or case end up in the unexpected i/o log. The reason for the exit you reported initially is that if a log call happens during startup or shutdown of Common Test, then, during a short window, it's possible that Common Test fails to communicate with Test Server and crashes. Before I try to answer your question below, I need to understand better what you want to happen to the log printouts that take place during your configuration/setup phase (before the test run starts) and/or during the teardown phase (when Common Test has shut down). If Common Test - in an offline mode (i.e. not running) - should attempt to write incoming ct:log/2 strings to a file, the best it can really do is to write/append them to, say, a circular log file in the current working directory.
This is possible, but it will be difficult to know which printouts belong to which test runs when analyzing the logs, and as far as I understand, this is the sort of thing you're trying to avoid anyway. As far as I see, it's quite possible to do something clever with log printouts that happen *before* Common Test starts. They could be buffered in a temporary file then read and resent by a CT hook init function so that this data ends up first in the unexpected i/o log for the test run. The problem here is what to do with printouts that happen *after* Common Test has stopped but before your teardown is finished. Another possibility could maybe be, if possible, to change the order of the whole session so that Common Test is always started before your configuration/setup (CT hooks can for example be added dynamically) and not stopped until teardown is also finished. Perhaps an init and terminate function in a high prio CT hook module could be used to synchronize this. Sounds feasible to me. Let me know if I understand your problem correctly and tell me what ideas/requests you have and let's move on from there. Best regards, Peter Ericsson AB, Erlang/OTP Tim Watson wrote: > So chaps, I've found the commit that altered the IO handling in > test_server (in fact, the addition of test_server_io). To clarify, > prior to the addition of test_server_io, calls to ct:log/2 (and > friends, e.g., ct:pal/2 and so on) would succeed even if no test was > running and end up being handled as if they resided in before/after > suite and/or before/after testcase functions. Now it seems that I've > got to vet all the processes that might end up calling ct:log/2 > (indirectly via my event manager) somehow, but there's no proper API > to determine whether or not it is safe to do so. Having all my > debug/info level testing framework logs emitted to the HTML files was > a big reason for choosing common_test, so I'm loth to redirect them > elsewhere. 
My code is basically doing lots of custom (data driven) > setup/teardown before and after test suites and test cases, and even > though some of this runs before (or during) the common_test test run > is started, I *really* don't want to have to create yet another file > location that needs to be inspected when tests fail. I'm also not keen > on filling up stdio with lots of logging noise. > > Any ideas how I can work around this situation without shooting myself > in the head/foot? ;) > > Cheers, > Tim > > On 12 Jul 2013, at 08:36, Tim Watson wrote: > >> Hi Lukas, thanks for letting me know! >> >> Cheers, >> Tim >> >> On 12 Jul 2013, at 08:29, Lukas Larsson > > wrote: >> >>> Hello Tim, >>> >>> Peter is currently away enjoying the sunny summer here in Sweden. >>> I'm sure he will get back to you when he comes back! >>> >>> Lukas >>> On 12/07/13 02:03, Tim Watson wrote: >>>> On 1 July 2013 10:25, Tim Watson >>> > wrote:> We should try to rule >>>> out that there's a bug that causes test_server >>>> >>>> > failure. >>>> >>>> How can I assist in verifying that? >>>> >>>> >>>> Any more news on this? Is there anything more I can do to assist? >>>> >>>> Cheers, >>>> Tim >>>> >>>> >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> >>> > From ingela.anderton.andin@REDACTED Wed Aug 7 15:04:27 2013 From: ingela.anderton.andin@REDACTED (Ingela Anderton Andin) Date: Wed, 7 Aug 2013 15:04:27 +0200 Subject: [erlang-bugs] {header,1} inconsistency between TCP and SSL In-Reply-To: <20130720032509.GF27534@hijacked.us> References: <20130720032509.GF27534@hijacked.us> Message-ID: <520245DB.6040602@erix.ericsson.se> Hi! 
Andrew Thompson wrote: > Today I noticed a difference in behaviour of the {header, 1} option when > using TCP and SSL in erlang releases R15B02 and newer: > > https://gist.github.com/Vagabond/dabecf53ac8b4317e51c > > As you can see, SSL in {header, 1} mode no longer includes the empty > binary as the second element in the list. > > I believe this change was made in this commit: > > https://github.com/erlang/otp/commit/8f97b428eb8f2fb89c3f9ec348f577304b1b9131 > > If you change that back, things work the same as TCP again, but all the > header_decode tests in ssl_packet_SUITE start to fail. > > I'm simply going to stop using {header,1} and just use the bit syntax, > since I notice that Ingela considers it to be a silly option, but I > wanted to at least point the inconsistency out, for posterity. > > Thank you for pointing this out. This option is quite old and was invented before the bit syntax. Nowadays just using the bit syntax is a better option. The change was made to conform to how inet (i.e. gen_tcp) handles the header 1 option, but alas it seems we fixed it in one way and broke it in another. Some old things could have been better documented ;) Regards Ingela Erlang/OTP team - Ericsson AB From watson.timothy@REDACTED Thu Aug 8 10:41:21 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Thu, 8 Aug 2013 09:41:21 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: <52011A87.5080203@erlang.org> References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: <7413B698-86F6-472E-AFD6-E0034C41F885@gmail.com> Hi Peter! Now you've caught me on vacation! :) I'll read through this properly ASAP and get back to you. Thanks! Tim On 6 Aug 2013, at 16:47, Peter Andersson wrote: > Hi Tim, > > A call to ct:log/2 when Common Test is not running (i.e.
before or after > the execution of the ct_run program or the ct:run_test/1 function), is > ignored, both in R15 and R16. > > For calls to ct:log/2 when Common Test is running, this is the change > from R15 to R16: > > In R15: > If ct:log/2 is called on a test case process, or one that has inherited > the group leader from a test case process (typically one spawned from a > test case), the string is printed to the test case log. This is true if > the test case in question is still running. Otherwise the log file is no > longer open. And if the latter, Common Test will either print the string > to the current test case log instead, if one (and only one!) test case > is executing, otherwise print the string to the CT Framework log. This > means that in R15, printouts by means of ct:log/2 from processes unknown > to Common Test, end up either in a test case log or in the CT Framework > log. > > In R16: > If ct:log/2 is called on a test case process, or one that has inherited > the group leader from a test case process, the string is printed to the > test case log (as in R15). If the test case in question is finished and > the log file has been closed, the string is printed to the "unexpected > i/o" log (via the test_server_io process) instead. Printouts by means of > ct:log/2 from processes unknown to Common Test, also always end up in > the unexpected i/o log. > > Printouts from CT hook functions are "safe" in the sense that they > execute sequentially pre/post test suite/group/case execution. It's not > possible that a call to ct:log/2 from a hook function "comes in too soon > or late", i.e. gets handled before/after Common Test has > started/finished executing. 
> > In general, if one knows that Common Test is running (which is the time > between the CT hook init and terminate call, or the start_logging and > stop_logging event message), it is safe to call ct:log/2 or ct:pal/2 > from anywhere and find the data in either the test case logs or in the > unexpected i/o log (depending on the group leader setting). Printouts > that happen before or after the execution of a test suite, group or > case, end up in the unexpected i/o log. > > The reason for the exit you reported initially, is that if a log call > happens during startup or shutdown of Common Test, then, during a short > window, it's possible that Common Test fails to communicate with Test > Server and crashes. > > Before I try to answer your question below, I need to understand better > what you want to happen to the log printouts that take place during your > configuration/setup phase (before the test run starts) and/or during the > teardown phase (when Common Test has shut down). If Common Test - in an > offline mode (i.e. not running) - should attempt to write incoming > ct:log/2 strings to a file, the best it can do really, is to > write/append them to say a circular log file in the current working > directory. This is possible, but it will be difficult to know which > printouts belong to which test runs when analyzing the logs, and as far > as I understand, this is the sort of thing you're trying to avoid anyway. > > As far as I see, it's quite possible to do something clever with log > printouts that happen *before* Common Test starts. They could be > buffered in a temporary file then read and resent by a CT hook init > function so that this data ends up first in the unexpected i/o log for > the test run. The problem here is what to do with printouts that happen > *after* Common Test has stopped but before your teardown is finished. 
> Another possibility could maybe be, if possible, to change the order of > the whole session so that Common Test is always started before your > configuration/setup (CT hooks can for example be added dynamically) and > not stopped until teardown is also finished. Perhaps an init and > terminate function in a high prio CT hook module could be used to > synchronize this. Sounds feasible to me. > > Let me know if I understand your problem correctly and tell me what > ideas/requests you have and let's move on from there. > > Best regards, > Peter > > Ericsson AB, Erlang/OTP > > Tim Watson wrote: >> So chaps, I've found the commit that altered the IO handling in >> test_server (in fact, the addition of test_server_io). To clarify, >> prior to the addition of test_server_io, calls to ct:log/2 (and >> friends, e.g., ct:pal/2 and so on) would succeed even if no test was >> running and end up being handled as if they resided in before/after >> suite and/or before/after testcase functions. Now it seems that I've >> got to vet all the processes that might end up calling ct:log/2 >> (indirectly via my event manager) somehow, but there's no proper API >> to determine whether or not it is safe to do so. Having all my >> debug/info level testing framework logs emitted to the HTML files was >> a big reason for choosing common_test, so I'm loth to redirect them >> elsewhere. My code is basically doing lots of custom (data driven) >> setup/teardown before and after test suites and test cases, and even >> though some of this runs before (or during) the common_test test run >> is started, I *really* don't want to have to create yet another file >> location that needs to be inspected when tests fail. I'm also not keen >> on filling up stdio with lots of logging noise. >> >> Any ideas how I can work around this situation without shooting myself >> in the head/foot? ;) >> >> Cheers, >> Tim >> >> On 12 Jul 2013, at 08:36, Tim Watson wrote: >> >>> Hi Lukas, thanks for letting me know! 
>>> >>> Cheers, >>> Tim >>> >>> On 12 Jul 2013, at 08:29, Lukas Larsson >> > wrote: >>> >>>> Hello Tim, >>>> >>>> Peter is currently away enjoying the sunny summer here in Sweden. >>>> I'm sure he will get back to you when he comes back! >>>> >>>> Lukas >>>> On 12/07/13 02:03, Tim Watson wrote: >>>>> On 1 July 2013 10:25, Tim Watson >>>> > wrote:> We should try to rule >>>>> out that there's a bug that causes test_server >>>>> >>>>>> failure. >>>>> >>>>> How can I assist in verifying that? >>>>> >>>>> >>>>> Any more news on this? Is there anything more I can do to assist? >>>>> >>>>> Cheers, >>>>> Tim >>>>> >>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>>> >>>> >> > From colanderman@REDACTED Sat Aug 10 06:11:26 2013 From: colanderman@REDACTED (Chris King) Date: Sat, 10 Aug 2013 00:11:26 -0400 Subject: [erlang-bugs] dialyzer false positive io_lib:fread In-Reply-To: References: Message-ID: Hi, Re-sending this, as it seems (after a month of checking the archive) that this mailing list silently rejects e-mails from non-subscribers? (There's no mention of this behavior in the listinfo page http://erlang.org/mailman/listinfo/erlang-bugs.) Dialyzer produces a false positive when analyzing io_lib:fread with a ~a argument: it believes (erroneously) that the parsed value will be a string, when in fact it will be an atom. This does not occur with io:fread, or with io_lib:fread with an integer argument. The test program below exemplifies this; dialyzer claims that bugged/1 cannot return, when in fact calling bugged("foo") returns normally in the interpreter. I would be glad to supply a patch but I haven't the slightest clue where to start looking (this seems like either an easy fix, in an "exceptions" list somewhere, or a complex fix deep inside dialyzer). -module(dialyzer_bug). -export([bugged/1, not_bugged1/0, not_bugged2/1]).
bugged(S) -> case io_lib:fread("~a", S) of {ok, [Atom], _} when is_atom(Atom) -> Atom end. not_bugged1() -> case io:fread("foo", "~a") of {ok, [Atom]} when is_atom(Atom) -> Atom end. not_bugged2(S) -> case io_lib:fread("~d", S) of {ok, [Integer], _} when is_integer(Integer) -> Integer end. From kostis@REDACTED Sun Aug 11 00:20:05 2013 From: kostis@REDACTED (Kostis Sagonas) Date: Sun, 11 Aug 2013 01:20:05 +0300 Subject: [erlang-bugs] dialyzer false positive io_lib:fread In-Reply-To: References: Message-ID: <5206BC95.9060902@cs.ntua.gr> On 08/10/2013 07:11 AM, Chris King wrote: > Hi, > > Re-sending this, as it seems (after a month checking the archive) that > this mailing list silently rejects e-mails from non-subscribers? > (There's no mention of this behavior in the listinfo page > http://erlang.org/mailman/listinfo/erlang-bugs.) > > > dialyzer produces a false positive when analyzing io_lib:fread with a ~a > argument ? it believes (erroneously) that the parsed value will be a > string, when in fact it will be an atom. This does not occur with > io:fread, or with io_lib:fread with an integer argument. > > The below test program exemplifies this; dialyzer claims that bugged/1 > cannot return, when in fact calling bugged("foo") returns normally in > the interpreter. > > I would be glad to supply a patch but I haven't the slightest clue where > to start looking (this seems like either an easy fix, in an "exceptions" > list somewhere, or a complex fix deep inside dialyzer). The behaviour you are experiencing is a side-effect of the type and spec declarations that exist in modules io_lib and io_lib_fread (*) (*) Aside: is there a really good reason why io_lib_fread is a separate module with cyclic dependencies to io_lib and polluting the module name space, instead of being part of io_lib? In io_lib, the fread/2 function is defined as: fread(Chars, Format) -> io_lib_fread:fread(Chars, Format). 
and in io_lib_fread the spec of fread/2 reads: -spec fread(Format, String) -> Result when Format :: string(), String :: string(), Result :: {'ok', InputList :: io_lib:chars(), LeftOverChars :: string()} ... where the io_lib:chars() type is defined as: -type chars() :: [char() | chars()]. mentioning nowhere that the InputList also possibly contains atoms instead of just chars(), i.e. short integers. Note that the fact that the spec of io_lib:fread/2 reads: -spec fread(Format, String) -> Result when Format :: string(), String :: string(), Result :: {'ok', InputList :: [term()], LeftOverChars :: string()} ... is irrelevant since dialyzer will take the strongest type information it infers when spec declarations are too loose. Anyway, I am not sure whether the intention of the library developer is to document the possibility to return an atom list or not in that position, so no patch from me either. Hope this helps someone at OTP to fix this, possibly also folding the io_lib_fread module into io_lib in the process. Kostis From essen@REDACTED Sun Aug 11 15:19:37 2013 From: essen@REDACTED (Loïc Hoguin) Date: Sun, 11 Aug 2013 15:19:37 +0200 Subject: [erlang-bugs] [erlang-questions] Possibly bug in cth_log_redirect? In-Reply-To: References: Message-ID: <52078F69.8050307@ninenines.eu> On 08/09/2013 10:12 PM, Max Lapshin wrote: > When tests are running in parallel, LogFun in cth_log_redirect is > changed to ct_log: > > https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/cth_log_redirect.erl#L49 > > Problem is that it must be not ct_log, but tc_log: > > https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/ct_logs.erl#L44 I am also hit by this issue. Full error message: =ERROR REPORT==== 11-Aug-2013::15:16:09 === ** gen_event handler cth_log_redirect crashed.
** Was installed in error_logger ** Last event was: {error,<0.390.0>, {emulator,"~s~n", ["Error in process <0.620.0> on node 'ct@REDACTED' with exit value: {{<<18 bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo... \n"]}} ** When handler state == ct_log ** Reason == {'function not exported', [{ct_logs,ct_log, [error_logger,50,"System", ["\n",61,"ERROR REPORT",61,61,61,61,32,"11",45,"Aug", 45,"2013",58,58,"17",58,"16",58,"09",32,61,61,61,"\n", "Error in process <0.620.0> on node 'ct@REDACTED' with exit value: {{<<18 bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo... \n", "\n"], []], []}, {cth_log_redirect,handle_event,2, [{file,"cth_log_redirect.erl"},{line,91}]}, {gen_event,server_update,4, [{file,"gen_event.erl"},{line,522}]}, {gen_event,server_notify,4, [{file,"gen_event.erl"},{line,504}]}, {gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,266}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,239}]}]} -- Loïc Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From peppe@REDACTED Mon Aug 12 10:20:35 2013 From: peppe@REDACTED (Peter Andersson) Date: Mon, 12 Aug 2013 10:20:35 +0200 Subject: [erlang-bugs] [erlang-questions] Possibly bug in cth_log_redirect? In-Reply-To: <52078F69.8050307@ninenines.eu> References: <52078F69.8050307@ninenines.eu> Message-ID: <52089AD3.4080600@erlang.org> Thanks for reporting this, guys! We'll fix it asap.
/Peter Ericsson AB, Erlang/OTP Loïc Hoguin wrote: > On 08/09/2013 10:12 PM, Max Lapshin wrote: > > When tests are running in parallel, LogFun in cth_log_redirect is > > changed to ct_log: > > > > https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/cth_log_redirect.erl#L49 > > > > Problem is that it must be not ct_log, but tc_log: > > > > https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/ct_logs.erl#L44 > > I am also hit by this issue. > > Full error message: > > =ERROR REPORT==== 11-Aug-2013::15:16:09 === > ** gen_event handler cth_log_redirect crashed. > ** Was installed in error_logger > ** Last event was: {error,<0.390.0>, > {emulator,"~s~n", > ["Error in process <0.620.0> on > node 'ct@REDACTED' with exit value: {{<<18 > bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo... > \n"]}} > ** When handler state == ct_log > ** Reason == {'function not exported', > [{ct_logs,ct_log, > [error_logger,50,"System", > ["\n",61,"ERROR > REPORT",61,61,61,61,32,"11",45,"Aug", > > 45,"2013",58,58,"17",58,"16",58,"09",32,61,61,61,"\n", > "Error in process <0.620.0> on node > 'ct@REDACTED' with exit value: {{<<18 > bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo...
> \n", > "\n"], > []], > []}, > {cth_log_redirect,handle_event,2, > [{file,"cth_log_redirect.erl"},{line,91}]}, > {gen_event,server_update,4, > [{file,"gen_event.erl"},{line,522}]}, > {gen_event,server_notify,4, > [{file,"gen_event.erl"},{line,504}]}, > > {gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,266}]}, > {proc_lib,init_p_do_apply,3, > [{file,"proc_lib.erl"},{line,239}]}]} > > > From rr@REDACTED Tue Aug 13 05:40:40 2013 From: rr@REDACTED (Rick Reed) Date: Mon, 12 Aug 2013 20:40:40 -0700 Subject: [erlang-bugs] efile_drv & async thread key Message-ID: It looks to me as though there's a bit of a problem in the way efile_drv.c generates the key that's used to select an async driver queue. It uses the address of the port which on our system is 8-byte aligned. Meanwhile, erl_async.c does a simple mod operation with the number of async threads, so the number of threads that can actually be used by file operations is 1/8th of the number configured. I suspect this isn't intended. Rr From lukas@REDACTED Tue Aug 13 09:40:55 2013 From: lukas@REDACTED (Lukas Larsson) Date: Tue, 13 Aug 2013 09:40:55 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: References: Message-ID: <5209E307.6030806@erlang.org> Hello Rick! Which version of Erlang are you using? From R16B (I think), the ErlDrvPort datatype no longer is a pointer to the port struct. Instead it is the slot id into the port table and those ids should contain all values. I did a quick test on my computer running the latest on maint on github and seem to get a full spread over all async threads. Lukas On 13/08/13 05:40, Rick Reed wrote: > It looks to me as though there's a bit of a problem in the way > efile_drv.c generates the > key that's used to select an async driver queue. It uses the address > of the port which > on our system is 8-byte aligned.
Meanwhile, erl_async.c does a simple > mod operation > with the number of async threads, so the number of threads that can > actually be used > by file operations is 1/8th of the number configured. I suspect this > isn't intended. > > Rr > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From lukas@REDACTED Tue Aug 13 10:05:01 2013 From: lukas@REDACTED (Lukas Larsson) Date: Tue, 13 Aug 2013 10:05:01 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <5209E307.6030806@erlang.org> References: <5209E307.6030806@erlang.org> Message-ID: <5209E8AD.5000208@erlang.org> Sigh, apparently I spoke too soon. I remembered incorrectly about the change. It was in R16B that ErlDrvPort became a ptr and it was an id before R16B. Anyway, it is odd that the ptr is 8-byte aligned on your system. On mine (Ubuntu 13.04, x86_64) the ptrs are not aligned and the load is nicely distributed among async threads. If I remember correctly you are using FreeBSD on x86_64? I'll check if I can reproduce the behavior you are seeing on our FreeBSD machine. Lukas On 13/08/13 09:40, Lukas Larsson wrote: > Hello Rick! > > Which version of Erlang are you using? From R16B (I think), the > ErlDrvPort datatype no longer is a pointer to the port struct. Instead > it is the slot id into the port table and those ids should contain all > values. I did a quick test on my computer running the latest on maint > on github and I seem to get a full spread over all async threads. > > Lukas > > On 13/08/13 05:40, Rick Reed wrote: >> It looks to me as though there's a bit of a problem in the way >> efile_drv.c generates the >> key that's used to select an async driver queue. It uses the address >> of the port which >> on our system is 8-byte aligned.
Meanwhile, erl_async.c does a >> simple mod operation >> with the number of async threads, so the number of threads that can >> actually be used >> by file operations is 1/8th of the number configured. I suspect this >> isn't intended. >> >> Rr From lukas@REDACTED Tue Aug 13 13:52:19 2013 From: lukas@REDACTED (Lukas Larsson) Date: Tue, 13 Aug 2013 13:52:19 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <5209E8AD.5000208@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> Message-ID: <520A1DF3.8050708@erlang.org> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch. Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :) Lukas On 13/08/13 10:05, Lukas Larsson wrote: > Sigh, apparently I spoke too soon. > > I remembered incorrectly about the change. It was in R16B that > ErlDrvPort became a ptr and it was an id before R16B. Anyway, it is > odd that the ptr is 8-byte aligned on your system. On mine (Ubuntu > 13.04, x86_64) the ptrs are not aligned and the load is nicely > distributed among async threads. If I remember correctly you are using > FreeBSD on x86_64? I'll check if I can reproduce the behavior you are > seeing on our FreeBSD machine. > > Lukas > > On 13/08/13 09:40, Lukas Larsson wrote: >> Hello Rick! >> >> Which version of Erlang are you using? From R16B (I think), the >> ErlDrvPort datatype no longer is a pointer to the port struct.
>> Instead it is the slot id into the port table and those ids should >> contain all values. I did a quick test on my computer running the >> latest on maint on github and I seem to get a full spread over all >> async threads. >> >> Lukas >> >> On 13/08/13 05:40, Rick Reed wrote: >>> It looks to me as though there's a bit of a problem in the way >>> efile_drv.c generates the >>> key that's used to select an async driver queue. It uses the >>> address of the port which >>> on our system is 8-byte aligned. Meanwhile, erl_async.c does a >>> simple mod operation >>> with the number of async threads, so the number of threads that can >>> actually be used >>> by file operations is 1/8th of the number configured. I suspect >>> this isn't intended. >>> >>> Rr From rr@REDACTED Wed Aug 14 02:21:04 2013 From: rr@REDACTED (Rick Reed) Date: Tue, 13 Aug 2013 17:21:04 -0700 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <520A1DF3.8050708@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> Message-ID: I assume the reason for keying the file requests is to prevent a single port from soaking up all the async threads? Rr On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson wrote: > And there it is, conclusive proof that I should not be debugging > Rickard's code before lunch. > > Found the issue, will create a fix for it.
As a workaround for R16B you > can use a prime number as the number of async threads :) > > Lukas > > > On 13/08/13 10:05, Lukas Larsson wrote: > > Sigh, apparently I spoke too soon. > > I remembered incorrectly about the change. It was in R16B that ErlDrvPort > became a ptr and it was an id before R16B. Anyway, it is odd that the ptr > is 8-byte aligned on your system. On mine (Ubuntu 13.04, x86_64) the ptrs are > not aligned and the load is nicely distributed among async threads. If I > remember correctly you are using FreeBSD on x86_64? I'll check if I can > reproduce the behavior you are seeing on our FreeBSD machine. > > Lukas > > On 13/08/13 09:40, Lukas Larsson wrote: > > Hello Rick! > > Which version of Erlang are you using? From R16B (I think), the ErlDrvPort > datatype no longer is a pointer to the port struct. Instead it is the slot > id into the port table and those ids should contain all values. I did a > quick test on my computer running the latest on maint on github and I seem to > get a full spread over all async threads. > > Lukas > > On 13/08/13 05:40, Rick Reed wrote: > > It looks to me as though there's a bit of a problem in the way efile_drv.c > generates the > key that's used to select an async driver queue. It uses the address of > the port which > on our system is 8-byte aligned. Meanwhile, erl_async.c does a simple mod > operation > with the number of async threads, so the number of threads that can > actually be used > by file operations is 1/8th of the number configured. I suspect this > isn't intended.
> > Rr From james@REDACTED Tue Aug 13 14:03:01 2013 From: james@REDACTED (James Wheare) Date: Tue, 13 Aug 2013 13:03:01 +0100 Subject: [erlang-bugs] Binary memory reuse issue in unicode:characters_to_list Message-ID: Just found this extremely unexpected behaviour when using binary pattern matching and unicode:characters_to_list http://pastebin.com/7EYEhu0Z Given a 2-byte binary, e.g. <<65,128>> (65 = letter "A", 128 = invalid standalone UTF-8 byte) <<Char, Rest/binary>> = <<65,128>>, Char = 65, Rest = <<128>>. unicode:characters_to_list(Rest) should return an error, {error, [], <<128>>}, but instead gives {error, [], "A"}. unicode:characters_to_list(<<128>>) produces the desired result even though it should be identical. Making a copy will also give the desired result: Rest2 = <<Rest/binary>>, unicode:characters_to_list(Rest2). Is this related to the binary optimisations detailed here? http://www.erlang.org/doc/efficiency_guide/binaryhandling.html Seems like a bug in the unicode nif. Note that it's not reproducing on all environments, even given the same Erlang version. Even two identical Linux VMs running under VirtualBox, but on two separate host machines, produced different results (one showed the bug, one didn't). From pan@REDACTED Wed Aug 14 10:29:38 2013 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 14 Aug 2013 10:29:38 +0200 Subject: [erlang-bugs] Binary memory reuse issue in unicode:characters_to_list In-Reply-To: References: Message-ID: <520B3FF2.2020607@erlang.org> Hi!
This bug was fixed in the latest release. See https://github.com/erlang/otp/commit/0ebffb2b55bd1870bfbe0ea47aa94017d7917084 for details. Cheers, Patrik On 08/13/2013 02:03 PM, James Wheare wrote: > Just found this extremely unexpected behaviour when using binary > pattern matching and unicode:characters_to_list > > http://pastebin.com/7EYEhu0Z > > Given a 2-byte binary, e.g. <<65,128>> (65 = letter "A", 128 = invalid > standalone UTF-8 byte) > > <<Char, Rest/binary>> = <<65,128>>, > Char = 65, > Rest = <<128>>. > > unicode:characters_to_list(Rest) should return an error, {error, [], > <<128>>}, but instead gives {error, [], "A"}. > > unicode:characters_to_list(<<128>>) produces the desired result even > though it should be identical. > > Making a copy will also give the desired result: > Rest2 = <<Rest/binary>>, > unicode:characters_to_list(Rest2). > > Is this related to the binary optimisations detailed here? > http://www.erlang.org/doc/efficiency_guide/binaryhandling.html > > Seems like a bug in the unicode nif. > > Note that it's not reproducing on all environments, even given the > same Erlang version. Even two identical Linux VMs running under > VirtualBox, but on two separate host machines, produced different results > (one showed the bug, one didn't). From pan@REDACTED Wed Aug 14 10:36:34 2013 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 14 Aug 2013 10:36:34 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> Message-ID: <520B4192.2060003@erlang.org> Hi Rick! On 08/14/2013 02:21 AM, Rick Reed wrote: > I assume the reason for keying the file requests is to prevent a > single port from > soaking up all the async threads? Yes, and it's also important that requests for the same file "descriptor" end up in the same async queue.
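The arithmetic behind Rick's report, and behind Lukas's prime-number workaround and the idea of hashing the pointer, can be checked with a small model. This is a hypothetical sketch, not OTP code. It assumes, as described in this thread, that each file port's key is an 8-byte-aligned address and that the async queue is picked as Key rem N for N async threads. The addresses are synthetic, the module name async_key_demo is made up, and erlang:phash2/2 only stands in for whatever hash the actual efile_drv.c fix uses.

```erlang
-module(async_key_demo).
-export([queues_used/2, main/0]).

%% Hypothetical model, not OTP code: keys are synthetic 8-byte-aligned
%% "addresses", and a queue index in 0..N-1 is derived from each key.

%% Count how many distinct queues are used when KeyFun maps 10000
%% synthetic 8-byte-aligned addresses to a queue index for N threads.
queues_used(KeyFun, N) ->
    Addrs = [16#10000 + 8 * I || I <- lists:seq(0, 9999)],
    length(lists:usort([KeyFun(A, N) || A <- Addrs])).

%% Selection as described in the thread: the raw address modulo N.
plain_mod(Addr, N) ->
    Addr rem N.

%% Hash first, then reduce to 0..N-1. erlang:phash2/2 is only a
%% stand-in for "hash the pointer"; it is not what efile_drv.c does.
hashed(Addr, N) ->
    erlang:phash2(Addr, N).

%% For each thread count N: {N, QueuesUsedWithMod, QueuesUsedWithHash}.
main() ->
    [{N, queues_used(fun plain_mod/2, N), queues_used(fun hashed/2, N)}
     || N <- [8, 16, 10, 7]].
```

Calling async_key_demo:main() should give [{8,1,8},{16,2,16},{10,5,10},{7,7,7}]. With a plain modulo, keys that are all multiples of 8 can only reach N div gcd(8, N) queues, which is 1/8th of them whenever N is a multiple of 8. An odd N such as the prime 7 shares no factor with 8 and reaches every queue. Hashing the key before reducing it spreads the jobs over all N queues regardless of N.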
So we need to store a fixed key in the file descriptor structure. I think I will hash the pointer to create the key, not just shift away the "zero-bits"; you never know which icky patterns an allocator can create that will distribute the jobs unevenly. The key will only be calculated upon opening, so there will be a minimal performance hit due to the more complicated calculations. Thanks for reporting - this could cause severe performance issues in applications! Cheers, Patrik > > Rr > > > On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson > wrote: > > And there it is, conclusive proof that I should not be debugging > Rickard's code before lunch. > > Found the issue, will create a fix for it. As a workaround for > R16B you can use a prime number as the number of async threads :) > > Lukas > > > On 13/08/13 10:05, Lukas Larsson wrote: >> Sigh, apparently I spoke too soon. >> >> I remembered incorrectly about the change. It was in R16B that >> ErlDrvPort became a ptr and it was an id before R16B. Anyway, it >> is odd that the ptr is 8-byte aligned on your system. On mine >> (Ubuntu 13.04, x86_64) the ptrs are not aligned and the load is >> nicely distributed among async threads. If I remember correctly >> you are using FreeBSD on x86_64? I'll check if I can reproduce >> the behavior you are seeing on our FreeBSD machine. >> >> Lukas >> >> On 13/08/13 09:40, Lukas Larsson wrote: >>> Hello Rick! >>> >>> Which version of Erlang are you using? From R16B (I think), the >>> ErlDrvPort datatype no longer is a pointer to the port struct. >>> Instead it is the slot id into the port table and those ids >>> should contain all values. I did a quick test on my computer >>> running the latest on maint on github and I seem to get a full >>> spread over all async threads. >>> >>> Lukas >>> >>> On 13/08/13 05:40, Rick Reed wrote: >>>> It looks to me as though there's a bit of a problem in the way >>>> efile_drv.c generates the >>>> key that's used to select an async driver queue.
It uses the >>>> address of the port which >>>> on our system is 8-byte aligned. Meanwhile, erl_async.c does a >>>> simple mod operation >>>> with the number of async threads, so the number of threads that >>>> can actually be used >>>> by file operations is 1/8th of the number configured. I >>>> suspect this isn't intended. >>>> >>>> Rr From watson.timothy@REDACTED Wed Aug 14 11:58:06 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Wed, 14 Aug 2013 10:58:06 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: <52011A87.5080203@erlang.org> References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: Hi Peter, Thanks for getting back to me with this. Now that I've had a chance to look at it, and to experiment with some different options, I can explain a bit more about what I think is going on. On 6 August 2013 16:47, Peter Andersson wrote: > Printouts from CT hook functions are "safe" in the sense that they > execute sequentially pre/post test suite/group/case execution.
It's not > possible that a call to ct:log/2 from a hook function "comes in too soon > or late", i.e. gets handled before/after Common Test has > started/finished executing. > > > In general, if one knows that Common Test is running (which is the time > between the CT hook init and terminate call, or the start_logging and > stop_logging event message), it is safe to call ct:log/2 or ct:pal/2 > from anywhere and find the data in either the test case logs or in the > unexpected i/o log (depending on the group leader setting). > > > The reason for the exit you reported initially is that if a log call > happens during startup or shutdown of Common Test, then, during a short > window, it's possible that Common Test fails to communicate with Test > Server and crashes. > > So I've definitely been hitting a race condition in my code here. I tried adding/removing the event handler that routes logging messages to ct:{log,pal}/2 around the {start,stop}_logging events; however, that didn't help at all. In my ct_hook module, the call that triggers this explosion occurs inside the init/2 function: %% from systest_cth.erl init(systest, Opts) -> case application:start(systest, permanent) of {error, {already_started, systest}} -> systest:reset(); {error, _Reason}=Err -> Err; ok -> ok end, etc .... %% from systest.erl reset() -> %% both these operations are synchronous systest_watchdog:reset(), systest_results:reset(), ok. %% systest_watchdog.erl reset() -> gen_server:call(?MODULE, reset). Further down in systest_watchdog, the handle_call/3 clause that deals with the 'reset' signal ends up calling systest_log, which leads to a gen_event handler calling ct:pal/2. According to what you've said above about the timing of logging attempts within the bounds of a ct_hook's init/terminate functions, that seems like it should work, shouldn't it? Attempting to trigger the logging by only (de)registering the ct:pal handler in response to the {start,stop}_logging events didn't work, though.
I guess there must be another potential race there - however, I haven't attempted to do that via the hook's init/terminate functions - I'll try that now and let you know if it resolves my issues. Before I try to answer your question below, I need to understand better > what you want to happen to the log printouts that take place during your > configuration/setup phase (before the test run starts) and/or during the > teardown phase (when Common Test has shut down). I think/hope that all my setup/teardown is synchronous with regard to each test run - that is to say, I allow for parallel test cases within suites and I don't actually return from the hook's systest_cth:stop/4 callback until all the resources configured at that scope (be it suite, group, testcase) have been killed and reported (via monitors) as dead. If Common Test - in an > offline mode (i.e. not running) - should attempt to write incoming > ct:log/2 strings to a file, the best it can do really, is to > write/append them to say a circular log file in the current working > directory. This is possible, but it will be difficult to know which > printouts belong to which test runs when analyzing the logs, and as far > as I understand, this is the sort of thing you're trying to avoid anyway. > > Indeed. As a temporary work-around, I've stopped logging internal events to the common_test logs and put them in a separate log file instead, but that's not really what I wanted to end up with. > As far as I see, it's quite possible to do something clever with log > printouts that happen *before* Common Test starts. They could be > buffered in a temporary file then read and resent by a CT hook init > function so that this data ends up first in the unexpected i/o log for > the test run. The problem here is what to do with printouts that happen > *after* Common Test has stopped but before your teardown is finished.
> Another possibility could maybe be, if possible, to change the order of > the whole session so that Common Test is always started before your > configuration/setup (CT hooks can for example be added dynamically) and > not stopped until teardown is also finished. Perhaps an init and > terminate function in a high prio CT hook module could be used to > synchronize this. Sounds feasible to me. > > Let me know if I understand your problem correctly and tell me what > ideas/requests you have and let's move on from there. > > I'm going to try again with a high prio hook to turn on/off the logging handler and see if this works. By all accounts it sounds as if it should. The other thing I might do to alleviate this problem is swallow any exception from ct:pal/2, which is *cough* bad form as a rule, but in this case might actually be the right thing to do. Let me experiment with those two options first and get back to you. It might be that all I end up asking for is a bit more info in the documentation explaining the constraints. I'll try to post back later today. Cheers, Tim From watson.timothy@REDACTED Wed Aug 14 13:09:41 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Wed, 14 Aug 2013 12:09:41 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: Hi Peter, "In general, if one knows that Common Test is running (which is the time between the CT hook init and terminate call, or the start_logging and stop_logging event message), it is safe to call ct:log/2 or ct:pal/2 from anywhere and find the data in either the test case logs or in the unexpected i/o log (depending on the group leader setting)."
After some initial testing, I can confirm that this does not work as expected. I changed my code to only add the event (log) handler which executes ct:pal/2 inside the hook's init/2 callback; however, the problem still persists! My hook and the corresponding call chain now look like this: %% systest_cth hook init(systest, Opts) -> case application:start(systest, permanent) of {error, {already_started, systest}} -> io:format("starting ct log!~n"), systest_ct_log:start(), systest:reset(); {error, _Reason}=Err -> Err; ok -> ok end, etc .... %% systest_ct_log:start start() -> ok = systest_log:start(ct, systest_ct_log, common_test). %% systest_log:start/3 start(Id, Mod, Output) -> gen_event:add_handler(systest_event_log, {?MODULE, Id}, [Id, Mod, Output]). %% systest_ct_log:write_log/4 write_log(EvId, _Fd, What, Args) -> ct:log("[" ++ as_string(EvId) ++ "] " ++ as_string(What), Args). When I execute a test run with this code in place, however, I still get the crash, though the io:format/2 notice that I'm starting the ct log appears first: Common Test starting (cwd is /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node) starting ct log! ct_util_server got EXIT from <0.61.0>: {noproc, {gen_server,call, [test_server_io, {print,xxxFrom,unexpected_io, [[[["
*** User 2013-08-14 12:02:36.830 ***"], "\n", [91,102,114,97,109,101,119,111, 114,107,93,32,119,97,116,99, 104,100,111,103,58,32,110,111, 32,112,114,111,99,115,32,116, 111,32,107,105,108,108,"\n"]], "\n","
"]]}, infinity]}} So it appears that the assertion that logging will work between the hook's init and terminate callbacks isn't quite working. On 14 August 2013 10:58, Tim Watson wrote: > Hi Peter, > > > Thanks for getting back to me with this. Now that I've had a chance to look at > it, and to experiment with some different options, I can explain a bit more > about what I think is going on. > > On 6 August 2013 16:47, Peter Andersson wrote: > >> Printouts from CT hook functions are "safe" in the sense that they >> execute sequentially pre/post test suite/group/case execution. It's not >> possible that a call to ct:log/2 from a hook function "comes in too soon >> or late", i.e. gets handled before/after Common Test has >> started/finished executing. >> >> > > > >> In general, if one knows that Common Test is running (which is the time >> between the CT hook init and terminate call, or the start_logging and >> stop_logging event message), it is safe to call ct:log/2 or ct:pal/2 >> from anywhere and find the data in either the test case logs or in the >> unexpected i/o log (depending on the group leader setting). >> >> > > > >> The reason for the exit you reported initially is that if a log call >> happens during startup or shutdown of Common Test, then, during a short >> window, it's possible that Common Test fails to communicate with Test >> Server and crashes. >> >> > So I've definitely been hitting a race condition in my code here. I tried > adding/removing the event handler that routes logging messages to > ct:{log,pal}/2 around the {start,stop}_logging events; however, that didn't > help at all. In my ct_hook module, the call that triggers this explosion > occurs inside the init/2 function: > > %% from systest_cth.erl > > init(systest, Opts) -> > case application:start(systest, permanent) of > {error, {already_started, systest}} -> systest:reset(); > {error, _Reason}=Err -> Err; > ok -> ok > end, > etc ....
> > %% from systest.erl > > reset() -> > %% both these operations are synchronous > systest_watchdog:reset(), > systest_results:reset(), > ok. > > %% systest_watchdog.erl > > reset() -> > gen_server:call(?MODULE, reset). > > > Further down in systest_watchdog, the handle_call/3 clause that deals with > the 'reset' signal ends up calling systest_log, which leads to a > gen_event handler calling ct:pal/2. > > According to what you've said above about the timing of logging attempts > within the bounds of a ct_hook's init/terminate functions, that seems like > it should work, shouldn't it? Attempting to trigger the logging by only > (de)registering the ct:pal handler in response to the {start,stop}_logging > events didn't work, though. I guess there must be another potential race there - > however, I haven't attempted to do that via the hook's init/terminate > functions - I'll try that now and let you know if it resolves my issues. > > Before I try to answer your question below, I need to understand better >> what you want to happen to the log printouts that take place during your >> configuration/setup phase (before the test run starts) and/or during the >> teardown phase (when Common Test has shut down). > > > I think/hope that all my setup/teardown is synchronous with regard to each > test run - that is to say, I allow for parallel test cases within suites and > I don't actually return from the hook's systest_cth:stop/4 callback until > all the resources configured at that scope (be it suite, group, testcase) > have been killed and reported (via monitors) as dead. > > If Common Test - in an >> offline mode (i.e. not running) - should attempt to write incoming >> ct:log/2 strings to a file, the best it can do really, is to >> write/append them to say a circular log file in the current working >> directory.
This is possible, but it will be difficult to know which >> printouts belong to which test runs when analyzing the logs, and as far >> as I understand, this is the sort of thing you're trying to avoid anyway. >> >> > Indeed. As a temporary work-around, I've stopped logging internal events > to the common_test logs and put them in a separate log file instead, but > that's not really what I wanted to end up with. > > >> As far as I see, it's quite possible to do something clever with log >> printouts that happen *before* Common Test starts. They could be >> buffered in a temporary file then read and resent by a CT hook init >> function so that this data ends up first in the unexpected i/o log for >> the test run. The problem here is what to do with printouts that happen >> *after* Common Test has stopped but before your teardown is finished. >> Another possibility could maybe be, if possible, to change the order of >> the whole session so that Common Test is always started before your >> configuration/setup (CT hooks can for example be added dynamically) and >> not stopped until teardown is also finished. Perhaps an init and >> terminate function in a high prio CT hook module could be used to >> synchronize this. Sounds feasible to me. >> >> Let me know if I understand your problem correctly and tell me what >> ideas/requests you have and let's move on from there. >> >> > I'm going to try again with a high prio hook to turn on/off the logging > handler and see if this works. By all accounts it sounds as if it should. > The other thing I might do to alleviate this problem is swallow any > exception from ct:pal/2, which is *cough* bad form as a rule, but in this > case might actually be the right thing to do. > > Let me experiment with those two options first and get back to you. It > might be that all I end up asking for is a bit more info in the > documentation explaining the constraints. > > I'll try to post back later today. 
> > Cheers, > Tim From watson.timothy@REDACTED Wed Aug 14 13:12:47 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Wed, 14 Aug 2013 12:12:47 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: On 14 August 2013 12:09, Tim Watson wrote: > When I execute a test run with this code in place, however, I still get the > crash, though the io:format/2 notice that I'm starting the ct log appears > first: > > Common Test starting (cwd is > /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node) > > starting ct log! > > > ct_util_server got EXIT from <0.61.0>: {noproc, > {gen_server,call, > [test_server_io, > {print,xxxFrom,unexpected_io, > [[[["
class=\"default\">*** User 2013-08-14 12:02:36.830 ***"], > > "\n", > > [91,102,114,97,109,101,119,111, > 114,107,93,32,119,97,116,99, > > 104,100,111,103,58,32,110,111, > > 32,112,114,111,99,115,32,116, > > 111,32,107,105,108,108,"\n"]], > "\n","
"]]}, > infinity]}} > > > So it appears that the assertion that logging will work between the hook's > init and terminate callbacks isn't quite working. > > Oh, and I've tried pausing between the systest_ct_log:start/0 call and the (latter) systest:reset/0 call that triggers the logging, but that didn't make any difference either - e.g., like so: init(systest, Opts) -> case application:start(systest, permanent) of {error, {already_started, systest}} -> io:format("starting ct log!~n"), systest_ct_log:start(), receive foobar -> ok after 2000 -> ok end, systest:reset(); {error, _Reason}=Err -> Err; ok -> ok end, From watson.timothy@REDACTED Wed Aug 14 13:44:35 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Wed, 14 Aug 2013 12:44:35 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: Hi Peter, OK, that's great - thanks for your assistance! Cheers, Tim On 14 August 2013 12:36, Peter Andersson wrote: > > Hi Tim, > > Thanks for all the useful info! > > I haven't actually run any tests on this myself, only read some code so > far. Obviously the init and terminate hook functions get called before the > test server process is even started. In other words, these functions > actually execute in that short "evil" window during startup when you can't > call pal/2 or log/2. I missed that. :-( Sorry for misleading you! > > Let me dig into this properly and get back to you when I can propose > useful (tested!) solutions to your problems!
> > Best, > Peter > > Ericsson AB, Erlang/OTP > > > > On Wed, 14 Aug 2013, Tim Watson wrote: > > On 14 August 2013 12:09, Tim Watson wrote: >> >> When I execute a test run with this code in place, however, I still get >>> the >>> crash, though the io:format/2 notice that I'm starting the ct log appears >>> first: >>> >>> Common Test starting (cwd is >>> /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node) >>> >>> starting ct log! >>> >>> >>> ct_util_server got EXIT from <0.61.0>: {noproc, >>> {gen_server,call, >>> [test_server_io, >>> {print,xxxFrom,unexpected_io, >>> [[[["
>> class=\"default\">*** User 2013-08-14 12:02:36.830 ***"], >>> >>> "\n", >>> >>> [91,102,114,97,109,101,119,**111, >>> >>> 114,107,93,32,119,97,116,99, >>> >>> 104,100,111,103,58,32,110,111, >>> >>> 32,112,114,111,99,115,32,116, >>> >>> 111,32,107,105,108,108,"\n"]], >>> "\n","
"]]}, >>> infinity]}} >>> >>> >>> So it appears that the assertion that logging will work between the >>> hook's >>> init and terminate callbacks isn't quite working. >>> >>> >>> Oh and I've tried pausing between the systest_ct_log:start/0 call and >> the >> (latter) systest:reset/0 call that triggers the logging, but that didn't >> make any difference either - e.g., like so: >> >> init(systest, Opts) -> >> case application:start(systest, permanent) of >> {error, {already_started, systest}} -> io:format("starting ct log!~n"), >> systest_ct_log:start(), >> receive >> foobar -> ok >> after 2000 -> ok >> end, >> systest:reset(); >> {error, _Reason}=Err -> Err; >> ok -> ok >> end, >> >> From peppe@REDACTED Wed Aug 14 16:34:08 2013 From: peppe@REDACTED (Peter Andersson) Date: Wed, 14 Aug 2013 16:34:08 +0200 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: Hi Tim, Thanks for all the useful info! I haven't actually run any tests on this myself, only read some code so far. Obviously the init and terminate hook functions get called before the test server process is even started. In other words, these functions actually execute in that short "evil" window during startup when you can't call pal/2 or log/2. I missed that. :-( Sorry for misleading you! Let me dig into this properly and get back to you when I can propose useful (tested!) solutions to your problems!
Best, Peter Ericsson AB, Erlang/OTP On Wed, 14 Aug 2013, Tim Watson wrote: > On 14 August 2013 12:09, Tim Watson wrote: > >> When I execute a test run with this code in place however, I still get the >> crash, though the io:format/2 notice that I'm starting the ct log appears >> first: >> >> Common Test starting (cwd is >> /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node) >> >> starting ct log! >> >> >> ct_util_server got EXIT from <0.61.0>: {noproc, >> {gen_server,call, >> [test_server_io, >> {print,xxxFrom,unexpected_io, >> [[[["
> class=\"default\">*** User 2013-08-14 12:02:36.830 ***"], >> >> "\n", >> >> [91,102,114,97,109,101,119,111, >> 114,107,93,32,119,97,116,99, >> >> 104,100,111,103,58,32,110,111, >> >> 32,112,114,111,99,115,32,116, >> >> 111,32,107,105,108,108,"\n"]], >> "\n","
"]]}, >> infinity]}} >> >> >> So it appears that the assertion that logging will work between the hook's >> init and terminate callbacks isn't quite working. >> >> > Oh and I've tried pausing between the systest_ct_log:start/0 call and the > (latter) systest:reset/0 call that triggers the logging, but that didn't > make any difference either - e.g., like so: > > init(systest, Opts) -> > case application:start(systest, permanent) of > {error, {already_started, systest}} -> io:format("starting ct log!~n"), > systest_ct_log:start(), > receive > foobar -> ok > after 2000 -> ok > end, > systest:reset(); > {error, _Reason}=Err -> Err; > ok -> ok > end, > From rr@REDACTED Wed Aug 14 16:48:28 2013 From: rr@REDACTED (Rick Reed) Date: Wed, 14 Aug 2013 07:48:28 -0700 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <520B4192.2060003@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> <520B4192.2060003@erlang.org> Message-ID: Hi Patrik! And you want the requests in the same async queue to enforce ordering per file descriptor or some other reason? It seems like ordering isn't an issue because ultimately the file calls in Erlang are synchronous, and an app would have to enforce ordering itself anyway (we do it by sending all the i/o for a file through a single proc and/or setting our own per-file locks). For the app I'm debugging now, it turns out no scheme that ties the port to a particular thread is going to work. The system is running at the limits of the hardware, and the ports are long-lived. Only perfect distribution of i/o requests over the available threads prevents certain threads from being overloaded and backing up the i/o on the ports that map to it. I've been running a few of the systems overnight with a patch that disables keying in efile_drv. Now I'm getting a nice flat distribution of i/o across the async threads.
Unfortunately, it hasn't completely solved my problem, but those systems are doing much better. I'm just wondering if there's some other reason that I'm missing (cache/mem affinity, platform differences, etc.) for having to map file descriptors to particular threads. Thanks for looking into this! Rr On Wed, Aug 14, 2013 at 1:36 AM, Patrik Nyblom wrote: > Hi Rick! > > > On 08/14/2013 02:21 AM, Rick Reed wrote: > > I assume the reason for keying the file requests is to prevent a single > port from > soaking up all the async threads? > > Yes, and it's also important that requests for the same file "descriptor" > end up in the same async queue. So we need to store a fixed key in the file > descriptor structure. > > I think I will hash the pointer to create the key, not just shift away the > "zero-bits", you never know which icky patterns an allocator can create > that will distribute the jobs unevenly. The key will only be calculated > upon opening, so there will be minimal performance hit due to the more > complicated calculations. > > Thanks for reporting - this could cause severe performance issues in > applications! > > Cheers, > Patrik > > > Rr > > > On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson wrote: > >> And there it is, conclusive proof that I should not be debugging >> Rickard's code before lunch. >> >> Found the issue, will create a fix for it. As a workaround for R16B you >> can use a prime number as the number of async threads :) >> >> Lukas >> >> >> On 13/08/13 10:05, Lukas Larsson wrote: >> >> Sigh, apparently I spoke too soon. >> >> I remembered incorrectly about the change. It was in R16B that ErlDrvPort >> became a ptr and it was an id before R16B. Anyways, it is odd that the ptr >> is 8-byte aligned on your system. On mine (Ubuntu 13.04, x86_64) the ptrs are >> not aligned and the load is nicely distributed among async threads. If I >> remember correctly you are using FreeBSD on x86_64?
I'll check if I can >> reproduce the behavior you are seeing on our FreeBSD machine. >> >> Lukas >> >> On 13/08/13 09:40, Lukas Larsson wrote: >> >> Hello Rick! >> >> Which version of Erlang are you using? From R16B (I think), the >> ErlDrvPort datatype no longer is a pointer to the port struct. Instead it >> is the slot id into the port table and those ids should contain all values. >> I did a quick test on my computer running the latest on maint on github and >> seem to get a full spread over all async threads. >> >> Lukas >> >> On 13/08/13 05:40, Rick Reed wrote: >> >> It looks to me as though there's a bit of a problem in the way >> efile_drv.c generates the >> key that's used to select an async driver queue. It uses the address of >> the port which >> on our system is 8-byte aligned. Meanwhile, erl_async.c does a simple >> mod operation >> with the number of async threads, so the number of threads that can >> actually be used >> by file operations is 1/8th of the number configured. I suspect this >> isn't intended. >> >> Rr >> >> >> >> _______________________________________________ >> erlang-bugs mailing list erlang-bugs@REDACTED http://erlang.org/mailman/listinfo/erlang-bugs
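The queue selection described in this thread can be modelled in a few lines of C. This is a sketch under stated assumptions, not the efile_drv.c or erl_async.c source: it assumes, as Rick's report says, that the key is the port address and that the queue is picked as key % nthreads, and queue_for_key and queues_used are hypothetical names. With 8-byte-aligned keys every key is a multiple of 8, so only nthreads / gcd(8, nthreads) queues can ever be reached: 1/8th of them when nthreads is a multiple of 8, and all of them when nthreads is odd (for example a prime other than 2), which is why Lukas's prime-number workaround helps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the reported queue selection:
   the key is the port address, the queue index is key % nthreads. */
static unsigned queue_for_key(uintptr_t key, unsigned nthreads) {
    assert(nthreads > 0);
    return (unsigned)(key % nthreads);
}

/* Count how many distinct queues are reached by nkeys keys that start
   at base and are stride bytes apart (e.g. 8-byte-aligned addresses). */
static unsigned queues_used(uintptr_t base, uintptr_t stride,
                            unsigned nkeys, unsigned nthreads) {
    unsigned char hit[1024] = {0}; /* hit[q] != 0 once queue q is used */
    unsigned used = 0;
    assert(nthreads <= sizeof hit);
    for (unsigned i = 0; i < nkeys; i++) {
        unsigned q = queue_for_key(base + (uintptr_t)i * stride, nthreads);
        if (!hit[q]) {
            hit[q] = 1;
            used++;
        }
    }
    return used;
}
```

With 1000 keys 0x10000, 0x10008, 0x10010, ..., queues_used(0x10000, 8, 1000, n) returns 1 for n = 8, 2 for n = 16, 3 for n = 12 and 17 for n = 17.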
From ess@REDACTED Thu Aug 15 15:41:08 2013 From: ess@REDACTED (Erik Søe Sørensen) Date: Thu, 15 Aug 2013 15:41:08 +0200 Subject: [erlang-bugs] A funny bug In-Reply-To: <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com> <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com> Message-ID: <520CDA74.80003@trifork.com> On 02-08-2013 16:57, Ulf Wiger wrote: [snip] > So arguably, a way to parameterize receive *should* be available, and *should* be documented. I'm not saying that prim_eval:'receive'/2 is that very thing that should be documented, but it comes close enough that Erlang wizards like Tony should not only be excused for playing around with it, but should be *expected* to. ;-) An alternative "parameterized receive" method is this: http://polymorphictypist.blogspot.dk/2011/10/dynamic-selective-receive-erlang-hack.html (Disclaimer: self plug) It takes a compiled match spec, like so: dyn_sel_recv:match_spec_receive(CMS, 1000) which is presumably safer than allowing any closure to be called. /Erik From pan@REDACTED Thu Aug 15 18:34:21 2013 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 15 Aug 2013 18:34:21 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> <520B4192.2060003@erlang.org> Message-ID: <520D030D.8070404@erlang.org> Hi Rick! On 08/14/2013 04:48 PM, Rick Reed wrote: > Hi Patrik! > > And you want the requests in the same async queue to enforce ordering per > file descriptor or some other reason?
It seems like ordering isn't an > issue > because ultimately the file calls in Erlang are synchronous, and > an app > would have to enforce ordering itself anyway (we do it by sending all the > i/o for a file through a single proc and/or setting our own per-file > locks). > Yes, one example is process exit, where close definitely should not be intermingled with other file operations from other threads that are ongoing. That definitely happens if you round robin the file descriptors. I remember that there have been other situations where the synchronous Erlang interface is not enough, but I cannot for my life remember them right now. Anyway, process exit is definitely one example :) > For the app I'm debugging now, it turns out no scheme that ties the > port to > a particular thread is going to work. The system is running at the > limits of > the hardware, and the ports are long-lived. Only perfect distribution > of i/o > requests over the available threads prevents certain threads from being > overloaded and backing up the i/o on the ports that map to it. Well, given the current design, I'm afraid a really good hash is the best I can come up with :( The I/O should be rethought and rewritten once we have dirty schedulers instead of the async threads! > > I've been running a few of the systems overnight with a patch that > disables > keying in efile_drv. Now I'm getting a nice flat distribution of i/o > across the > async threads. Unfortunately, it hasn't completely solved my problem, but > those systems are doing much better. Yes, probably. It is not safe though, especially compressed files in combination with processes getting exit (kill) signals during the file operations may core the VM. With better distribution of the FDs maybe you can get as good results as with the round robin without risks? > > I'm just wondering if there's some other reason that I'm missing > (cache/mem > affinity, platform differences, etc.)
for having to map file > descriptors to > particular threads. I don't think it helps caches that much; there are far more threads than cores anyway, so it's bound to generate inter-core communication regardless. > > Thanks for looking into this! Thanks for reporting! > > Rr Cheers, Patrik > > > On Wed, Aug 14, 2013 at 1:36 AM, Patrik Nyblom > wrote: > > Hi Rick! > > > On 08/14/2013 02:21 AM, Rick Reed wrote: >> I assume the reason for keying the file requests is to prevent a >> single port from >> soaking up all the async threads? > Yes, and it's also important that requests for the same file > "descriptor" end up in the same async queue. So we need to store a > fixed key in the file descriptor structure. > > I think I will hash the pointer to create the key, not just shift > away the "zero-bits", you never know which icky patterns an > allocator can create that will distribute the jobs unevenly. The > key will only be calculated upon opening, so there will be minimal > performance hit due to the more complicated calculations. > > Thanks for reporting - this could cause severe performance issues > in applications! > > Cheers, > Patrik > >> >> Rr >> >> >> On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson > > wrote: >> >> And there it is, conclusive proof that I should not be >> debugging Rickard's code before lunch. >> >> Found the issue, will create a fix for it. As a workaround >> for R16B you can use a prime number as the number of async >> threads :) >> >> Lukas >> >> >> On 13/08/13 10:05, Lukas Larsson wrote: >>> Sigh, apparently I spoke too soon. >>> >>> I remembered incorrectly about the change. It was in R16B >>> that ErlDrvPort became a ptr and it was an id before R16B. >>> Anyways, it is odd that the ptr is 8-byte aligned on your >>> system. On mine (Ubuntu 13.04, x86_64) the ptrs are not >>> aligned and the load is nicely distributed among async >>> threads. If I remember correctly you are using FreeBSD on >>> x86_64?
I'll check if I can reproduce the behavior you are >>> seeing on our FreeBSD machine. >>> >>> Lukas >>> >>> On 13/08/13 09:40, Lukas Larsson wrote: >>>> Hello Rick! >>>> >>>> Which version of Erlang are you using? From R16B (I think), >>>> the ErlDrvPort datatype no longer is a pointer to the port >>>> struct. Instead it is the slot id into the port table and >>>> those ids should contain all values. I did a quick test on >>>> my computer running the latest on maint on github and seem >>>> to get a full spread over all async threads. >>>> >>>> Lukas >>>> >>>> On 13/08/13 05:40, Rick Reed wrote: >>>>> It looks to me as though there's a bit of a problem in the >>>>> way efile_drv.c generates the >>>>> key that's used to select an async driver queue. It uses >>>>> the address of the port which >>>>> on our system is 8-byte aligned. Meanwhile, erl_async.c >>>>> does a simple mod operation >>>>> with the number of async threads, so the number of threads >>>>> that can actually be used >>>>> by file operations is 1/8th of the number configured. I >>>>> suspect this isn't intended. 
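Patrik's remark about hashing the pointer rather than only shifting away the zero bits can be illustrated the same way. Assume, purely for illustration, that the allocator hands out port structures 64 bytes apart (the thread does not say what spacing occurs in practice). Shifting off the three alignment bits then still leaves every key a multiple of 8, so with 8 async threads everything lands on one queue, while a hash that mixes all address bits spreads the keys out. The mixer below is the MurmurHash3 64-bit finalizer (fmix64), chosen only as an example; the thread does not say which hash the actual fix uses, and key_shift, key_hash and queues_reached are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key derivation 1: drop the three zero bits of an
   8-byte-aligned address. */
static uint64_t key_shift(uint64_t addr) {
    return addr >> 3;
}

/* Hypothetical key derivation 2: mix all address bits into the low bits,
   using the MurmurHash3 fmix64 finalizer purely as an example mixer. */
static uint64_t key_hash(uint64_t addr) {
    addr ^= addr >> 33;
    addr *= UINT64_C(0xff51afd7ed558ccd);
    addr ^= addr >> 33;
    addr *= UINT64_C(0xc4ceb3f99fd4cd4f);
    addr ^= addr >> 33;
    return addr;
}

/* Count distinct queues (nthreads <= 64) reached when nkeys addresses,
   stride bytes apart, are keyed with keyfn and reduced modulo nthreads. */
static unsigned queues_reached(uint64_t (*keyfn)(uint64_t), uint64_t base,
                               uint64_t stride, unsigned nkeys,
                               unsigned nthreads) {
    uint64_t seen = 0; /* bit q is set once queue q has been used */
    unsigned used = 0;
    assert(nthreads > 0 && nthreads <= 64);
    for (unsigned i = 0; i < nkeys; i++) {
        unsigned q = (unsigned)(keyfn(base + i * stride) % nthreads);
        uint64_t bit = UINT64_C(1) << q;
        if (!(seen & bit)) {
            seen |= bit;
            used++;
        }
    }
    return used;
}
```

For instance, queues_reached(key_shift, 0x10000, 64, 4096, 8) returns 1, whereas the same call with key_hash reaches more than one queue. The hash is computed once per open, matching Patrik's point that the extra cost is negligible.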
>>>>> >>>>> Rr From rr@REDACTED Fri Aug 16 01:23:45 2013 From: rr@REDACTED (Rick Reed) Date: Thu, 15 Aug 2013 16:23:45 -0700 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <520D030D.8070404@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> <520B4192.2060003@erlang.org> <520D030D.8070404@erlang.org> Message-ID: On Thu, Aug 15, 2013 at 9:34 AM, Patrik Nyblom wrote: > Yes, one example is process exit, where close definitely should not be > intermingled > with other file operations from other threads that are ongoing. That > definitely happens if > you round robin the file descriptors. > Perhaps the close could be enqueued on the descriptor work queue but not issued to the async thread queue until any outstanding ops have finished, though it doesn't look like the descriptor currently keeps track of how many async ops are outstanding. > Yes, probably.
It is not safe though, especially compressed files in > combination with > processes getting exit (kill) signals during the file operations may core > the VM. > In our case, the file procs run forever, so I think our risk will be low on this particular system. I've enabled the behavior via an env var, so we won't be running this on the rest of our systems. With better distribution of the FDs maybe you can get as good results as > with > the round robin without risks? > Unfortunately not. The only way to ensure that there wouldn't be a noticeable difference in load between different async threads would be to either have far too many or far too few async threads. Rr From yosh@REDACTED Thu Aug 15 20:43:51 2013 From: yosh@REDACTED (Manish Singh) Date: Thu, 15 Aug 2013 11:43:51 -0700 Subject: [erlang-bugs] file:pread broken with GCC 4.8 Message-ID: I've also run into this problem: http://erlang.org/pipermail/erlang-bugs/2013-July/003674.html At first I thought it was a gcc bug, but http://gcc.gnu.org/bugs/#report says "if compiling with -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations makes a difference, your code probably is not correct." Compiling efile_drv.c with -fno-aggressive-loop-optimizations makes the problem go away. With -Wextra, there are warnings about signed/unsigned comparisons, which might be causing this: drivers/common/efile_drv.c:3749:14: note: in expansion of macro 'EV_GET_UINT64' if ( !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q) ^ drivers/common/efile_drv.c:590:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] *(pp) = ( *(pp)+8 < (ev)->iov[*(qp)].iov_len \ ^ drivers/common/efile_drv.c:3749:14: note: in expansion of macro 'EV_GET_UINT64'
if ( !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q) ^ drivers/common/efile_drv.c:564:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (*(pp)+4 <= (ev)->iov[*(qp)].iov_len \ ^ drivers/common/efile_drv.c:3750:7: note: in expansion of macro 'EV_GET_UINT32' || !EV_GET_UINT32(ev, &sizeH, &p, &q) ^ drivers/common/efile_drv.c:569:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] *(pp) = ( *(pp)+4 < (ev)->iov[*(qp)].iov_len \ ^ drivers/common/efile_drv.c:3750:7: note: in expansion of macro 'EV_GET_UINT32' || !EV_GET_UINT32(ev, &sizeH, &p, &q) ^ drivers/common/efile_drv.c:564:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (*(pp)+4 <= (ev)->iov[*(qp)].iov_len \ ^ drivers/common/efile_drv.c:3751:7: note: in expansion of macro 'EV_GET_UINT32' || !EV_GET_UINT32(ev, &sizeL, &p, &q)) { ^ drivers/common/efile_drv.c:569:30: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] *(pp) = ( *(pp)+4 < (ev)->iov[*(qp)].iov_len \ ^ drivers/common/efile_drv.c:3751:7: note: in expansion of macro 'EV_GET_UINT32' || !EV_GET_UINT32(ev, &sizeL, &p, &q)) { ^ drivers/common/efile_drv.c:581:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (*(pp)+8 <= (ev)->iov[*(qp)].iov_len \ -Manish From watson.timothy@REDACTED Fri Aug 16 11:13:22 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Fri, 16 Aug 2013 10:13:22 +0100 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <520A1DF3.8050708@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> Message-ID: <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com> On 13 Aug 2013, at 12:52, Lukas Larsson wrote: > And there it is, conclusive proof that I should not be debugging Rickard's code before lunch.
> > Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :) > Hi Lukas, Does this issue only affect R16B, or all versions >= R16B? Cheers, Tim From lukas@REDACTED Fri Aug 16 11:13:45 2013 From: lukas@REDACTED (Lukas Larsson) Date: Fri, 16 Aug 2013 11:13:45 +0200 Subject: [erlang-bugs] file:pread broken with GCC 4.8 In-Reply-To: References: Message-ID: <520DED49.8020603@erlang.org> Hello Manish, Thanks for reporting this again and digging into it a little deeper. I've created a fix which solves the problem as seen by Tomas and will include it in the R16B02 release. I'll be testing the fix over the weekend and hopefully it will be visible in maint on github by early next week. Lukas On 15/08/13 20:43, Manish Singh wrote: > I've also run into this problem: > > http://erlang.org/pipermail/erlang-bugs/2013-July/003674.html > > At first I thought it was a gcc bug, but > http://gcc.gnu.org/bugs/#report says "if compiling with > -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations makes > a difference, your code probably is not correct." Compiling > efile_drv.c with -fno-aggressive-loop-optimizations makes the problem > go away. 
> > With -Wextra, there are warnings about signed/unsigned comparisons, > which might be causing this: > > drivers/common/efile_drv.c:3749:14: note: in expansion of macro > 'EV_GET_UINT64' > if ( !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q) > ^ > drivers/common/efile_drv.c:590:30: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > *(pp) = ( *(pp)+8 < (ev)->iov[*(qp)].iov_len \ > ^ > drivers/common/efile_drv.c:3749:14: note: in expansion of macro > 'EV_GET_UINT64' > if ( !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q) > ^ > drivers/common/efile_drv.c:564:14: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > (*(pp)+4 <= (ev)->iov[*(qp)].iov_len \ > ^ > drivers/common/efile_drv.c:3750:7: note: in expansion of macro > 'EV_GET_UINT32' > || !EV_GET_UINT32(ev, &sizeH, &p, &q) > ^ > drivers/common/efile_drv.c:569:30: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > *(pp) = ( *(pp)+4 < (ev)->iov[*(qp)].iov_len \ > ^ > drivers/common/efile_drv.c:3750:7: note: in expansion of macro > 'EV_GET_UINT32' > || !EV_GET_UINT32(ev, &sizeH, &p, &q) > ^ > drivers/common/efile_drv.c:564:14: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > (*(pp)+4 <= (ev)->iov[*(qp)].iov_len \ > ^ > drivers/common/efile_drv.c:3751:7: note: in expansion of macro > 'EV_GET_UINT32' > || !EV_GET_UINT32(ev, &sizeL, &p, &q)) { > ^ > drivers/common/efile_drv.c:569:30: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > *(pp) = ( *(pp)+4 < (ev)->iov[*(qp)].iov_len \ > ^ > drivers/common/efile_drv.c:3751:7: note: in expansion of macro > 'EV_GET_UINT32' > || !EV_GET_UINT32(ev, &sizeL, &p, &q)) { > ^ > drivers/common/efile_drv.c:581:14: warning: comparison between signed > and unsigned integer expressions [-Wsign-compare] > (*(pp)+8 <= (ev)->iov[*(qp)].iov_len \ > > -Manish > > > 
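As background for the warnings above (assuming, as -Wsign-compare implies, that *(pp) is a signed int while iov_len is a size_t): in a mixed comparison C converts the signed operand to the unsigned type, so a negative left-hand side turns into a very large value instead of comparing as negative. The helpers below are hypothetical and only show what the warning is about; the thread does not establish that this conversion is what breaks file:pread under GCC 4.8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper with the same shape as the flagged macro test:
   an int position plus a constant compared with a size_t length.
   The explicit cast spells out what the compiler does implicitly:
   pos + 8 is computed as an int, then converted to size_t, so a
   negative result wraps around to a huge unsigned value. */
static int fits_as_compiled(int pos, size_t len) {
    return (size_t)(pos + 8) <= len;
}

/* Sign-safe variant: reject negative positions before converting. */
static int fits_checked(int pos, size_t len) {
    return pos >= 0 && (size_t)pos + 8 <= len;
}
```

Here fits_as_compiled(-9, 16) returns 0 even though -9 + 8 = -1 is numerically smaller than 16, because -1 becomes SIZE_MAX before the comparison.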
From lukas@REDACTED Fri Aug 16 11:16:08 2013 From: lukas@REDACTED (Lukas Larsson) Date: Fri, 16 Aug 2013 11:16:08 +0200 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com> Message-ID: <520DEDD8.30301@erlang.org> It affects both R16B and R16B01. Lukas On 16/08/13 11:13, Tim Watson wrote: > On 13 Aug 2013, at 12:52, Lukas Larsson wrote: >> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch. >> >> Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :) >> > Hi Lukas, > > Does this issue only affect R16B, or all versions >= R16B? > > Cheers, > Tim From watson.timothy@REDACTED Fri Aug 16 11:16:43 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Fri, 16 Aug 2013 10:16:43 +0100 Subject: [erlang-bugs] efile_drv & async thread key In-Reply-To: <520DEDD8.30301@erlang.org> References: <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org> <520A1DF3.8050708@erlang.org> <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com> <520DEDD8.30301@erlang.org> Message-ID: <0DC18C53-8B3E-4A50-A9F0-FE6587AE055F@gmail.com> Ok, thanks for the confirmation. Cheers, Tim On 16 Aug 2013, at 10:16, Lukas Larsson wrote: > It affects both R16B and R16B01.
> > Lukas > > On 16/08/13 11:13, Tim Watson wrote: >> On 13 Aug 2013, at 12:52, Lukas Larsson wrote: >>> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch. >>> >>> Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :) >>> >> Hi Lukas, >> >> Does this issue only affect R16B, or all versions >= R16B? >> >> Cheers, >> Tim >> From hans.bolinder@REDACTED Wed Aug 21 10:46:49 2013 From: hans.bolinder@REDACTED (Hans Bolinder) Date: Wed, 21 Aug 2013 08:46:49 +0000 Subject: [erlang-bugs] A dets big ? In-Reply-To: References: Message-ID: <56466BD70414EA48969B4064696CF28C081199@ESESSMB207.ericsson.se> Hi, [Manuel Durán Aguete:] > After upgrading a project from R14B03 to R16B01 I've found that dets > files are growing constantly after delete operations. In previous > version the empty space was reused to allocate new data, after R16B > seems that empty space isn't reused. > > I've uploaded a test case to github: http://kcy.me/oz1s Thank you for the excellent bug report. A fix should appear on the 'maint' branch soon. Best regards, Hans Bolinder, Erlang/OTP team, Ericsson From hans.bolinder@REDACTED Thu Aug 22 08:34:12 2013 From: hans.bolinder@REDACTED (Hans Bolinder) Date: Thu, 22 Aug 2013 06:34:12 +0000 Subject: [erlang-bugs] dialyzer false positive io_lib:fread In-Reply-To: <5206BC95.9060902@cs.ntua.gr> References: ,<5206BC95.9060902@cs.ntua.gr> Message-ID: <56466BD70414EA48969B4064696CF28C0812CF@ESESSMB207.ericsson.se> Hi, [Chris King:] > dialyzer produces a false positive when analyzing io_lib:fread with a ~a > argument - it believes (erroneously) that the parsed value will be a > string, when in fact it will be an atom.
This does not occur with > io:fread, or with io_lib:fread with an integer argument. [Kostis:] > The behaviour you are experiencing is a side-effect of the type and spec > declarations that exist in modules io_lib and io_lib_fread Thanks for the bug report. I've corrected the specs of io_lib:fread(). The fix should appear on the 'maint' branch soon. Best regards, Hans Bolinder, Erlang OTP team, Ericsson From essen@REDACTED Thu Aug 22 10:16:12 2013 From: essen@REDACTED (Loïc Hoguin) Date: Thu, 22 Aug 2013 10:16:12 +0200 Subject: [erlang-bugs] ssl:negotiated_next_protocol/1 client bug? Message-ID: <5215C8CC.4010200@ninenines.eu> Hello, I can't get ssl:negotiated_next_protocol/1 to work from the client side. As a result I can't really know what protocol to use once the connection is established. Example: 1> ssl:start(). ok 2> {ok, S} = ssl:connect("twitter.com", 443, [binary, {active, false}, {client_preferred_next_protocols, client, [<<"spdy/3">>, <<"http/1.1">>], <<"http/1.1">>}]). {ok,{sslsocket,{gen_tcp,#Port<0.1088>},<0.52.0>}} 3> ssl:negotiated_next_protocol(S). {error,next_protocol_not_negotiated} It says that the protocol hasn't been negotiated. But it actually has been as can be demonstrated below. 4> ssl:send(S, [<<128,3,0,1,1,0,0,81,0,0,0,1,0,0,0,0,0,0>>, 4> [<<120,187,227,198,167,194,2,101,37,80,122,180,66,164,90,119,215,16,176,72, 4> 49,176,236,203,5,23,144,25,37,37,5,160,244,203,106,5,45,54,184,75,202,51, 4> 75,128,113,163,151,12,46,77,88,173,10,18,193,229,24,163,62,40,65,91,97, 4> 73,220,0,0,0,0,255,255>>]]). ok 5> ssl:recv(S, 0). {ok,<<128,3,0,4,0,0,0,12,0,0,0,1,0,0,0,4,0,0,0,100>>} This is SPDY working just fine. The same can be done against an Erlang server that uses ssl:negotiated_next_protocol/1 where it works, it only fails on the client side. Bug?
-- Loïc Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From robert.virding@REDACTED Fri Aug 23 12:24:17 2013 From: robert.virding@REDACTED (Robert Virding) Date: Fri, 23 Aug 2013 11:24:17 +0100 (BST) Subject: [erlang-bugs] A funny bug In-Reply-To: <520CDA74.80003@trifork.com> References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se> <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com> <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se> <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se> <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com> <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com> <520CDA74.80003@trifork.com> Message-ID: <2069032907.12270099.1377253457268.JavaMail.zimbra@erlang-solutions.com> Most definitely safer! An interesting question is what happens if you do a receive in the closure. Do you see or not see the "current" message? Does the original receive see any messages removed by the closure? Etc. If it is possible to do it then people will do it irrespective of whether you tell them not to. And they will complain if the undocumented behaviour is changed. :-) So if a parametrized receive is added then something like a match spec (MS) is to be preferred to a general closure. Robert ----- Original Message ----- > From: "Erik Søe Sørensen" > To: erlang-bugs@REDACTED > Sent: Thursday, 15 August, 2013 3:41:08 PM > Subject: Re: [erlang-bugs] A funny bug > > On 02-08-2013 16:57, Ulf Wiger wrote: > [snip] > > So arguably, a way to parameterize receive *should* be available, and > > *should* be documented. I'm not saying that prim_eval:'receive'/2 is that > > very thing that should be documented, but it comes close enough that > > Erlang wizards like Tony should not only be excused for playing around > > with it, but should be *expected* to.
;-) > An alternative "parameterized receive" method is this: > > http://polymorphictypist.blogspot.dk/2011/10/dynamic-selective-receive-erlang-hack.html > > (Disclaimer: self plug) > > It takes a compiled match spec, like so: > > dyn_sel_recv:match_spec_receive(CMS, 1000) > > which is presumably safer than allowing any closure to be called. > > /Erik From peppe@REDACTED Fri Aug 23 17:43:35 2013 From: peppe@REDACTED (Peter Andersson) Date: Fri, 23 Aug 2013 17:43:35 +0200 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> Message-ID: <52178327.4010003@erlang.org> Hi Tim, Ok, I have something for you now that will hopefully work. Please test it as soon as you can and get back to me! I've modified both Common Test and Test Server and you'll find the changes in this branch: git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl The idea now is that you start Common Test using a hook like the example module I've attached (cth_ctrl), e.g.: ct_run -pa $PWD -logdir ./logs -ct_hooks cth_ctrl -suite dummy_SUITE.erl This "pauses" Common Test immediately after startup, in the hook init function, and logging is enabled at that point (which didn't work before). The example hook spawns a process that calls ct:pal/2 and error_logger:error_report/1 in a loop to verify this. When you're done with your startup operations, you call a proceed function to start the test run. When the tests are done you get paused again, this time in the hook terminate function, with logging still enabled. When your teardown operations are finished, you call the proceed function again and Common Test terminates.
Here's how logging works (which I mean to document properly in the User's Guide before the upcoming release): All printouts with ct:log/2 or ct:pal/2, or any error/progress reports that happen in the pre-test phase are saved in a log file which you find a link to on the CT Framework Log page. Similarly, all printouts/reports that happen in the post-test phase, are saved in the same log file, and you get a link to this section also on the CT Framework Log page. When tests run, printouts and reports are saved as usual in the test case log files, or in the Unexpected I/O Log (for any printouts that can't be associated to a particular test case). I hope this solution works for you. Please get back to me with comments and questions! Best regards, Peter Ericsson AB, Erlang/OTP Tim Watson wrote: > Hi Peter, > > Ok that's great - thanks for your assistance! > > Cheers, > Tim > > > > On 14 August 2013 12:36, Peter Andersson > > wrote: > > > Hi Tim, > > Thanks for all the useful info! > > I haven't actually run any tests on this myself, only read some > code so far. Obviously the init and terminate hook functions get > called before the test server process is even started. In other > words, these functions actually execute in that short "evil" > window during startup when you can't call pal/2 or log/2. I missed > that. :-( Sorry for misleading you! > > Let me dig into this properly and get back to you when I can > propose useful (tested!) solutions to your problems! > > Best, > Peter > > Ericsson AB, Erlang/OTP > > > > On Wed, 14 Aug 2013, Tim Watson wrote: > > On 14 August 2013 12:09, Tim Watson > wrote: > > When I execute a test run with this code in place however, > I still get the > crash, though the io:format/2 notice that I'm starting the > ct log appears > first: > > Common Test starting (cwd is > /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node) > > starting ct log! 
> > > ct_util_server got EXIT from <0.61.0>: {noproc, > {gen_server,call, > [test_server_io, > > {print,xxxFrom,unexpected_io, > [[[["
class=\"default\">*** User 2013-08-14 12:02:36.830 > ***"], > > "\n", > > [91,102,114,97,109,101,119,111, > > 114,107,93,32,119,97,116,99, > > 104,100,111,103,58,32,110,111, > > 32,112,114,111,99,115,32,116, > > 111,32,107,105,108,108,"\n"]], > "\n","
"]]}, > infinity]}} > > > So it appears that the assertion that logging will work > between the hook's > init and terminate callbacks isn't quite working. > > > Oh and I've tried pausing between the systest_ct_log:start/0 > call and the > (latter) systest:reset/0 call that triggers the logging, but > that didn't > make any difference either - e.g., like so: > > init(systest, Opts) -> > case application:start(systest, permanent) of > {error, {already_started, systest}} -> > io:format("starting ct > log!~n"), > > systest_ct_log:start(), > receive > foobar -> ok > after 2000 -> ok > end, > systest:reset(); > {error, _Reason}=Err -> Err; > ok -> ok > end, > > > ------------------------------------------------------------------------ > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cth_ctrl.erl URL: From tuncer.ayaz@REDACTED Mon Aug 26 13:25:41 2013 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Mon, 26 Aug 2013 13:25:41 +0200 Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection Message-ID: Previously it was just a certain[1] function in rebar.erl which got mis-indented when you did a whole-buffer indent, but now there's also a second[2] function which gets mis-indented. In both cases indenting the function itself separately works, and the bug happens if you select the whole buffer and indent that with erlang.el (C-x C-q). I'm using Emacs 24.3.1 with latest erlang.el from maint. Is it possible to fix this in the existing indenter? 
[1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365 [2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112 From tuncer.ayaz@REDACTED Mon Aug 26 13:34:09 2013 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Mon, 26 Aug 2013 13:34:09 +0200 Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection In-Reply-To: References: Message-ID: On Mon, Aug 26, 2013 at 1:25 PM, Tuncer Ayaz wrote: > Previously it was just a certain[1] function in rebar.erl which got > mis-indented when you did a whole-buffer indent, but now there's also > a second[2] function which gets mis-indented. > > In both cases indenting the function itself separately works, and the > bug happens if you select the whole buffer and indent that with > erlang.el (C-x C-q). Sorry, that should of course be (C-c C-q) instead. > I'm using Emacs 24.3.1 with latest erlang.el from maint. > > Is it possible to fix this in the existing indenter? > > [1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365 > [2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112 From watson.timothy@REDACTED Tue Aug 27 12:20:06 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Tue, 27 Aug 2013 11:20:06 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: <52178327.4010003@erlang.org> References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> <52178327.4010003@erlang.org> Message-ID: <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com> On 23 Aug 2013, at 16:43, Peter Andersson wrote: > > Ok, I have something for you now that will hopefully work. Please test > it as soon as you can and get back to me! > Hi Peter! Thanks for this - I'll get it tested this afternoon. 
> I've modified both Common Test and Test Server and you'll find the > changes in this branch: > > git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl > [snip] > Here's how logging works (which I mean to document properly in the > User's Guide before the upcoming release): All printouts with ct:log/2 > or ct:pal/2, or any error/progress reports that happen in the pre-test > phase are saved in a log file which you find a link to on the CT > Framework Log page. Similarly, all printouts/reports that happen in the > post-test phase, are saved in the same log file, and you get a link to > this section also on the CT Framework Log page. > Great. > When tests run, printouts and reports are saved as usual in the test > case log files, or in the Unexpected I/O Log (for any printouts that > can't be associated to a particular test case). > > I hope this solution works for you. Please get back to me with comments > and questions! > Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it. Cheers, Tim From magnus@REDACTED Wed Aug 28 17:05:29 2013 From: magnus@REDACTED (Magnus Henoch) Date: Wed, 28 Aug 2013 16:05:29 +0100 Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection In-Reply-To: (Tuncer Ayaz's message of "Mon, 26 Aug 2013 13:25:41 +0200") References: Message-ID: Tuncer Ayaz writes: > Previously it was just a certain[1] function in rebar.erl which got > mis-indented when you did a whole-buffer indent, but now there's also > a second[2] function which gets mis-indented. > > In both cases indenting the function itself separately works, and the > bug happens if you select the whole buffer and indent that with > erlang.el (C-x C-q). > > I'm using Emacs 24.3.1 with latest erlang.el from maint. > > Is it possible to fix this in the existing indenter? 
> > [1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365 > [2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112 It seems like this happens when the code being indented has not yet been made visible, and thus lazy syntax highlighting has not yet happened. The syntax table for erlang-mode is unable to handle some combinations of characters; in particular, when a string ends with a dollar sign (as in "foo$"), the dollar sign plus the double quote would be treated as a character constant were it not for some magic regexps in font-lock-syntactic-keywords. Manually scrolling through the buffer before reindenting seems to make the problem go away. This is less than satisfactory, of course. Looking up online help for font-lock-syntactic-keywords in a modern Emacs gives: This variable is obsolete since 24.1; use `syntax-propertize-function' instead. And the NEWS file for Emacs 24.1 contains: *** New variable `syntax-propertize-function'. This replaces `font-lock-syntactic-keywords' which is now obsolete. This allows syntax-table properties to be set independently from font-lock: just call syntax-propertize to make sure the text is propertized. Together with this new variable come a new hook syntax-propertize-extend-region-functions, as well as two helper functions: syntax-propertize-via-font-lock to reuse old font-lock-syntactic-keywords as-is; and syntax-propertize-rules which provides a new way to specify syntactic rules. This sounds like the right way to solve the problem, though of course you won't know until you try... 
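To make the ambiguity concrete, here is a small hypothetical Erlang module (not taken from rebar) containing both a string that ends in a dollar sign and a genuine $" character literal. It compiles normally; the point is that a plain syntax table which reads $" as a character constant cannot tell the two cases apart without extra rules such as those in font-lock-syntactic-keywords (or, in newer Emacs, syntax-propertize-function).

```erlang
%% Hypothetical indenter test input; not taken from rebar.
-module(dollar_quote).
-export([suffix/0, quote_char/0, ends_with_dollar/1]).

%% String literal whose last character is $: the closing quote directly
%% follows the dollar sign, so the pair looks like the character
%% literal $" to a plain syntax table.
suffix() -> "foo$".

%% A genuine character literal for the double quote (code point 34).
quote_char() -> $".

%% $$ is the character literal for the dollar sign itself (36).
ends_with_dollar(S) -> lists:last(S) =:= $$.
```

If the diagnosis above is right, opening a file like this and reindenting the whole buffer before scrolling through it may be enough to reproduce the mis-indentation.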
Regards, Magnus From glorybox.away@REDACTED Wed Aug 28 20:30:21 2013 From: glorybox.away@REDACTED (Sergey Sinkovskiy) Date: Wed, 28 Aug 2013 21:30:21 +0300 Subject: [erlang-bugs] [inets] httpc cookie parsing Message-ID: Some servers send an empty Set-Cookie header, which leads to a process crash with the following stacktrace: {function_clause, [{string,substr, [[],1,-1], [{file,"string.erl"},{line,207}]}, {httpc_cookie,parse_set_cookie,2, [{file,"httpc_cookie.erl"},{line,347}]}, {httpc_cookie,'-parse_set_cookies/2-lc$^1/1-1-',2, [{file,"httpc_cookie.erl"},{line,339}]}, {httpc_cookie,cookies,3, [{file,"httpc_cookie.erl"},{line,202}]}, {httpc_handler,handle_cookies,4, [{file,"httpc_handler.erl"},{line,1250}]}, {httpc_handler,handle_response,1, [{file,"httpc_handler.erl"},{line,1186}]}, {gen_server,handle_msg,5, [{file,"gen_server.erl"},{line,604}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,239}]}]}, The RFC doesn't allow the header to be empty, so this isn't a bug in inets. Could such headers simply be skipped during parsing? -- Sergey Sinkovsky -------------- next part -------------- An HTML attachment was scrubbed... URL: From roberto.aloi@REDACTED Thu Aug 29 10:22:15 2013 From: roberto.aloi@REDACTED (Roberto Aloi) Date: Thu, 29 Aug 2013 10:22:15 +0200 (CEST) Subject: [erlang-bugs] Potential issue with Erlang CT when ct:pal/X is called from a config callback module In-Reply-To: <1044738081.66436.1377764355546.JavaMail.zimbra@erlang-solutions.com> Message-ID: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com> Hi all, I might have encountered a tiny issue with the CT logging facilities in R15B01. Details here: https://gist.github.com/robertoaloi/5884093 Is the R15B03 behaviour expected? Kind regards, Roberto Aloi --- Erlang Solutions Ltd.
www.erlang-solutions.com From watson.timothy@REDACTED Thu Aug 29 12:31:49 2013 From: watson.timothy@REDACTED (Tim Watson) Date: Thu, 29 Aug 2013 11:31:49 +0100 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com> References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> <52178327.4010003@erlang.org> <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com> Message-ID: Hi Peter, This works perfectly well for me. In fact, I can even skip the whole cth_ctrl since the pre/post logging appears separately in the HTML logs, which is good enough for my use case. Thanks very much for getting this sorted! Cheers, Tim On 27 Aug 2013, at 11:20, Tim Watson wrote: > On 23 Aug 2013, at 16:43, Peter Andersson wrote: >> >> Ok, I have something for you now that will hopefully work. Please test >> it as soon as you can and get back to me! >> > > Hi Peter! Thanks for this - I'll get it tested this afternoon. > >> I've modified both Common Test and Test Server and you'll find the >> changes in this branch: >> >> git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl >> > > [snip] > >> Here's how logging works (which I mean to document properly in the >> User's Guide before the upcoming release): All printouts with ct:log/2 >> or ct:pal/2, or any error/progress reports that happen in the pre-test >> phase are saved in a log file which you find a link to on the CT >> Framework Log page. Similarly, all printouts/reports that happen in the >> post-test phase, are saved in the same log file, and you get a link to >> this section also on the CT Framework Log page. >> > > Great. > >> When tests run, printouts and reports are saved as usual in the test >> case log files, or in the Unexpected I/O Log (for any printouts that >> can't be associated to a particular test case). 
>> >> I hope this solution works for you. Please get back to me with comments >> and questions! >> > > Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it. > > Cheers, > Tim From peppe@REDACTED Thu Aug 29 12:46:52 2013 From: peppe@REDACTED (Peter Andersson) Date: Thu, 29 Aug 2013 12:46:52 +0200 Subject: [erlang-bugs] Potential issue with Erlang CT when ct:pal/X is called from a config callback module In-Reply-To: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com> References: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com> Message-ID: Hi Roberto, This is because of modifications to the logging system in CT that we introduced in R15B03. It's not expected behaviour; we regard it as a bug. It exists in R16B01 as well and has been reported previously. We have already fixed it for the upcoming R16B02 release, which is due out in September. Best regards, Peter Ericsson AB, Erlang/OTP On Thu, 29 Aug 2013, Roberto Aloi wrote: > Hi all, > > I might have encountered a tiny issue with the CT logging facilities in R15B01. > Details here: > > https://gist.github.com/robertoaloi/5884093 > > Is the R15B03 behaviour expected? > > Kind regards, > > Roberto Aloi > --- > Erlang Solutions Ltd.
> www.erlang-solutions.com > From peppe@REDACTED Fri Aug 30 09:29:52 2013 From: peppe@REDACTED (Peter Andersson) Date: Fri, 30 Aug 2013 09:29:52 +0200 Subject: [erlang-bugs] common_test + test_server_io errors In-Reply-To: References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com> <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com> <51DFB044.50302@erlang.org> <52011A87.5080203@erlang.org> <52178327.4010003@erlang.org> <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com> Message-ID: Hi Tim, That's good news, thanks for letting me know! I will wrap this up then and release it with R16B02. Cheers, Peter On Thu, 29 Aug 2013, Tim Watson wrote: > Hi Peter, > > This works perfectly well for me. In fact, I can even skip the whole cth_ctrl since the pre/post logging appears separately in the HTML logs, which is good enough for my use case. Thanks very much for getting this sorted! > > Cheers, > Tim > > On 27 Aug 2013, at 11:20, Tim Watson wrote: > >> On 23 Aug 2013, at 16:43, Peter Andersson wrote: >>> >>> Ok, I have something for you now that will hopefully work. Please test >>> it as soon as you can and get back to me! >>> >> >> Hi Peter! Thanks for this - I'll get it tested this afternoon. >> >>> I've modified both Common Test and Test Server and you'll find the >>> changes in this branch: >>> >>> git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl >>> >> >> [snip] >> >>> Here's how logging works (which I mean to document properly in the >>> User's Guide before the upcoming release): All printouts with ct:log/2 >>> or ct:pal/2, or any error/progress reports that happen in the pre-test >>> phase are saved in a log file which you find a link to on the CT >>> Framework Log page.
Similarly, all printouts/reports that happen in the >>> post-test phase, are saved in the same log file, and you get a link to >>> this section also on the CT Framework Log page. >>> >> >> Great. >> >>> When tests run, printouts and reports are saved as usual in the test >>> case log files, or in the Unexpected I/O Log (for any printouts that >>> can't be associated to a particular test case). >>> >>> I hope this solution works for you. Please get back to me with comments >>> and questions! >>> >> >> Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it. >> >> Cheers, >> Tim > >