From egil@REDACTED  Thu Aug  1 19:59:19 2013
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Thu, 1 Aug 2013 19:59:19 +0200
Subject: [erlang-bugs] R16B01's monitor delivery is broken?
In-Reply-To: <CAMjYFoOeWYOoO1w4Wx11td5FrOMp5xssO-nZHihCvphix6z_Aw@mail.gmail.com>
References: <15464.1374612273@snookles.snookles.com>
 <CAMjYFoOeWYOoO1w4Wx11td5FrOMp5xssO-nZHihCvphix6z_Aw@mail.gmail.com>
Message-ID: <51FAA1F7.8000704@erlang.org>


We have confirmed that this problem indeed exists and we think we 
understand what is happening.

The problem has a low probability of occurring, though obviously 
reproducible, and pretty serious if it occurs.

I won't go into too much detail, but the uniqueness of the process
identifier can be compromised, i.e. it will not be unique. In essence, a
process might get the identifier of an already terminated process (or of
an already living one, though I haven't confirmed that); the mapping is
then overwritten, and when this identifier is inspected the process will
look dead. Signals or messages will then either not be sent, since the
process is "dead", or be sent to an unsuspecting (wrong) process. The
mapping between ids and process pointers has become inconsistent. It's a
bit more complicated than that, but in a nutshell that's what's happening.

What is needed for this to occur? A wrap of the entire "free-list ring"
of identifiers (whose size is the max process limit) while one thread is
in the middle of doing an atomic read, some shifting and masking, and
then a write to create an identifier. *Highly unlikely*, but definitely
a race. That is, while one thread is doing a read, shift/mask, and write
to memory, the other threads have to create and terminate 262144
processes (or whatever the limit is set to; 262144 is the default).

If the thread is scheduled out by the OS between the read and the write,
or a hyperthread switch occurs there because of a memory stall (we're
dealing with memory barriers here after all, so that might be a factor),
the likelihood of an incident increases. Lowering the max process limit
in the system also increases the likelihood.

We think we have a solution for this, and initial tests show no evidence
of the uniqueness problem after the fix. I think we will have a fix out
in maint next week.

Using R16B01 together with the "+P legacy" flag is a workaround for this
issue. The legacy option uses the old implementation and does not suffer
from this problem.

Thank you, Scott, and the rest of you at Basho, for reporting this.

Regards,
Björn-Egil


On 2013-07-23 23:19, Björn-Egil Dahlberg wrote:
> True, that seems suspicious.
>
> The vacation for Rickard is going great, I think. Last I heard from
> him, he was diving around Öland (literally "island-land") in
> south-eastern Sweden. It will be a few weeks before he's back.
>
> In the meantime it is fairly lonely here at OTP; today there were two
> of us in the office, and there is a lot of stuff to do. I will have
> a quick look at it and verify it, but will probably let Rickard deal
> with it when he comes back.
>
> Thanks for a great summary and drill down of the problem!
>
> Regards,
> Björn-Egil
>
>
> 2013/7/23 Scott Lystig Fritchie <fritchie@REDACTED>
>
>     Hi, everyone.  Hope your summer vacations are going well.  I have some
>     bad news for Rickard, at least.
>
>         SHA:         e794251f8e54d6697e1bcc360471fd76b20c7748
>         Author:      Rickard Green <rickard@REDACTED>
>         Date:        Thu May 30 2013 07:56:31 GMT-0500 (CDT)
>         Subject: Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint
>         Parent:      22685099ace9802016bf6203c525702084717d72
>         Parent:      5c039a1fb4979314912dc3af6626d8d7a1c73993
>         Merge branch 'rickard/ptab-id-alloc/OTP-11077' into maint
>
>         * rickard/ptab-id-alloc/OTP-11077:
>           Introduce a better id allocation algorithm for PTabs
>
>     This commit appears to break monitor delivery?  And it may or may not be
>     causing processes to die for reasons that we cannot see or understand.
>
>     Run with R15B03-1, the example code in test6.erl is merely slow:
>
>     https://gist.github.com/jtuple/aa4830a0ff0a94f69484/raw/02adc518e225f263a7e25d339ec7200ef2dda491/test6.erl
>
>     On my 4 core/8 HT core MacBook Pro, R15B03-1 cannot go above 200% CPU
>     utilization, and the execution time is correspondingly slooow.  But it
>     appears to work correctly.
>
>         erl -eval '[begin io:format("Iteration ~p at ~p\n", [X,time()]), test6:go() end || X <- lists:seq(1, 240)].'
>
>     When run with R16B, it's *much* faster.  CPU utilization above 750%
>     confirms that it's going faster.  And it appears to work correctly.
>
>     However, when run with R16B01, we see non-deterministic hangs on both
>     OS X and various Linux platforms.  CPU consumption by the "beam.smp"
>     process drops to 0, and the next cycle of the list comprehension never
>     starts.
>
>     Thanks to the magic of Git, it's pretty clear that the commit above is
>     broken.  The commit before it appears to work well (i.e., does not
>     hang).
>
>         SHA:         22685099ace9802016bf6203c525702084717d72
>         Author:      Anders Svensson <anders@REDACTED>
>         Date:        Wed May 29 2013 11:46:10 GMT-0500 (CDT)
>         Subject: Merge branch 'anders/diameter/watchdog_function_clause/OTP-11115' into maint
>
>     Using R16B01 together with the "+P legacy" flag does not hang.  But this
>     problem has given us at Basho enough ... caution ... that we will be
>     much more cautious about moving our app packaging from R15B* to R16B*.
>
>     Several seconds after CPU consumption drops to 0%, I trigger the
>     creation of an "erl_crash.dump" file using erlang:halt("bummer").  If I
>     look at that file, the process "Spawned as: test6:do_work2/0" says
>     that there are active unidirectional links (i.e., monitors), but there
>     is one process on that list that does not have a corresponding
>     "=proc:<pid.number.here>" entry in the dump ... which strongly suggests
>     to me that the process is dead.  Using DTrace, I've been able to
>     establish that the dead process was indeed alive at one time and had
>     been scheduled & descheduled at least once.  So there are really two
>     mysteries:
>
>     1. Why is one of the test6:indirect_proxy/1 processes dying
>     unexpectedly?  (The monitor doesn't fire, SASL isn't logging any errors,
>     etc.)
>
>     2. Why isn't a monitor message being delivered?
>
>     Many thanks to Joe Blomstedt, Evan Vigil-McClanahan, Andrew Thompson,
>     Steve Vinoski, and Sean Cribbs for their sleuthing work.
>
>     -Scott
>
>     --- snip --- snip --- snip --- snip --- snip ---
>
>     R15B03 lock count analysis, FWIW:
>
>             lock     id  #tries  #collisions  collisions [%]  time [us]  duration [%]
>            -----    --- ------- ------------ --------------- ---------- -------------
>         proc_tab      1 1280032      1266133         98.9142   60642804      557.0583
>        run_queue      8 3617608        12874          0.3559     261722        2.4042
>      sys_tracers      1 1280042         6445          0.5035      19365        0.1779
>         pix_lock    256 4480284         1213          0.0271       9777        0.0898
>        timeofday      1  709955         1187          0.1672       3216        0.0295
>     [......]
>
>     --- snip --- snip --- snip --- snip --- snip ---
>
>     =proc:<0.29950.154>
>     State: Waiting
>     Spawned as: test6:do_work2/0
>     Spawned by: <0.48.0>
>     Started: Tue Jul 23 04:50:54 2013
>     Message queue length: 0
>     Number of heap fragments: 0
>     Heap fragment data: 0
>     Link list: [{from,<0.48.0>,#Ref<0.0.19.96773>},
>     {to,<0.32497.154>,#Ref<0.0.19.96797>},
>     {to,<0.1184.155>,#Ref<0.0.19.96796>},
>     {to,<0.31361.154>,#Ref<0.0.19.96799>},
>     {to,<0.32019.154>,#Ref<0.0.19.96801>},
>     {to,<0.32501.154>,#Ref<0.0.19.96800>},
>     {to,<0.1352.155>,#Ref<0.0.19.96803>},
>     {to,<0.32415.154>,#Ref<0.0.19.96805>},
>     {to,<0.504.155>,#Ref<0.0.19.96804>},
>     {to,<0.87.155>,#Ref<0.0.19.96802>},
>     {to,<0.776.155>,#Ref<0.0.19.96798>}]
>     Reductions: 45
>     Stack+heap: 233
>     OldHeap: 0
>     Heap unused: 155
>     OldHeap unused: 0
>     Memory: 3472
>     Program counter: 0x000000001e1504d0 (test6:do_work2/0 + 184)
>     CP: 0x0000000000000000 (invalid)
>     arity = 0
>     _______________________________________________
>     erlang-bugs mailing list
>     erlang-bugs@REDACTED <mailto:erlang-bugs@REDACTED>
>     http://erlang.org/mailman/listinfo/erlang-bugs


From tony@REDACTED  Thu Aug  1 20:06:12 2013
From: tony@REDACTED (Tony Rogvall)
Date: Thu, 1 Aug 2013 20:06:12 +0200
Subject: [erlang-bugs] A funny bug
Message-ID: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>

I was inspecting the new, and long-awaited, fix of 'receive' in erl_eval.

I could not help wondering what would happen if:

5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). 

Bus error: 10

I know you should not use prim_eval in this manner, but I tend to ignore recommendations.
But this construct should probably not crash the VM.

/Tony


From egil@REDACTED  Thu Aug  1 20:22:12 2013
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Thu, 1 Aug 2013 20:22:12 +0200
Subject: [erlang-bugs] R16B01's monitor delivery is broken?
In-Reply-To: <51FAA1F7.8000704@erlang.org>
References: <15464.1374612273@snookles.snookles.com>
 <CAMjYFoOeWYOoO1w4Wx11td5FrOMp5xssO-nZHihCvphix6z_Aw@mail.gmail.com>
 <51FAA1F7.8000704@erlang.org>
Message-ID: <51FAA754.6090600@erlang.org>

On 2013-08-01 19:59, Björn-Egil Dahlberg wrote:
>
> We have confirmed that this problem indeed exists and we think we
> understand what is happening.
>
> The problem has a low probability of occurring, though obviously
> reproducible, and pretty serious if it occurs.
>
> I won't go into too much detail, but the uniqueness of the process
> identifier can be compromised, i.e. it will not be unique. In essence, a
> process might get the identifier of an already terminated process (or of
> an already living one, though I haven't confirmed that); the mapping is
> then overwritten, and when this identifier is inspected the process will
> look dead. Signals or messages will then either not be sent, since the
> process is "dead", or be sent to an unsuspecting (wrong) process. The
> mapping between ids and process pointers has become inconsistent. It's a
> bit more complicated than that, but in a nutshell that's what's happening.
>
> What is needed for this to occur? A wrap of the entire "free-list ring"
> of identifiers (whose size is the max process limit) while one thread is
> in the middle of doing an atomic read, some shifting and masking, and
> then a write to create an identifier. *Highly unlikely*, but definitely
> a race. That is, while one thread is doing a read, shift/mask, and write
> to memory, the other threads have to create and terminate 262144
> processes (or whatever the limit is set to; 262144 is the default).
I think I oversimplified this explanation. The race occurs when a
terminating process is deleted and writes to the free list, and a new
process, created 262144 "generations/spawns" after the deleted process,
reads from the free list in between the terminating process's read,
shift/mask, and write. Anyway, details... it's a race.



From n.oxyde@REDACTED  Thu Aug  1 22:29:19 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Thu, 1 Aug 2013 22:29:19 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
Message-ID: <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>

Hello,

It's not that you should not use prim_eval in this particular manner, you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault.

That being said, when I implemented that patch, the function didn't call a given closure but a hard-coded remote function in prim_eval; maybe we should put that back?

Regards,

-- 
Anthony Ramine



From n.oxyde@REDACTED  Fri Aug  2 12:54:59 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Fri, 2 Aug 2013 12:54:59 +0200
Subject: [erlang-bugs] R16B01's monitor delivery is broken?
In-Reply-To: <51FAA754.6090600@erlang.org>
References: <15464.1374612273@snookles.snookles.com>
 <CAMjYFoOeWYOoO1w4Wx11td5FrOMp5xssO-nZHihCvphix6z_Aw@mail.gmail.com>
 <51FAA1F7.8000704@erlang.org> <51FAA754.6090600@erlang.org>
Message-ID: <78E4CAE9-38E9-4050-97BF-395A63BD7406@gmail.com>

Hello,

Talking about explanations and whatnot, next time such an important algorithm is modified, could we have more than a one-liner commit message? Thank you.

Here is a nice idea: when those of you on the OTP team review your own commits, imagine that a complete stranger wrote them, and require of yourselves the same kind of explanatory commit messages you would require of such a third party.

I hate useless commit messages.

Regards,

-- 
Anthony Ramine

On 1 Aug 2013, at 20:22, Björn-Egil Dahlberg wrote:

> I think I oversimplified this explanation. The race occurs when a terminating process is deleted and writes to the free list, and a new process, created 262144 "generations/spawns" after the deleted process, reads from the free list in between the terminating process's read, shift/mask, and write. Anyway, details... it's a race.


From tony@REDACTED  Fri Aug  2 14:03:56 2013
From: tony@REDACTED (Tony Rogvall)
Date: Fri, 2 Aug 2013 14:03:56 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
Message-ID: <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>


On 1 aug 2013, at 22:29, Anthony Ramine <n.oxyde@REDACTED> wrote:

> Hello,
> 
> It's not that you should not use prim_eval in this particular manner, you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault.

Well, I am, indirectly, using prim_eval:'receive'/2 when I execute "receive Y -> Y end" from the shell.
I think that this should still be allowed. (Need a smiley here? I guess, OK, here it is ;-)

> 
> That being said, when I implemented that patch, the function didn't call a given closure but a hard-coded remote function in prim_eval; maybe we should put that back?

I like prim_eval:'receive'/2 the way it is right now. But a badarg, for example, when trying to do a receive within the closure would be nice.
I guess the OTP team can figure out a lightweight way of doing this?

Regards

/Tony

> 
> Regards,
> 
> -- 
> Anthony Ramine
> 
> On 1 Aug 2013, at 20:06, Tony Rogvall wrote:
> 
>> I was inspecting the new, long-awaited fix of 'receive' in erl_eval.
>> 
>> I could not help wondering what would happen if:
>> 
>> 5> self() ! x, prim_eval:'receive'(fun(X) -> receive Y -> Y end end, 1000). 
>> 
>> Bus error: 10
>> 
>> I know you should not use prim_eval in this manner, but I tend to ignore recommendations.
>> But this construct should probably not crash the VM.
>> 
>> /Tony
> 

"Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix"



From n.oxyde@REDACTED  Fri Aug  2 15:20:47 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Fri, 2 Aug 2013 15:20:47 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
Message-ID: <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>

Replied inline.

-- 
Anthony Ramine

On 2 Aug 2013, at 14:03, Tony Rogvall wrote:

> 
> On 1 aug 2013, at 22:29, Anthony Ramine <n.oxyde@REDACTED> wrote:
> 
>> Hello,
>> 
>> It's not that you should not use prim_eval in this particular manner; you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault.
> 
> Well, I am indirectly using prim_eval:'receive'/2 whenever I execute "receive Y -> Y end" from the shell.
> I think that should still be allowed. (Need a smiley here? I guess so; OK, here it is ;-)

There is a reason why undocumented stuff is undocumented, smiley or not.

>> That being said, when I implemented that patch, the function didn't call a given closure but a hard-coded remote function in prim_eval; maybe we should put that back?
> 
> I like prim_eval:'receive'/2 the way it is right now. But a badarg, for example, when trying to do a receive within the closure would be nice.
> I guess the OTP team can figure out a lightweight way of doing this?

There is no such way, apart from making an extra check in the VM and thus slowing down any receive code.



From tony@REDACTED  Fri Aug  2 15:39:55 2013
From: tony@REDACTED (Tony Rogvall)
Date: Fri, 2 Aug 2013 15:39:55 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
 <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
Message-ID: <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>


On 2 aug 2013, at 15:20, Anthony Ramine <n.oxyde@REDACTED> wrote:

> Replied inline.
> 
> -- 
> Anthony Ramine
> 
> On 2 Aug 2013, at 14:03, Tony Rogvall wrote:
> 
>> 
>> On 1 aug 2013, at 22:29, Anthony Ramine <n.oxyde@REDACTED> wrote:
>> 
>>> Hello,
>>> 
>>> It's not that you should not use prim_eval in this particular manner; you should not use prim_eval at all. This is probably not the only primitive that can make the VM segfault.
>> 
>> Well, I am indirectly using prim_eval:'receive'/2 whenever I execute "receive Y -> Y end" from the shell.
>> I think that should still be allowed. (Need a smiley here? I guess so; OK, here it is ;-)
> 
> There is a reason why undocumented stuff is undocumented, smiley or not.

Laziness?

> 
>>> That being said, when I implemented that patch, the function didn't call a given closure but a hard-coded remote function in prim_eval; maybe we should put that back?
>> 
>> I like prim_eval:'receive'/2 the way it is right now. But a badarg, for example, when trying to do a receive within the closure would be nice.
>> I guess the OTP team can figure out a lightweight way of doing this?
> 
> There is no such way, apart from making an extra check in the VM and thus slowing down any receive code.

Well, a recursive interpreted call could be fixed without slowing down "any receive code"

/Tony




From n.oxyde@REDACTED  Fri Aug  2 15:55:05 2013
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Fri, 2 Aug 2013 15:55:05 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
 <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
 <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>
Message-ID: <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com>

The fact that it is an interpreted call is irrelevant: the same snippet of code, compiled, also triggers a segfault. If you want to fix it, I don't see any other way than making the loop_rec instruction itself fail if a receive loop is already in progress.

The function is undocumented not because of laziness (I always document things that should be documented) but because it simply shouldn't be used outside of erl_eval. Anyone interested in the details of prim_eval should look at the commit message I wrote when I introduced it.

-- 
Anthony Ramine

On 2 Aug 2013, at 15:39, Tony Rogvall wrote:

> Well, a recursive interpreted call could be fixed without slowing down "any receive code"


From ulf@REDACTED  Fri Aug  2 16:57:10 2013
From: ulf@REDACTED (Ulf Wiger)
Date: Fri, 2 Aug 2013 16:57:10 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
 <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
 <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>
 <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com>
Message-ID: <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com>


On 2 Aug 2013, at 15:55, Anthony Ramine <n.oxyde@REDACTED> wrote:

> The function is undocumented not because of laziness (I always document things that should be documented) but because it simply shouldn't be used outside of erl_eval. Anyone interested in the details of prim_eval should look at the commit message I wrote when I introduced it.

Well, many of us have felt that it's an unfortunate shortcoming of Erlang that receive clauses cannot be parameterized. I created plain_fsm back in 2004 as an explicit workaround for this (using a parse_transform).

https://github.com/uwiger/plain_fsm/blob/master/doc/plain_fsm.md

So arguably, a way to parameterize receive *should* be available, and *should* be documented. I'm not saying that prim_eval:'receive'/2 is that very thing that should be documented, but it comes close enough that Erlang wizards like Tony should not only be excused for playing around with it, but should be *expected* to. ;-)

I aim to play with it myself, once I have some spare time.

BR,
Ulf

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com


From eric.pailleau@REDACTED  Sun Aug  4 00:03:07 2013
From: eric.pailleau@REDACTED (PAILLEAU Eric)
Date: Sun, 04 Aug 2013 00:03:07 +0200
Subject: [erlang-bugs] R17A - Bug : unwanted semi-colon in generated erlang
 module after yecc compilation.
Message-ID: <51FD7E1B.8020102@wanadoo.fr>


2> yecc:file("test",[]).
test.yrl: Warning: conflicts: 111 shift/reduce, 0 reduce/reduce
{ok,"test.erl"}
3> c:c("test").
test.erl:1167: syntax error before: ';'

Yes, the generated test.erl has a semicolon before yeccerror.
(It compiles fine once that semicolon is removed.)
--8<-------------------------------------------------
snip... snip

;
yeccpars2_24(_, _, _, _, T, _, _) ->
 yeccerror(T).

snip... snip
--8<-------------------------------------------------

In lib/parsetools/src/yecc.erl,
the problem comes from the delim/2 function called in
output_state_actions_fini/2:
--8<-------------------------------------------------
snip... snip

output_state_actions_fini(State, St0) ->
    %% Backward compatible.
    St10 = delim(St0, false),
    St = fwrite(St10, <<"yeccpars2_~w(_, _, _, _, T, _, _) ->\n">>,
                [State]),
    fwrite(St, <<" yeccerror(T).\n\n">>, []).

snip... snip

delim(St, true) ->
    St;
delim(St, false) ->
    fwrite(St, <<";\n">>, []).

snip... snip
--8<-------------------------------------------------
Maybe delim/2 should be called with 'true' as its second argument here,
but the code as a whole is a bit hard to understand and I suppose the
author would be a better bug fixer... Furthermore, I am going on
vacation and won't have time to look at this ;>).
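
The invariant the generator has to respect is simple: in Erlang source, function clauses are separated by ";" and the last clause ends with ".", so a separator must never be written before the first clause of a function. The symptom above presumably means the separator is emitted even when no clause precedes the catch-all one. A minimal sketch of that rule (module and function names are hypothetical; this is not yecc's actual code):

```erlang
%% Hypothetical illustration, not code from yecc.erl: build the text of
%% one generated function from a list of clause strings.
-module(clause_join).
-export([function_text/1]).

%% No clauses: emit nothing at all.
function_text([]) ->
    "";
%% ";\n" goes only *between* clauses; the final clause gets ".\n".
function_text(Clauses) ->
    string:join(Clauses, ";\n") ++ ".\n".
```

With this shape, a function consisting of just a catch-all clause comes out as `Clause.` with no stray leading semicolon.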

Comments in yecc.erl indicate changes to yeccerror() since 1.4,
parsetools-2.0.4. Maybe these introduced the bug.

Alas, so far I can't say whether this bug is a consequence of my parser or not.

I get the same error with R16B01.

best regards.


From peppe@REDACTED  Tue Aug  6 17:47:19 2013
From: peppe@REDACTED (Peter Andersson)
Date: Tue, 6 Aug 2013 17:47:19 +0200
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
Message-ID: <52011A87.5080203@erlang.org>

Hi Tim,

A call to ct:log/2 when Common Test is not running (i.e. before or after
the execution of the ct_run program or the ct:run_test/1 function), is
ignored, both in R15 and R16.

For calls to ct:log/2 when Common Test is running, this is the change
from R15 to R16:

In R15:
If ct:log/2 is called on a test case process, or one that has inherited
the group leader from a test case process (typically one spawned from a
test case), the string is printed to the test case log. This holds as
long as the test case in question is still running. Once it has
finished, its log file is no longer open, and Common Test instead prints
the string to the current test case log if one (and only one!) test
case is executing, and otherwise to the CT Framework log. This means
that in R15, printouts by means of ct:log/2 from processes unknown to
Common Test end up either in a test case log or in the CT Framework
log.

In R16:
If ct:log/2 is called on a test case process, or one that has inherited
the group leader from a test case process, the string is printed to the
test case log (as in R15). If the test case in question is finished and
the log file has been closed, the string is printed to the "unexpected
i/o" log (via the test_server_io process) instead. Printouts by means of
ct:log/2 from processes unknown to Common Test also always end up in
the unexpected i/o log.

Printouts from CT hook functions are "safe" in the sense that they
execute sequentially pre/post test suite/group/case execution. It's not
possible that a call to ct:log/2 from a hook function "comes in too soon
or late", i.e. gets handled before/after Common Test has
started/finished executing.

In general, if one knows that Common Test is running (i.e. between the
CT hook init and terminate calls, or between the start_logging and
stop_logging event messages), it is safe to call ct:log/2 or ct:pal/2
from anywhere and find the data in either the test case logs or in the
unexpected i/o log (depending on the group leader setting). Printouts
that happen before or after the execution of a test suite, group or
case, end up in the unexpected i/o log.

The reason for the exit you initially reported is that if a log call
happens during startup or shutdown of Common Test, there is a short
window during which Common Test may fail to communicate with Test
Server and crash.

Before I try to answer your question below, I need to understand better
what you want to happen to the log printouts that take place during your
configuration/setup phase (before the test run starts) and/or during the
teardown phase (when Common Test has shut down). If Common Test - in an
offline mode (i.e. not running) - should attempt to write incoming
ct:log/2 strings to a file, the best it can really do is to write or
append them to, say, a circular log file in the current working
directory. This is possible, but it will be difficult to know which
printouts belong to which test runs when analyzing the logs, and as far
as I understand, this is the sort of thing you're trying to avoid anyway.

As far as I see, it's quite possible to do something clever with log
printouts that happen *before* Common Test starts. They could be
buffered in a temporary file then read and resent by a CT hook init
function so that this data ends up first in the unexpected i/o log for
the test run. The problem here is what to do with printouts that happen
*after* Common Test has stopped but before your teardown is finished.
Another possibility might be to change the order of the whole session,
if you can, so that Common Test is always started before your
configuration/setup (CT hooks can, for example, be added dynamically)
and not stopped until your teardown has also finished. Perhaps the init
and terminate functions of a high-priority CT hook module could be used
to synchronize this. That sounds feasible to me.
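
The buffer-and-replay idea could be sketched roughly as follows. This is a hypothetical hook, not part of Common Test: the module name, the buffer_file option, and the one-entry-per-line file format are all assumptions, and whatever writes the buffer file before the run is left out.

```erlang
%% Hypothetical CT hook (not shipped with Common Test): at hook init,
%% re-send lines that were buffered to a file before the run started.
-module(prestart_replay_cth).
-export([init/2]).

%% init/2 runs when the hook is installed, i.e. while Common Test is
%% running, so ct:log/2 calls made here should not be lost.
init(_Id, Opts) ->
    %% buffer_file is an assumed option name; one log entry per line.
    File = proplists:get_value(buffer_file, Opts, "ct_prestart.log"),
    case file:read_file(File) of
        {ok, Bin} ->
            Lines = binary:split(Bin, <<"\n">>, [global]),
            lists:foreach(fun(<<>>) -> ok;
                             (Line) -> ct:log("~ts", [Line])
                          end, Lines);
        {error, _Reason} ->
            %% Nothing buffered (or unreadable): nothing to replay.
            ok
    end,
    %% The only hook state kept is the buffer file name.
    {ok, File}.
```

It could be installed with, e.g., `{ct_hooks, [{prestart_replay_cth, [{buffer_file, "/tmp/prestart.log"}]}]}` in the options to ct:run_test/1. Going by the description above, output replayed from hook init should land in the unexpected i/o log; that is untested here.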

Let me know if I understand your problem correctly and tell me what
ideas/requests you have and let's move on from there.

Best regards,
Peter

Ericsson AB, Erlang/OTP

Tim Watson wrote:
> So chaps, I've found the commit that altered the IO handling in
> test_server (in fact, the addition of test_server_io). To clarify,
> prior to the addition of test_server_io, calls to ct:log/2 (and
> friends, e.g., ct:pal/2 and so on) would succeed even if no test was
> running and end up being handled as if they resided in before/after
> suite and/or before/after testcase functions. Now it seems that I've
> got to vet all the processes that might end up calling ct:log/2
> (indirectly via my event manager) somehow, but there's no proper API
> to determine whether or not it is safe to do so. Having all my
> debug/info level testing framework logs emitted to the HTML files was
> a big reason for choosing common_test, so I'm loth to redirect them
> elsewhere. My code is basically doing lots of custom (data driven)
> setup/teardown before and after test suites and test cases, and even
> though some of this runs before (or during) the common_test test run
> is started, I *really* don't want to have to create yet another file
> location that needs to be inspected when tests fail. I'm also not keen
> on filling up stdio with lots of logging noise.
>
> Any ideas how I can work around this situation without shooting myself
> in the head/foot? ;)
>
> Cheers,
> Tim
>
> On 12 Jul 2013, at 08:36, Tim Watson wrote:
>
>> Hi Lukas, thanks for letting me know!
>>
>> Cheers,
>> Tim
>>
>> On 12 Jul 2013, at 08:29, Lukas Larsson <lukas@REDACTED> wrote:
>>
>>> Hello Tim,
>>>
>>> Peter is currently away enjoying the sunny summer here in Sweden.
>>> I'm sure he will get back to you when he comes back!
>>>
>>> Lukas
>>> On 12/07/13 02:03, Tim Watson wrote:
>>>> On 1 July 2013 10:25, Tim Watson <watson.timothy@REDACTED> wrote:
>>>>
>>>>     > We should try to rule out that there's a bug that causes
>>>>     > test_server failure.
>>>>
>>>>     How can I assist in verifying that?
>>>>
>>>>
>>>> Any more news on this? Is there anything more I can do to assist?
>>>>
>>>> Cheers,
>>>> Tim
>>>>
>>>
>


From ingela.anderton.andin@REDACTED  Wed Aug  7 15:04:27 2013
From: ingela.anderton.andin@REDACTED (Ingela Anderton Andin)
Date: Wed, 7 Aug 2013 15:04:27 +0200
Subject: [erlang-bugs] {header,1} inconsistency between TCP and SSL
In-Reply-To: <20130720032509.GF27534@hijacked.us>
References: <20130720032509.GF27534@hijacked.us>
Message-ID: <520245DB.6040602@erix.ericsson.se>

Hi!

Andrew Thompson wrote:
> Today I noticed a difference in behaviour of the {header, 1} option when
> using TCP and SSL in erlang releases R15B02 and newer:
>
> https://gist.github.com/Vagabond/dabecf53ac8b4317e51c
>
> As you can see, SSL in {header, 1} mode no longer includes the empty
> binary as the second element in the list.
>
> I believe this change was made in this commit:
>
> https://github.com/erlang/otp/commit/8f97b428eb8f2fb89c3f9ec348f577304b1b9131
>
> If you change that back, things work the same as TCP again, but all the
> header_decode tests in ssl_packet_SUITE start to fail.
>
> I'm simply going to stop using {header,1} and just use the bit syntax,
> since I notice that Ingela considers it to be a silly option, but I
> wanted to at least point the inconsistency out, for posterity.
>
>   
 
Thank you for pointing this out. This option is quite old and was
invented before the bit syntax. Nowadays, just using the bit syntax is
a better option.
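
As a rough illustration of "just using the bit syntax" (a hypothetical helper; it assumes packets arrive as plain binaries, with {header, 1} not set on the socket), the first byte can be split off with a binary match:

```erlang
%% Hypothetical helper: split the leading byte off a packet that was
%% received as a plain binary (no {header, 1} socket option).
-module(first_byte).
-export([split/1]).

split(<<First:8, Rest/binary>>) ->
    %% For a one-byte packet Rest is <<>>, whether the binary came
    %% from gen_tcp or from ssl.
    {First, Rest}.
```

A zero-length binary matches no clause and raises function_clause, which a caller may want to handle.
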
The change was made to conform to how inet (i.e. gen_tcp) handles the
{header, 1} option, but alas, it seems we fixed it in one way and broke
it in another.
Some old things could have been better documented ;)

Regards Ingela Erlang/OTP team - Ericsson AB


From watson.timothy@REDACTED  Thu Aug  8 10:41:21 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Thu, 8 Aug 2013 09:41:21 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <52011A87.5080203@erlang.org>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
Message-ID: <7413B698-86F6-472E-AFD6-E0034C41F885@gmail.com>

Hi Peter!

Now you've caught me on vacation! :)

I'll read through this properly ASAP and get back to you.

Thanks!

Tim



From colanderman@REDACTED  Sat Aug 10 06:11:26 2013
From: colanderman@REDACTED (Chris King)
Date: Sat, 10 Aug 2013 00:11:26 -0400
Subject: [erlang-bugs] dialyzer false positive io_lib:fread
In-Reply-To: <op.wz7tvxelvksmfo@shuttle.squirrel>
References: <op.wz7tvxelvksmfo@shuttle.squirrel>
Message-ID: <op.w1k1xcisvksmfo@shuttle.squirrel>

Hi,

Re-sending this, as it seems (after a month checking the archive) that  
this mailing list silently rejects e-mails from non-subscribers?  (There's  
no mention of this behavior in the listinfo page  
http://erlang.org/mailman/listinfo/erlang-bugs.)


dialyzer produces a false positive when analyzing io_lib:fread with a ~a  
argument: it believes (erroneously) that the parsed value will be a  
string, when in fact it will be an atom.  This does not occur with  
io:fread, or with io_lib:fread with an integer argument.

The below test program exemplifies this; dialyzer claims that bugged/1  
cannot return, when in fact calling bugged("foo") returns normally in the  
interpreter.

I would be glad to supply a patch but I haven't the slightest clue where  
to start looking (this seems like either an easy fix, in an "exceptions"  
list somewhere, or a complex fix deep inside dialyzer).


-module(dialyzer_bug).

-export([bugged/1, not_bugged1/0, not_bugged2/1]).

bugged(S) ->
       case io_lib:fread("~a", S) of
       {ok, [Atom], _} when is_atom(Atom) -> Atom
       end.

not_bugged1() ->
       case io:fread("foo", "~a") of
       {ok, [Atom]} when is_atom(Atom) -> Atom
       end.

not_bugged2(S) ->
       case io_lib:fread("~d", S) of
       {ok, [Integer], _} when is_integer(Integer) -> Integer
       end.


From kostis@REDACTED  Sun Aug 11 00:20:05 2013
From: kostis@REDACTED (Kostis Sagonas)
Date: Sun, 11 Aug 2013 01:20:05 +0300
Subject: [erlang-bugs] dialyzer false positive io_lib:fread
In-Reply-To: <op.w1k1xcisvksmfo@shuttle.squirrel>
References: <op.wz7tvxelvksmfo@shuttle.squirrel>
 <op.w1k1xcisvksmfo@shuttle.squirrel>
Message-ID: <5206BC95.9060902@cs.ntua.gr>

On 08/10/2013 07:11 AM, Chris King wrote:
> Hi,
>
> Re-sending this, as it seems (after a month checking the archive) that
> this mailing list silently rejects e-mails from non-subscribers?
> (There's no mention of this behavior in the listinfo page
> http://erlang.org/mailman/listinfo/erlang-bugs.)
>
>
> dialyzer produces a false positive when analyzing io_lib:fread with a ~a
> argument: it believes (erroneously) that the parsed value will be a
> string, when in fact it will be an atom.  This does not occur with
> io:fread, or with io_lib:fread with an integer argument.
>
> The below test program exemplifies this; dialyzer claims that bugged/1
> cannot return, when in fact calling bugged("foo") returns normally in
> the interpreter.
>
> I would be glad to supply a patch but I haven't the slightest clue where
> to start looking (this seems like either an easy fix, in an "exceptions"
> list somewhere, or a complex fix deep inside dialyzer).

The behaviour you are experiencing is a side-effect of the type and spec 
declarations that exist in modules io_lib and io_lib_fread (*)

(*) Aside: is there a really good reason why io_lib_fread is a separate 
module with cyclic dependencies to io_lib and polluting the module name 
space, instead of being part of io_lib?

In io_lib, the fread/2 function is defined as:

fread(Chars, Format) ->
     io_lib_fread:fread(Chars, Format).

and in io_lib_fread the spec of fread/2 reads:

-spec fread(Format, String) -> Result when
       Format :: string(),
       String :: string(),
       Result :: {'ok', InputList :: io_lib:chars(), LeftOverChars :: string()}
       ...

where the io_lib:chars() type is defined as:

-type chars() :: [char() | chars()].

mentioning nowhere that the InputList also possibly contains atoms 
instead of just chars(), i.e. short integers.

Note that the fact that the spec of io_lib:fread/2 reads:

-spec fread(Format, String) -> Result when
       Format :: string(),
       String :: string(),
       Result :: {'ok', InputList :: [term()], LeftOverChars :: string()}
       ...

is irrelevant since dialyzer will take the strongest type information it 
infers when spec declarations are too loose.
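To make the point concrete, here is a minimal sketch of the pattern the report describes. The module and function names are hypothetical and are not Chris's original test program. With ~a, io_lib:fread/2 puts an atom into InputList, which the io_lib:chars() element type in the io_lib_fread spec does not allow, so code that matches on that atom is the kind of code Dialyzer would be expected to flag as unable to return. With ~d the element is an integer, and chars() admits small integers, which may be why the integer case reported no warning.

```erlang
-module(fread_atom_demo).
-export([parse_atom/1, parse_int/1]).

%% ~a reads a whitespace-free word and converts it to an atom, so the
%% single element of InputList is an atom, not a char() or chars().
parse_atom(String) ->
    {ok, [Atom], _LeftOver} = io_lib:fread("~a", String),
    Atom.

%% ~d for comparison: the element is an integer.
parse_int(String) ->
    {ok, [Int], _LeftOver} = io_lib:fread("~d", String),
    Int.
```

Calling fread_atom_demo:parse_atom("foo") returns the atom foo at run time, matching Chris's observation that the function does return normally.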


Anyway, I am not sure whether the intention of the library developer is 
to document the possibility to return an atom list or not in that 
position, so no patch from me either.

Hope this helps someone at OTP to fix this, possibly also folding the 
io_lib_fread module into io_lib in the process.

Kostis


From essen@REDACTED  Sun Aug 11 15:19:37 2013
From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=)
Date: Sun, 11 Aug 2013 15:19:37 +0200
Subject: [erlang-bugs] [erlang-questions] Possibly bug in
	cth_log_redirect?
In-Reply-To: <CAMxVRxDBAbChHkytE8PqOnxHkC6kJXE=5DzY=yO0Z+XvMMVP1w@mail.gmail.com>
References: <CAMxVRxDBAbChHkytE8PqOnxHkC6kJXE=5DzY=yO0Z+XvMMVP1w@mail.gmail.com>
Message-ID: <52078F69.8050307@ninenines.eu>

On 08/09/2013 10:12 PM, Max Lapshin wrote:
> When tests are running in parallel, LogFun in cth_log_redirect is
> changed to ct_log:
>
> https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/cth_log_redirect.erl#L49
>
> The problem is that it must be tc_log, not ct_log:
>
> https://github.com/erlang/otp/blob/3021fca734f71f8bae966ab67f1400d37f8927bc/lib/common_test/src/ct_logs.erl#L44

I am also hit by this issue.

Full error message:

=ERROR REPORT==== 11-Aug-2013::15:16:09 ===
** gen_event handler cth_log_redirect crashed.
** Was installed in error_logger
** Last event was: {error,<0.390.0>,
                           {emulator,"~s~n",
                                     ["Error in process <0.620.0> on node 'ct@REDACTED' with exit value: {{<<18 bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo... \n"]}}
** When handler state == ct_log
** Reason == {'function not exported',
                  [{ct_logs,ct_log,
                       [error_logger,50,"System",
                        ["\n",61,"ERROR REPORT",61,61,61,61,32,"11",45,"Aug",
                         45,"2013",58,58,"17",58,"16",58,"09",32,61,61,61,"\n",
                         "Error in process <0.620.0> on node 'ct@REDACTED' with exit value: {{<<18 bytes>>,{stacktrace,[{http_errors,handle,2,[{file,\"test/http_SUITE_data/http_errors.erl\"},{line,37}]},{cowboy_handler,handler_handle,4,[{file,\"src/cowboy_handler.erl\"},{line,115}]},{cowboy_protocol,execute,4,[{file,\"src/cowbo... \n",
                         "\n"],
                        []],
                       []},
                   {cth_log_redirect,handle_event,2,
                       [{file,"cth_log_redirect.erl"},{line,91}]},
                   {gen_event,server_update,4,
                       [{file,"gen_event.erl"},{line,522}]},
                   {gen_event,server_notify,4,
                       [{file,"gen_event.erl"},{line,504}]},
                   {gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,266}]},
                   {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,239}]}]}


-- 
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu


From peppe@REDACTED  Mon Aug 12 10:20:35 2013
From: peppe@REDACTED (Peter Andersson)
Date: Mon, 12 Aug 2013 10:20:35 +0200
Subject: [erlang-bugs] [erlang-questions] Possibly bug in
	cth_log_redirect?
In-Reply-To: <52078F69.8050307@ninenines.eu>
References: <CAMxVRxDBAbChHkytE8PqOnxHkC6kJXE=5DzY=yO0Z+XvMMVP1w@mail.gmail.com>
 <52078F69.8050307@ninenines.eu>
Message-ID: <52089AD3.4080600@erlang.org>

Thanks for reporting this, guys! We'll fix it asap.

  /Peter

Ericsson AB, Erlang/OTP



From rr@REDACTED  Tue Aug 13 05:40:40 2013
From: rr@REDACTED (Rick Reed)
Date: Mon, 12 Aug 2013 20:40:40 -0700
Subject: [erlang-bugs] efile_drv & async thread key
Message-ID: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>

It looks to me as though there's a bit of a problem in the way efile_drv.c
generates the key that's used to select an async driver queue.  It uses the
address of the port, which on our system is 8-byte aligned.  Meanwhile,
erl_async.c does a simple mod operation with the number of async threads,
so the number of threads that can actually be used by file operations is
1/8th of the number configured.  I suspect this isn't intended.

Rr
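Rick's arithmetic can be checked with a small model. This is a sketch, not the efile_drv.c or erl_async.c code: it assumes, as Rick describes, that each key is an 8-byte-aligned port address and that the queue is picked with Key rem NumThreads. The base address is a hypothetical value chosen only because it is a multiple of 8.

```erlang
-module(async_key_demo).
-export([queues_used/2]).

%% Model of the key selection Rick describes (not the real C code):
%% each key is an 8-byte-aligned port address and the async queue is
%% chosen as Key rem NumThreads. Returns the distinct queues hit.
queues_used(NumThreads, NumPorts) ->
    Base = 16#7f0000000000,  % hypothetical address, a multiple of 8
    Keys = [Base + 8 * I || I <- lists:seq(0, NumPorts - 1)],
    lists:usort([Key rem NumThreads || Key <- Keys]).
```

Under this model, 8 threads means every key lands in queue 0, 16 threads reach only queues 0 and 8, and 64 threads reach 8 queues, which is the 1/8th Rick reports whenever the thread count is a multiple of 8.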
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130812/1494d549/attachment.htm>

From lukas@REDACTED  Tue Aug 13 09:40:55 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Tue, 13 Aug 2013 09:40:55 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
Message-ID: <5209E307.6030806@erlang.org>

Hello Rick!

Which version of Erlang are you using? From R16B (I think), the 
ErlDrvPort datatype is no longer a pointer to the port struct. Instead 
it is the slot id into the port table, and those ids should cover all 
values. I did a quick test on my computer running the latest maint from 
github and seemed to get a full spread over all async threads.

Lukas

On 13/08/13 05:40, Rick Reed wrote:
> It looks to me as though there's a bit of a problem in the way 
> efile_drv.c generates the
> key that's used to select an async driver queue.  It uses the address 
> of the port which
> on our system is 8-byte aligned.  Meanwhile, erl_async.c does a simple 
> mod operation
> with the number of async threads, so the number of threads that can 
> actually be used
> by file operations is 1/8th of the number configured.  I suspect this 
> isn't intended.
>
> Rr
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130813/48dbad8a/attachment.htm>

From lukas@REDACTED  Tue Aug 13 10:05:01 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Tue, 13 Aug 2013 10:05:01 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <5209E307.6030806@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org>
Message-ID: <5209E8AD.5000208@erlang.org>

Sigh, apparently I spoke too soon.

I remembered incorrectly about the change. It was in R16B that 
ErlDrvPort became a ptr; before R16B it was an id. Anyway, it is odd 
that the ptr is 8-byte aligned on your system. On mine (Ubuntu 13.04, 
x86_64) the ptrs are not aligned and the load is nicely distributed 
among async threads. If I remember correctly you are using FreeBSD on 
x86_64? I'll check if I can reproduce the behavior you are seeing on our 
FreeBSD machine.

Lukas


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130813/ba1df93e/attachment.htm>

From lukas@REDACTED  Tue Aug 13 13:52:19 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Tue, 13 Aug 2013 13:52:19 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <5209E8AD.5000208@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
Message-ID: <520A1DF3.8050708@erlang.org>

And there it is, conclusive proof that I should not be debugging 
Rickard's code before lunch.

Found the issue, will create a fix for it. As a workaround for R16B you 
can use a prime number as the number of async threads :)

Lukas
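Why the prime-number workaround helps can be seen with the same kind of model, again a sketch under the assumption that the queue is Key rem NumThreads and that keys step by 8. Such keys can only reach NumThreads div gcd(NumThreads, 8) distinct queues, so a thread count that shares no factor with 8 reaches all of them. Under this model any odd count would do; a prime is simply an easy way to guarantee it.

```erlang
-module(async_prime_demo).
-export([queues_used/2, expected_queues/1]).

%% Count distinct queues reached when keys are 8-byte-aligned
%% (0, 8, 16, ...) and the queue is Key rem NumThreads.
queues_used(NumThreads, NumPorts) ->
    Keys = [8 * I || I <- lists:seq(0, NumPorts - 1)],
    length(lists:usort([Key rem NumThreads || Key <- Keys])).

%% Stride-8 keys reach N div gcd(N, 8) queues, so any N coprime with 8
%% (any odd N, primes included) reaches every queue.
expected_queues(NumThreads) ->
    NumThreads div gcd(NumThreads, 8).

%% Euclid's algorithm.
gcd(A, 0) -> A;
gcd(A, B) -> gcd(B, A rem B).
```

For example, 7 threads are all used, while 10 threads only see 5 queues and 16 threads only 2.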


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130813/e2d88b9a/attachment.htm>

From rr@REDACTED  Wed Aug 14 02:21:04 2013
From: rr@REDACTED (Rick Reed)
Date: Tue, 13 Aug 2013 17:21:04 -0700
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <520A1DF3.8050708@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
Message-ID: <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>

I assume the reason for keying the file requests is to prevent a single
port from soaking up all the async threads?

Rr


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130813/e27be08a/attachment.htm>

From james@REDACTED  Tue Aug 13 14:03:01 2013
From: james@REDACTED (James Wheare)
Date: Tue, 13 Aug 2013 13:03:01 +0100
Subject: [erlang-bugs] Binary memory reuse issue in
	unicode:characters_to_list
Message-ID: <CACBTDBOVX=GxPftk7SyasQ4bFU7pVU-PqTt2Q73Wzx0StiP3MQ@mail.gmail.com>

Just found this extremely unexpected behaviour when using binary
pattern matching and unicode:characters_to_list

http://pastebin.com/7EYEhu0Z

Given a 2-byte binary, e.g. <<65,128>> (65 = letter "A", 128 = invalid
standalone UTF-8 byte)

<<Char:8,Rest/binary>> = <<65,128>>,
Char = 65,
Rest = <<128>>.

unicode:characters_to_list(Rest) should error, with {error, [],
<<128>>} but instead is giving {error, [], "A"}

unicode:characters_to_list(<<128>>) produces the desired result even
though it should be identical.

Making a copy will also give the desired result:
Rest2 = <<Rest/binary>>,
unicode:characters_to_list(Rest2).

Is this related to binary optimisations detailed here?
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html

Seems like a bug in the unicode NIF.

Note that it does not reproduce in all environments, even with the
same Erlang version. Even 2 identical Linux VMs running under
VirtualBox, but on 2 separate host machines, produced different results
(one showed the bug, one didn't)
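For anyone on an affected release, James's observation that a copied binary behaves correctly suggests a workaround. The sketch below (hypothetical module name) makes the copy explicit with binary:copy/1, which always allocates a new binary rather than returning a sub-binary. It is an untested assumption that this sidesteps the bug in every affected environment.

```erlang
-module(utf8_copy_demo).
-export([rest_after_first/1, to_list_copied/1]).

%% Split off the first byte. Rest is a sub-binary that shares memory
%% with the original binary, which is the case the report is about.
rest_after_first(<<_Char:8, Rest/binary>>) ->
    Rest.

%% Hand the conversion a fresh, unshared copy of the bytes.
to_list_copied(Bin) ->
    unicode:characters_to_list(binary:copy(Bin)).
```

On a correct runtime, converting the copied <<128>> yields {error, [], <<128>>}, the result James expected.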


From pan@REDACTED  Wed Aug 14 10:29:38 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 14 Aug 2013 10:29:38 +0200
Subject: [erlang-bugs] Binary memory reuse issue in
	unicode:characters_to_list
In-Reply-To: <CACBTDBOVX=GxPftk7SyasQ4bFU7pVU-PqTt2Q73Wzx0StiP3MQ@mail.gmail.com>
References: <CACBTDBOVX=GxPftk7SyasQ4bFU7pVU-PqTt2Q73Wzx0StiP3MQ@mail.gmail.com>
Message-ID: <520B3FF2.2020607@erlang.org>

Hi!

This bug was fixed in the latest release. See 
https://github.com/erlang/otp/commit/0ebffb2b55bd1870bfbe0ea47aa94017d7917084 
for details.

Cheers,
Patrik



From pan@REDACTED  Wed Aug 14 10:36:34 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 14 Aug 2013 10:36:34 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>
Message-ID: <520B4192.2060003@erlang.org>

Hi Rick!

On 08/14/2013 02:21 AM, Rick Reed wrote:
> I assume the reason for keying the file requests is to prevent a 
> single port from
> soaking up all the async threads?
Yes, and it's also important that requests for the same file 
"descriptor" end up in the same async queue. So we need to store a fixed 
key in the file descriptor structure.

I think I will hash the pointer to create the key, not just shift away 
the "zero-bits"; you never know which icky patterns an allocator can 
create that will distribute the jobs unevenly. The key will only be 
calculated upon opening, so there will be minimal performance hit due to 
the more complicated calculations.
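The plan above, hashing the address once at open time and storing the result with the descriptor, can be modelled as follows. This is a hypothetical Erlang sketch: erlang:phash2/1 stands in for whatever hash the actual C fix uses, and the function names are illustrative only. Because the key is computed once and reused, every request on one descriptor maps to the same queue, while the hash breaks up the alignment pattern.

```erlang
-module(async_hash_demo).
-export([open_key/1, queue_for/2, queues_used/2]).

%% At "open" time: hash the port address once and keep the result.
%% erlang:phash2/1 is only a stand-in for the real C hash.
open_key(PortAddress) ->
    erlang:phash2(PortAddress).

%% Every request on the descriptor reuses the stored key, so the same
%% descriptor always lands in the same async queue.
queue_for(StoredKey, NumThreads) ->
    StoredKey rem NumThreads.

%% Count distinct queues reached by NumPorts 8-byte-aligned addresses.
queues_used(NumThreads, NumPorts) ->
    Keys = [open_key(8 * I) || I <- lists:seq(0, NumPorts - 1)],
    length(lists:usort([queue_for(K, NumThreads) || K <- Keys])).
```

In this model, 8-byte-aligned addresses spread across all 8 queues of an 8-thread pool instead of collapsing onto one.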

Thanks for reporting - this could cause severe performance issues in 
applications!

Cheers,
Patrik

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130814/d1fc578d/attachment.htm>

From watson.timothy@REDACTED  Wed Aug 14 11:58:06 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Wed, 14 Aug 2013 10:58:06 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <52011A87.5080203@erlang.org>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org>
 <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
Message-ID: <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>

Hi Peter,


Thanks for getting back to me with this. Now I've had a chance to look at it,
and to experiment with some different options, I can explain a bit more
about what I think is going on.

On 6 August 2013 16:47, Peter Andersson <peppe@REDACTED> wrote:

> Printouts from CT hook functions are "safe" in the sense that they
> execute sequentially pre/post test suite/group/case execution. It's not
> possible that a call to ct:log/2 from a hook function "comes in too soon
> or late", i.e. gets handled before/after Common Test has
> started/finished executing.
>
>
<snip>


> In general, if one knows that Common Test is running (which is the time
> between the CT hook init and terminate call, or the start_logging and
> stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
> from anywhere and find the data in either the test case logs or in the
> unexpected i/o log (depending on the group leader setting).
>
>
<snip>


> The reason for the exit you reported initially, is that if a log call
> happens during startup or shutdown of Common Test, then, during a short
> window, it's possible that Common Test fails to communicate with Test
> Server and crashes.
>
>
So I've definitely been hitting a race condition in my code here. I tried
adding/removing the event handler that routes logging messages to
ct:{log,pal}/2 around the {start,stop}_logging events; however, that didn't
help at all. In my ct_hook module, the call that triggers this explosion
occurs inside the init/2 function:

%% from systest_cth.erl

init(systest, Opts) ->
    case application:start(systest, permanent) of
        {error, {already_started, systest}} -> systest:reset();
        {error, _Reason}=Err                -> Err;
        ok                                  -> ok
    end,
    etc ....

%% from systest.erl

reset() ->
    %% both these operations are synchronous
    systest_watchdog:reset(),
    systest_results:reset(),
    ok.

%% systest_watchdog.erl

reset() ->
    gen_server:call(?MODULE, reset).


Further down in systest_watchdog, the handle_call/3 clause that deals with
the 'reset' signal ends up calling systest_log, which leads to a
gen_event handler calling ct:pal/2.

According to what you've said above, about the timing of logging attempts
within the bounds of a ct_hook's init/terminate functions, that seems like
it should work, shouldn't it? Attempting to trigger the logging by only
(de)registering the ct:pal handler in response to the {start,stop}_logging
events didn't work, though. I guess there must be another potential race
there; however, I haven't attempted to do that via the hook's init/terminate
functions. I'll try that now and let you know if it resolves my issues.

> Before I try to answer your question below, I need to understand better
> what you want to happen to the log printouts that take place during your
> configuration/setup phase (before the test run starts) and/or during the
> teardown phase (when Common Test has shut down).


I think/hope that all my setup/teardown is synchronous with regard to each
test run - that is to say, I allow for parallel test cases within suites and
I don't actually return from the hook's systest_cth:stop/4 callback until
all the resources configured at that scope (be it suite, group, testcase)
have been killed and reported (via monitors) as dead.

> If Common Test - in an
> offline mode (i.e. not running) - should attempt to write incoming
> ct:log/2 strings to a file, the best it can do really, is to
> write/append them to say a circular log file in the current working
> directory. This is possible, but it will be difficult to know which
> printouts belong to which test runs when analyzing the logs, and as far
> as I understand, this is the sort of thing you're trying to avoid anyway.
>
>
Indeed. As a temporary work-around, I've stopped logging internal events to
the common_test logs and put them in a separate log file instead, but
that's not really what I wanted to end up with.


> As far as I see, it's quite possible to do something clever with log
> printouts that happen *before* Common Test starts. They could be
> buffered in a temporary file then read and resent by a CT hook init
> function so that this data ends up first in the unexpected i/o log for
> the test run. The problem here is what to do with printouts that happen
> *after* Common Test has stopped but before your teardown is finished.
> Another possibility could maybe be, if possible, to change the order of
> the whole session so that Common Test is always started before your
> configuration/setup (CT hooks can for example be added dynamically) and
> not stopped until teardown is also finished. Perhaps an init and
> terminate function in a high prio CT hook module could be used to
> synchronize this. Sounds feasible to me.
>
> Let me know if I understand your problem correctly and tell me what
> ideas/requests you have and let's move on from there.
>
>
I'm going to try again with a high prio hook to turn on/off the logging
handler and see if this works. By all accounts it sounds as if it should.
The other thing I might do to alleviate this problem is swallow any
exception from ct:pal/2, which is *cough* bad form as a rule, but in this
case might actually be the right thing to do.
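The "swallow any exception" fallback could look roughly like the hypothetical wrapper below. It takes the logging function as a parameter (for example fun ct:log/2), so it can be exercised without Common Test running. It only protects the calling process: if the crash happens in some other process, a wrapper like this would not prevent it.

```erlang
-module(safe_log_demo).
-export([safe_log/3]).

%% Call LogFun(Format, Args) and turn any exception (error, exit or
%% throw) into a returned {error, {Class, Reason}} instead of crashing
%% the caller, e.g. a gen_event handler.
safe_log(LogFun, Format, Args) when is_function(LogFun, 2) ->
    try LogFun(Format, Args) of
        _ -> ok
    catch
        Class:Reason -> {error, {Class, Reason}}
    end.
```

Returning the error rather than silently discarding it leaves room to count or report dropped log lines later.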

Let me experiment with those two options first and get back to you. It
might be that all I end up asking for is a bit more info in the
documentation explaining the constraints.

I'll try to post back later today.

Cheers,
Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130814/56b96a47/attachment.htm>

From watson.timothy@REDACTED  Wed Aug 14 13:09:41 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Wed, 14 Aug 2013 12:09:41 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org>
 <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
Message-ID: <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>

Hi Peter,

"In general, if one knows that Common Test is running (which is the time
between the CT hook init and terminate call, or the start_logging and
stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
from anywhere and find the data in either the test case logs or in the
unexpected i/o log (depending on the group leader setting)."

After some initial testing, I can confirm that this does not work as
expected. I changed my code to only add the event (log) handler which
executes ct:pal/2 inside the hook's init/2 callback, however the problem
still persists! My hook and the corresponding call chain now looks like
this:

%% systest_cth hook

init(systest, Opts) ->
    case application:start(systest, permanent) of
        {error, {already_started, systest}} -> io:format("starting ct log!~n"),
                                               systest_ct_log:start(),
                                               systest:reset();
        {error, _Reason}=Err                -> Err;
        ok                                  -> ok
    end,
    etc ....

%% systest_ct_log:start

start() ->
    ok = systest_log:start(ct, systest_ct_log, common_test).

%% systest_log:start/3

start(Id, Mod, Output) ->
    gen_event:add_handler(systest_event_log, {?MODULE, Id},
                          [Id, Mod, Output]).

%% systest_ct_log:write_log/4

write_log(EvId, _Fd, What, Args) ->
    ct:log("[" ++ as_string(EvId) ++ "] " ++ as_string(What), Args).


When I execute a test run with this code in place, however, I still get the
crash, although the io:format/2 notice that I'm starting the ct log appears
first:

Common Test starting (cwd is
/home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)

starting ct log!


ct_util_server got EXIT from <0.61.0>: {noproc,
                                        {gen_server,call,
                                         [test_server_io,
                                          {print,xxxFrom,unexpected_io,
                                           [[[["<div class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
                                              "\n",
                                              [91,102,114,97,109,101,119,111,
                                               114,107,93,32,119,97,116,99,
                                               104,100,111,103,58,32,110,111,
                                               32,112,114,111,99,115,32,116,
                                               111,32,107,105,108,108,"\n"]],
                                             "\n","</div>"]]},
                                          infinity]}}


So it appears that the assertion that logging will work between the hook's
init and terminate callbacks doesn't quite hold.
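The integer list in the crash report is just the user log line, shown as raw character codes because the report does not pretty-print strings. Decoding it (a throwaway sketch; the module name is made up for illustration) shows which printout hit the missing test_server_io process:

```erlang
-module(crash_payload).
-export([decode/0]).

%% The integer list from the {print, ...} term in the crash report,
%% without its trailing "\n". Erlang strings are lists of character
%% codes, so formatting it with ~s yields the original text.
decode() ->
    Codes = [91,102,114,97,109,101,119,111,114,107,93,32,119,97,116,99,
             104,100,111,103,58,32,110,111,32,112,114,111,99,115,32,116,
             111,32,107,105,108,108],
    lists:flatten(io_lib:format("~s", [Codes])).
```

The result is "[framework] watchdog: no procs to kill", which looks like the systest_watchdog message logged while handling reset, consistent with the call chain described in this thread.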


On 14 August 2013 10:58, Tim Watson <watson.timothy@REDACTED> wrote:

> Hi Peter,
>
>
> Thanks for getting back to me with this. Now that I've had a chance to look
> at it, and to experiment with some different options, I can explain a bit
> more about what I think is going on.
>
> On 6 August 2013 16:47, Peter Andersson <peppe@REDACTED> wrote:
>
>> Printouts from CT hook functions are "safe" in the sense that they
>> execute sequentially pre/post test suite/group/case execution. It's not
>> possible that a call to ct:log/2 from a hook function "comes in too soon
>> or late", i.e. gets handled before/after Common Test has
>> started/finished executing.
>>
>>
> <snip>
>
>
>> In general, if one knows that Common Test is running (which is the time
>> between the CT hook init and terminate call, or the start_logging and
>> stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
>> from anywhere and find the data in either the test case logs or in the
>> unexpected i/o log (depending on the group leader setting).
>>
>>
> <snip>
>
>
>> The reason for the exit you reported initially, is that if a log call
>> happens during startup or shutdown of Common Test, then, during a short
>> window, it's possible that Common Test fails to communicate with Test
>> Server and crashes.
>>
>>
> So I've definitely been hitting a race condition in my code here. I tried
> adding/removing the event handler that routes logging messages to
> ct:{log,pal}/2 around the {start,stop}_logging events; however, that didn't
> help at all. In my ct_hook module, the call that triggers this explosion
> occurs inside the init/2 function:
>
> %% from systest_cth.erl
>
> init(systest, Opts) ->
>     case application:start(systest, permanent) of
>         {error, {already_started, systest}} -> systest:reset();
>         {error, _Reason}=Err                -> Err;
>         ok                                  -> ok
>     end,
>     etc ....
>
> %% from systest.erl
>
> reset() ->
>     %% both these operations are synchronous
>     systest_watchdog:reset(),
>     systest_results:reset(),
>     ok.
>
> %% systest_watchdog.erl
>
> reset() ->
>     gen_server:call(?MODULE, reset).
>
>
> Further down in systest_watchdog, the handle_call/3 clause that deals with
> the 'reset' signal ends up calling the systest_log, which leads to a
> gen_event handler calling ct:pal/2.
>
> According to what you've said above about the timing of logging attempts
> within the bounds of a ct_hook's init/terminate functions, that seems like
> it should work, shouldn't it? Attempting to trigger the logging by only
> (de)registering the ct:pal handler in response to the {start,stop}_logging
> events didn't help, though. I guess there must be another potential race
> there. However, I haven't attempted to do that via the hook's
> init/terminate functions; I'll try that now and let you know if it
> resolves my issues.
>
>> Before I try to answer your question below, I need to understand better
>> what you want to happen to the log printouts that take place during your
>> configuration/setup phase (before the test run starts) and/or during the
>> teardown phase (when Common Test has shut down).
>
>
> I think/hope that all my setup/teardown is synchronous with regard to each
> test run. That is to say, I allow for parallel test cases within suites and
> I don't actually return from the hook's systest_cth:stop/4 callback until
> all the resources configured at that scope (be it suite, group, testcase)
> have been killed and reported (via monitors) as dead.
>
>> If Common Test - in an offline mode (i.e. not running) - should attempt to
>> write incoming ct:log/2 strings to a file, the best it can do really, is to
>> write/append them to say a circular log file in the current working
>> directory. This is possible, but it will be difficult to know which
>> printouts belong to which test runs when analyzing the logs, and as far
>> as I understand, this is the sort of thing you're trying to avoid anyway.
>>
>>
> Indeed. As a temporary work-around, I've stopped logging internal events
> to the common_test logs and put them in a separate log file instead, but
> that's not really what I wanted to end up with.
>
>
>> As far as I see, it's quite possible to do something clever with log
>> printouts that happen *before* Common Test starts. They could be
>> buffered in a temporary file then read and resent by a CT hook init
>> function so that this data ends up first in the unexpected i/o log for
>> the test run. The problem here is what to do with printouts that happen
>> *after* Common Test has stopped but before your teardown is finished.
>> Another possibility could maybe be, if possible, to change the order of
>> the whole session so that Common Test is always started before your
>> configuration/setup (CT hooks can for example be added dynamically) and
>> not stopped until teardown is also finished. Perhaps an init and
>> terminate function in a high prio CT hook module could be used to
>> synchronize this. Sounds feasible to me.
>>
>> Let me know if I understand your problem correctly and tell me what
>> ideas/requests you have and let's move on from there.
>>
>>
> I'm going to try again with a high prio hook to turn on/off the logging
> handler and see if this works. By all accounts it sounds as if it should.
> The other thing I might do to alleviate this problem is swallow any
> exception from ct:pal/2, which is *cough* bad form as a rule, but in this
> case might actually be the right thing to do.
>
> Let me experiment with those two options first and get back to you. It
> might be that all I end up asking for is a bit more info in the
> documentation explaining the constraints.
>
> I'll try to post back later today.
>
> Cheers,
> Tim
>

From watson.timothy@REDACTED  Wed Aug 14 13:12:47 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Wed, 14 Aug 2013 12:12:47 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org>
 <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
Message-ID: <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>

On 14 August 2013 12:09, Tim Watson <watson.timothy@REDACTED> wrote:

> When I execute a test run with this code in place however, I still get the
> crash, though the io:format/2 notice that I'm starting the ct log appears
> first:
>
> Common Test starting (cwd is
> /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)
>
> starting ct log!
>
>
> ct_util_server got EXIT from <0.61.0>: {noproc,
>                                         {gen_server,call,
>                                          [test_server_io,
>                                           {print,xxxFrom,unexpected_io,
>                                            [[[["<div class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
>                                               "\n",
>                                               [91,102,114,97,109,101,119,111,
>                                                114,107,93,32,119,97,116,99,
>                                                104,100,111,103,58,32,110,111,
>                                                32,112,114,111,99,115,32,116,
>                                                111,32,107,105,108,108,"\n"]],
>                                              "\n","</div>"]]},
>                                           infinity]}}
>
>
> So it appears that the assertion that logging will work between the hook's
> init and terminate callbacks isn't quite working.
>
>
Oh and I've tried pausing between the systest_ct_log:start/0 call and the
(latter) systest:reset/0 call that triggers the logging, but that didn't
make any difference either - e.g., like so:

init(systest, Opts) ->
    case application:start(systest, permanent) of
        {error, {already_started, systest}} -> io:format("starting ct log!~n"),
                                               systest_ct_log:start(),
                                               receive
                                                   foobar -> ok
                                               after 2000 -> ok
                                               end,
                                               systest:reset();
        {error, _Reason}=Err                -> Err;
        ok                                  -> ok
    end,

From watson.timothy@REDACTED  Wed Aug 14 13:44:35 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Wed, 14 Aug 2013 12:44:35 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org>
 <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
 <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
Message-ID: <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>

Hi Peter,

Ok that's great - thanks for your assistance!

Cheers,
Tim


On 14 August 2013 12:36, Peter Andersson <peter.e.andersson@REDACTED> wrote:

>
> Hi Tim,
>
> Thanks for all the useful info!
>
> I haven't actually run any tests on this myself, only read some code so
> far. Obviously the init and terminate hook functions get called before the
> test server process is even started. In other words, these functions
> actually execute in that short "evil" window during startup when you can't
> call pal/2 or log/2. I missed that. :-( Sorry for misleading you!
>
> Let me dig into this properly and get back to you when I can propose
> useful (tested!) solutions to your problems!
>
> Best,
> Peter
>
> Ericsson AB, Erlang/OTP
>
>
>
> On Wed, 14 Aug 2013, Tim Watson wrote:
>
>> On 14 August 2013 12:09, Tim Watson <watson.timothy@REDACTED> wrote:
>>
>>> When I execute a test run with this code in place however, I still get the
>>> crash, though the io:format/2 notice that I'm starting the ct log appears
>>> first:
>>>
>>> Common Test starting (cwd is
>>> /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)
>>>
>>> starting ct log!
>>>
>>>
>>> ct_util_server got EXIT from <0.61.0>: {noproc,
>>>                                         {gen_server,call,
>>>                                          [test_server_io,
>>>                                           {print,xxxFrom,unexpected_io,
>>>                                            [[[["<div class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
>>>                                               "\n",
>>>                                               [91,102,114,97,109,101,119,111,
>>>                                                114,107,93,32,119,97,116,99,
>>>                                                104,100,111,103,58,32,110,111,
>>>                                                32,112,114,111,99,115,32,116,
>>>                                                111,32,107,105,108,108,"\n"]],
>>>                                              "\n","</div>"]]},
>>>                                           infinity]}}
>>>
>>>
>>> So it appears that the assertion that logging will work between the hook's
>>> init and terminate callbacks isn't quite working.
>>>
>>>
>> Oh and I've tried pausing between the systest_ct_log:start/0 call and the
>> (latter) systest:reset/0 call that triggers the logging, but that didn't
>> make any difference either - e.g., like so:
>>
>> init(systest, Opts) ->
>>     case application:start(systest, permanent) of
>>         {error, {already_started, systest}} -> io:format("starting ct log!~n"),
>>                                                systest_ct_log:start(),
>>                                                receive
>>                                                    foobar -> ok
>>                                                after 2000 -> ok
>>                                                end,
>>                                                systest:reset();
>>         {error, _Reason}=Err                -> Err;
>>         ok                                  -> ok
>>     end,
>>
>>

From peppe@REDACTED  Wed Aug 14 16:34:08 2013
From: peppe@REDACTED (Peter Andersson)
Date: Wed, 14 Aug 2013 16:34:08 +0200
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com> <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.1308141631320.23611@ancalagon.otp.ericsson.se>


Hi Tim,

Thanks for all the useful info!

I haven't actually run any tests on this myself, only read some code so 
far. Obviously the init and terminate hook functions get called before the 
test server process is even started. In other words, these functions 
actually execute in that short "evil" window during startup when you can't 
call pal/2 or log/2. I missed that. :-( Sorry for misleading you!

Let me dig into this properly and get back to you when I can propose 
useful (tested!) solutions to your problems!

Best,
Peter

Ericsson AB, Erlang/OTP

On Wed, 14 Aug 2013, Tim Watson wrote:

> On 14 August 2013 12:09, Tim Watson <watson.timothy@REDACTED> wrote:
>
>> When I execute a test run with this code in place however, I still get the
>> crash, though the io:format/2 notice that I'm starting the ct log appears
>> first:
>>
>> Common Test starting (cwd is
>> /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)
>>
>> starting ct log!
>>
>>
>> ct_util_server got EXIT from <0.61.0>: {noproc,
>>                                         {gen_server,call,
>>                                          [test_server_io,
>>                                           {print,xxxFrom,unexpected_io,
>>                                            [[[["<div class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
>>                                               "\n",
>>                                               [91,102,114,97,109,101,119,111,
>>                                                114,107,93,32,119,97,116,99,
>>                                                104,100,111,103,58,32,110,111,
>>                                                32,112,114,111,99,115,32,116,
>>                                                111,32,107,105,108,108,"\n"]],
>>                                              "\n","</div>"]]},
>>                                           infinity]}}
>>
>>
>> So it appears that the assertion that logging will work between the hook's
>> init and terminate callbacks isn't quite working.
>>
>>
> Oh and I've tried pausing between the systest_ct_log:start/0 call and the
> (latter) systest:reset/0 call that triggers the logging, but that didn't
> make any difference either - e.g., like so:
>
> init(systest, Opts) ->
>     case application:start(systest, permanent) of
>         {error, {already_started, systest}} -> io:format("starting ct log!~n"),
>                                                systest_ct_log:start(),
>                                                receive
>                                                    foobar -> ok
>                                                after 2000 -> ok
>                                                end,
>                                                systest:reset();
>         {error, _Reason}=Err                -> Err;
>         ok                                  -> ok
>     end,
>


From rr@REDACTED  Wed Aug 14 16:48:28 2013
From: rr@REDACTED (Rick Reed)
Date: Wed, 14 Aug 2013 07:48:28 -0700
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <520B4192.2060003@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>
 <520B4192.2060003@erlang.org>
Message-ID: <CA+SuFX2hUtvOspa71SOf5BSVSnoAC6XL6+6MV+2yBXpdmGP=CA@mail.gmail.com>

Hi Patrik!

And you want the requests in the same async queue to enforce ordering per
file descriptor, or for some other reason?  It seems like ordering isn't an
issue because ultimately the file calls in erlang are synchronous, and an app
would have to enforce ordering itself anyway (we do it by sending all the
i/o for a file through a single proc and/or setting our own per-file locks).

For the app I'm debugging now, it turns out no scheme that ties the port to
a particular thread is going to work.  The system is running at the limits
of the hardware, and the ports are long-lived.  Only perfect distribution
of i/o requests over the available threads prevents certain threads from
being overloaded and backing up the i/o on the ports that map to them.

I've been running a few of the systems overnight with a patch that disables
keying in efile_drv.  Now I'm getting a nice flat distribution of i/o
across the async threads.  Unfortunately, it hasn't completely solved my
problem, but those systems are doing much better.

I'm just wondering if there's some other reason that I'm missing (cache/mem
affinity, platform differences, etc.) for having to map file descriptors to
particular threads.

Thanks for looking into this!

Rr


On Wed, Aug 14, 2013 at 1:36 AM, Patrik Nyblom <pan@REDACTED> wrote:

>  Hi Rick!
>
>
> On 08/14/2013 02:21 AM, Rick Reed wrote:
>
> I assume the reason for keying the file requests is to prevent a single
> port from soaking up all the async threads?
>
> Yes, and it's also important that requests for the same file "descriptor"
> end up in the same async queue. So we need to store a fixed key in the file
> descriptor structure.
>
> I think I will hash the pointer to create the key, not just shift away the
> "zero-bits", you never know which icky patterns an allocator can create
> that will distribute the jobs unevenly. The key will only be calculated
> upon opening, so there will be minimal performance hit due to the more
> complicated calculations.
>
> Thanks for reporting - this could cause severe performance issues in
> applications!
>
> Cheers,
> Patrik
>
>
>  Rr
>
>
>  On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson <lukas@REDACTED> wrote:
>
>>  And there it is, conclusive proof that I should not be debugging
>> Rickard's code before lunch.
>>
>> Found the issue, will create a fix for it. As a workaround for R16B you
>> can use a prime number as the number of async threads :)
>>
>> Lukas
>>
>>
>> On 13/08/13 10:05, Lukas Larsson wrote:
>>
>> Sigh, apparently I spoke too soon.
>>
>> I remembered incorrectly about the change. It was in R16B that ErlDrvPort
>> became a ptr and it was an id before R16B. Anyway, it is odd that the ptr
>> is 8-byte aligned on your system. On mine (Ubuntu 13.04, x86_64) the ptrs
>> are not aligned and the load is nicely distributed among async threads. If
>> I remember correctly you are using FreeBSD on x86_64? I'll check if I can
>> reproduce the behavior you are seeing on our FreeBSD machine.
>>
>> Lukas
>>
>> On 13/08/13 09:40, Lukas Larsson wrote:
>>
>> Hello Rick!
>>
>> Which version of Erlang are you using? From R16B (I think), the
>> ErlDrvPort datatype no longer is a pointer to the port struct. Instead it
>> is the slot id into the port table and those ids should contain all values.
>> I did a quick test on my computer running the latest on maint on github and
>> seem to get a full spread over all async threads.
>>
>> Lukas
>>
>> On 13/08/13 05:40, Rick Reed wrote:
>>
>> It looks to me as though there's a bit of a problem in the way
>> efile_drv.c generates the key that's used to select an async driver
>> queue.  It uses the address of the port which on our system is 8-byte
>> aligned.  Meanwhile, erl_async.c does a simple mod operation with the
>> number of async threads, so the number of threads that can actually be
>> used by file operations is 1/8th of the number configured.  I suspect
>> this isn't intended.
>>
>>  Rr
>>
>>
>>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>
>
>
>
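A small model makes the arithmetic in Rick's report easy to check. This is a sketch in Erlang, not the driver's C code, and it assumes what the thread describes: the key is the raw port address (a multiple of 8) and erl_async picks a queue with a plain modulo. Under that model only N / gcd(8, N) of N queues are reachable, which is 1/8 of them when N is a multiple of 8 and all of them when N is odd; that is why a prime thread count works around the problem. The module name and base address are made up for illustration.

```erlang
-module(async_key_demo).
-export([used_queues/3]).

%% Count how many distinct async queues are used when the key is the
%% port address itself and the queue is chosen as Key rem NumThreads.
%% Align is the pointer alignment in bytes; NumPorts is how many ports
%% (consecutive aligned addresses) we simulate.
used_queues(Align, NumThreads, NumPorts) ->
    %% Arbitrary 4096-aligned base, so it is also Align-aligned for any
    %% power-of-two Align up to 4096 (an assumption of this model).
    Base = 16#7f0000000000,
    Keys = [Base + I * Align || I <- lists:seq(0, NumPorts - 1)],
    Queues = [Key rem NumThreads || Key <- Keys],
    length(lists:usort(Queues)).
```

With 32 async threads the model reaches 4 queues; with 31 it reaches all 31.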

From ess@REDACTED  Thu Aug 15 15:41:08 2013
From: ess@REDACTED (=?ISO-8859-1?Q?Erik_S=F8e_S=F8rensen?=)
Date: Thu, 15 Aug 2013 15:41:08 +0200
Subject: [erlang-bugs] A funny bug
In-Reply-To: <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
 <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
 <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>
 <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com>
 <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com>
Message-ID: <520CDA74.80003@trifork.com>

On 02-08-2013 16:57, Ulf Wiger wrote:
[snip]
> So arguably, a way to parameterize receive *should* be available, and *should* be documented. I'm not saying that prim_eval:'receive'/2 is that very thing that should be documented, but it comes close enough that Erlang wizards like Tony should not only be excused for playing around with it, but should be *expected* to. ;-)
An alternative "parameterized receive" method is this:

http://polymorphictypist.blogspot.dk/2011/10/dynamic-selective-receive-erlang-hack.html

(Disclaimer: self plug)

It takes a compiled match spec, like so:

     dyn_sel_recv:match_spec_receive(CMS, 1000)

which is presumably safer than allowing any closure to be called.

/Erik


From pan@REDACTED  Thu Aug 15 18:34:21 2013
From: pan@REDACTED (Patrik Nyblom)
Date: Thu, 15 Aug 2013 18:34:21 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <CA+SuFX2hUtvOspa71SOf5BSVSnoAC6XL6+6MV+2yBXpdmGP=CA@mail.gmail.com>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>
 <520B4192.2060003@erlang.org>
 <CA+SuFX2hUtvOspa71SOf5BSVSnoAC6XL6+6MV+2yBXpdmGP=CA@mail.gmail.com>
Message-ID: <520D030D.8070404@erlang.org>

Hi Rick!

On 08/14/2013 04:48 PM, Rick Reed wrote:
> Hi Patrik!
>
> And you want the requests in the same async queue to enforce ordering per
> file descriptor, or for some other reason?  It seems like ordering isn't an
> issue because ultimately the file calls in erlang are synchronous, and an
> app would have to enforce ordering itself anyway (we do it by sending all
> the i/o for a file through a single proc and/or setting our own per-file
> locks).
>
Yes, one example is process exit, where close definitely should not be
intermingled with other ongoing file operations from other threads. That
definitely happens if you round robin the file descriptors. I remember that
there have been other situations where the synchronous Erlang interface is
not enough, but I cannot for the life of me remember them right now.
Anyway, process exit is definitely one example :)
> For the app I'm debugging now, it turns out no scheme that ties the port
> to a particular thread is going to work.  The system is running at the
> limits of the hardware, and the ports are long-lived.  Only perfect
> distribution of i/o requests over the available threads prevents certain
> threads from being overloaded and backing up the i/o on the ports that
> map to them.
Well, given the current design, I'm afraid a really good hash is the best
I can come up with :(

The I/O should be rethought and rewritten once we have dirty schedulers
instead of the async threads!
>
> I've been running a few of the systems overnight with a patch that
> disables keying in efile_drv.  Now I'm getting a nice flat distribution
> of i/o across the async threads.  Unfortunately, it hasn't completely
> solved my problem, but those systems are doing much better.
Yes, probably. It is not safe, though: in particular, compressed files
combined with processes that receive exit (kill) signals during file
operations may crash the VM with a core dump.

With better distribution of the FDs, maybe you can get results as good as
with the round robin, without the risks?
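The "really good hash" idea (hash the address once when the file is opened and store the key with the descriptor) can be sketched as a toy model. This is a hypothetical Erlang illustration, not the efile_drv change itself: erlang:phash2/1 stands in for whatever hash the driver would use, and the {fd, Address, Key} tuple stands in for the file descriptor structure.

```erlang
-module(hashed_key_demo).
-export([open/1, queue_for/2, used_queues/2]).

%% Hypothetical "descriptor": the key is computed once, at open time,
%% by hashing the (8-byte aligned) address, and stored alongside it.
%% erlang:phash2/1 is only a stand-in for the driver's real hash.
open(Address) ->
    {fd, Address, erlang:phash2(Address)}.

%% Every request on one descriptor reuses the stored key, so it always
%% maps to the same async queue.
queue_for({fd, _Address, Key}, NumThreads) ->
    Key rem NumThreads.

%% How many distinct queues NumPorts consecutive 8-byte aligned
%% addresses end up on (made-up base address, as an assumption).
used_queues(NumThreads, NumPorts) ->
    Base = 16#7f0000000000,
    Fds = [open(Base + I * 8) || I <- lists:seq(0, NumPorts - 1)],
    length(lists:usort([queue_for(Fd, NumThreads) || Fd <- Fds])).
```

With 32 threads and 8-byte aligned addresses, a plain modulo on the raw address reaches only 4 queues; the hashed key should spread descriptors over far more of them. Because the key is stored with the descriptor, all requests on one file still land on one queue, which keeps the per-descriptor ordering described above.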
>
> I'm just wondering if there's some other reason that I'm missing
> (cache/mem affinity, platform differences, etc.) for having to map file
> descriptors to particular threads.
I don't think it helps caches that much; there are far more threads than
cores anyway, so it's bound to generate inter-core communication
regardless.
>
> Thanks for looking into this!
Thanks for reporting!
>
> Rr

Cheers,
Patrik
>
>
> On Wed, Aug 14, 2013 at 1:36 AM, Patrik Nyblom <pan@REDACTED 
> <mailto:pan@REDACTED>> wrote:
>
>     Hi Rick!
>
>
>     On 08/14/2013 02:21 AM, Rick Reed wrote:
>>     I assume the reason for keying the file requests is to prevent a
>>     single port from soaking up all the async threads?
>     Yes, and it's also important that requests for the same file
>     "descriptor" end up in the same async queue. So we need to store a
>     fixed key in the file descriptor structure.
>
>     I think I will hash the pointer to create the key, not just shift
>     away the "zero-bits", you never know which icky patterns an
>     allocator can create that will distribute the jobs unevenly. The
>     key will only be calculated upon opening, so there will be minimal
>     performance hit due to the more complicated calculations.
>
>     Thanks for reporting - this could cause severe performance issues
>     in applications!
>
>     Cheers,
>     Patrik
>
>>
>>     Rr
>>
>>
>>     On Tue, Aug 13, 2013 at 4:52 AM, Lukas Larsson <lukas@REDACTED
>>     <mailto:lukas@REDACTED>> wrote:
>>
>>         And there it is, conclusive proof that I should not be
>>         debugging Rickard's code before lunch.
>>
>>         Found the issue, will create a fix for it. As a workaround
>>         for R16B you can use a prime number as the number of async
>>         threads :)
>>
>>         Lukas
>>
>>
>>         On 13/08/13 10:05, Lukas Larsson wrote:
>>>         Sigh, apparently I spoke too soon.
>>>
>>>         I remembered incorrectly about the change. It was in R16B
>>>         that ErlDrvPort became a ptr and it was an id before R16B.
>>>         Anyway, it is odd that the ptr is 8-byte aligned on your
>>>         system. On mine (Ubuntu 13.04, x86_64) the ptrs are not
>>>         aligned and the load is nicely distributed among async
>>>         threads. If I remember correctly you are using FreeBSD on
>>>         x86_64? I'll check if I can reproduce the behavior you are
>>>         seeing on our FreeBSD machine.
>>>
>>>         Lukas
>>>
>>>         On 13/08/13 09:40, Lukas Larsson wrote:
>>>>         Hello Rick!
>>>>
>>>>         Which version of Erlang are you using? From R16B (I think),
>>>>         the ErlDrvPort datatype no longer is a pointer to the port
>>>>         struct. Instead it is the slot id into the port table and
>>>>         those ids should contain all values. I did a quick test on
>>>>         my computer running the latest on maint on github and seem
>>>>         to get a full spread over all async threads.
>>>>
>>>>         Lukas
>>>>
>>>>         On 13/08/13 05:40, Rick Reed wrote:
>>>>>         It looks to me as though there's a bit of a problem in the
>>>>>         way efile_drv.c generates the key that's used to select an
>>>>>         async driver queue.  It uses the address of the port which
>>>>>         on our system is 8-byte aligned.  Meanwhile, erl_async.c
>>>>>         does a simple mod operation with the number of async
>>>>>         threads, so the number of threads that can actually be used
>>>>>         by file operations is 1/8th of the number configured.  I
>>>>>         suspect this isn't intended.
>>>>>
>>>>>         Rr
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
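A small, hypothetical model of the problem Rick describes: it assumes the async queue is picked as `key % n_threads`, with the key being a raw 8-byte-aligned port address, and counts how many distinct queues a run of such keys can reach. The names (`queue_raw`, `queue_shifted`, `distinct_queues`) are made up for illustration; this is not the erl_async.c code. In this model raw keys reach only `n_threads / gcd(n_threads, 8)` queues, i.e. one eighth of them when the thread count is a multiple of 8, while an odd thread count such as the prime 61 (the workaround Lukas suggests) reaches all of them. `queue_shifted`, which drops the three always-zero low bits first, is just one conventional way to remove the bias and not necessarily the fix that went into OTP.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_THREADS 1024 /* arbitrary bound for this sketch */

/* Hypothetical mapping: queue index = raw pointer value mod thread count. */
static unsigned queue_raw(uintptr_t key, unsigned n_threads) {
    return (unsigned)(key % n_threads);
}

/* Variant that discards the 3 low bits (always zero for 8-byte-aligned
   addresses) before taking the modulo. */
static unsigned queue_shifted(uintptr_t key, unsigned n_threads) {
    return (unsigned)((key >> 3) % n_threads);
}

/* Count how many distinct queues are hit by n_ports consecutive
   8-byte-aligned addresses starting at base, using the given mapping. */
static unsigned distinct_queues(unsigned (*map)(uintptr_t, unsigned),
                                uintptr_t base, unsigned n_ports,
                                unsigned n_threads) {
    unsigned char used[MAX_THREADS] = {0};
    unsigned count = 0;
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    for (unsigned i = 0; i < n_ports; i++) {
        unsigned q = map(base + (uintptr_t)8 * i, n_threads);
        if (!used[q]) {
            used[q] = 1;
            count++;
        }
    }
    return count;
}
```

With `base = 0x10000` and 10000 ports, `distinct_queues(queue_raw, base, 10000, 64)` gives 8, while the same call with 61 threads, or with `queue_shifted` and 64 threads, reaches every queue.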


From rr@REDACTED  Fri Aug 16 01:23:45 2013
From: rr@REDACTED (Rick Reed)
Date: Thu, 15 Aug 2013 16:23:45 -0700
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <520D030D.8070404@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <CA+SuFX1do2P9oX1o=gCvSBpc7NpZxU8GjM8Fdv=kWOsTcNXnuQ@mail.gmail.com>
 <520B4192.2060003@erlang.org>
 <CA+SuFX2hUtvOspa71SOf5BSVSnoAC6XL6+6MV+2yBXpdmGP=CA@mail.gmail.com>
 <520D030D.8070404@erlang.org>
Message-ID: <CA+SuFX31BMcv7hm+=on=L+uZEne1Ls+esc1C6ZHFJaZ4PqFE_g@mail.gmail.com>

On Thu, Aug 15, 2013 at 9:34 AM, Patrik Nyblom <pan@REDACTED> wrote:

> Yes, one example is process exit, where close definitely should not be
> intermingled with other file operations from other threads that are
> ongoing. That definitely happens if you round robin the file descriptors.
>

Perhaps the close could be enqueued on the descriptor work queue but not
issued to the async thread queue until any outstanding ops have finished,
though it doesn't look like the descriptor currently keeps track of how
many async ops are outstanding.


> Yes, probably. It is not safe, though: especially with compressed files,
> processes getting exit (kill) signals during the file operations may core
> the VM.
>

In our case, the file procs run forever, so I think our risk will be low on
this particular system. I've enabled the behavior via an env var, so we won't
be running this on the rest of our systems.

> With better distribution of the FDs, maybe you can get results as good as
> with the round robin, without the risks?
>

Unfortunately not. The only way to ensure that there wouldn't be a noticeable
difference in load between different async threads would be to have either
far too many or far too few async threads.

Rr
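Rick's suggestion boils down to counting in-flight operations per descriptor and holding back the close until that count reaches zero. The sketch below shows only that bookkeeping; `fd_state`, `op_issued`, `request_close` and `op_completed` are hypothetical names rather than efile_drv structures, and the sketch assumes all calls happen on one thread, so it has no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-descriptor state (not efile_drv's real layout). */
typedef struct {
    int outstanding;    /* async ops handed to a thread, not yet completed */
    bool close_pending; /* close requested while ops were still in flight */
    bool closed;        /* true once the close has been issued */
} fd_state;

/* Record that an op was handed to an async thread. */
static void op_issued(fd_state *s) {
    assert(!s->closed && !s->close_pending);
    s->outstanding++;
}

/* Request a close. Returns true if it can be issued right away,
   false if it is deferred until the in-flight ops finish. */
static bool request_close(fd_state *s) {
    assert(!s->closed && !s->close_pending);
    if (s->outstanding == 0) {
        s->closed = true;
        return true;
    }
    s->close_pending = true;
    return false;
}

/* Record that an op completed. Returns true if this completion is the
   point at which a deferred close should now be issued. */
static bool op_completed(fd_state *s) {
    assert(s->outstanding > 0);
    s->outstanding--;
    if (s->outstanding == 0 && s->close_pending) {
        s->close_pending = false;
        s->closed = true;
        return true;
    }
    return false;
}
```

A close requested while two reads are in flight is reported as deferred (`request_close` returns false), and the second `op_completed` call is the one that returns true, which is where the real close would be queued.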

From yosh@REDACTED  Thu Aug 15 20:43:51 2013
From: yosh@REDACTED (Manish Singh)
Date: Thu, 15 Aug 2013 11:43:51 -0700
Subject: [erlang-bugs] file:pread broken with GCC 4.8
Message-ID: <CAHoRzZKrHmrGAjC8NVg=Zd_d9E_NJDb5XYf4xpGJwjjT=3ttHA@mail.gmail.com>

I've also run into this problem:

http://erlang.org/pipermail/erlang-bugs/2013-July/003674.html

At first I thought it was a gcc bug, but http://gcc.gnu.org/bugs/#report
says "if compiling with -fno-strict-aliasing -fwrapv
-fno-aggressive-loop-optimizations makes a difference, your code probably
is not correct." Compiling efile_drv.c with
-fno-aggressive-loop-optimizations makes the problem go away.

With -Wextra, there are warnings about signed/unsigned comparisons, which
might be causing this:

drivers/common/efile_drv.c:3749:14: note: in expansion of macro
'EV_GET_UINT64'
      if (   !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q)
              ^
drivers/common/efile_drv.c:590:30: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
         *(pp) = (    *(pp)+8 < (ev)->iov[*(qp)].iov_len   \
                              ^
drivers/common/efile_drv.c:3749:14: note: in expansion of macro
'EV_GET_UINT64'
      if (   !EV_GET_UINT64(ev, &d->c.preadv.offsets[i-1], &p, &q)
              ^
drivers/common/efile_drv.c:564:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
     (*(pp)+4 <= (ev)->iov[*(qp)].iov_len                  \
              ^
drivers/common/efile_drv.c:3750:7: note: in expansion of macro
'EV_GET_UINT32'
   || !EV_GET_UINT32(ev, &sizeH, &p, &q)
       ^
drivers/common/efile_drv.c:569:30: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
         *(pp) = (    *(pp)+4 < (ev)->iov[*(qp)].iov_len   \
                              ^
drivers/common/efile_drv.c:3750:7: note: in expansion of macro
'EV_GET_UINT32'
   || !EV_GET_UINT32(ev, &sizeH, &p, &q)
       ^
drivers/common/efile_drv.c:564:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
     (*(pp)+4 <= (ev)->iov[*(qp)].iov_len                  \
              ^
drivers/common/efile_drv.c:3751:7: note: in expansion of macro
'EV_GET_UINT32'
   || !EV_GET_UINT32(ev, &sizeL, &p, &q)) {
       ^
drivers/common/efile_drv.c:569:30: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
         *(pp) = (    *(pp)+4 < (ev)->iov[*(qp)].iov_len   \
                              ^
drivers/common/efile_drv.c:3751:7: note: in expansion of macro
'EV_GET_UINT32'
   || !EV_GET_UINT32(ev, &sizeL, &p, &q)) {
       ^
drivers/common/efile_drv.c:581:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
     (*(pp)+8 <= (ev)->iov[*(qp)].iov_len                  \

-Manish
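For reference, this is what `-Wsign-compare` is flagging: when a signed `int` expression such as `*(pp)+8` is compared with an unsigned `size_t` like `iov_len`, C's usual arithmetic conversions turn the signed operand into an unsigned value first, so a negative result wraps to a huge number. The helpers below are hypothetical and only mirror the shape of the flagged expressions; the explicit cast in `fits_mixed` spells out the conversion that happens implicitly in the macros. Whether these comparisons are what GCC 4.8 actually miscompiled is not established in this thread; `-fno-aggressive-loop-optimizations` stops GCC from using language constraints to derive loop iteration bounds, which is a different mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mixed comparison, as in the warned-about macros: signed position plus
   width compared with an unsigned length. The cast shows the implicit
   conversion: a negative sum wraps to a huge size_t value. */
static bool fits_mixed(int pos, int width, size_t len) {
    assert(width >= 0); /* widths are small constants (4 or 8) here */
    return (size_t)(pos + width) <= len;
}

/* Purely signed comparison, for contrast: a negative sum is simply
   "less than" len. Assumes len fits in a long. */
static bool fits_signed(int pos, int width, long len) {
    assert(width >= 0);
    return (long)pos + width <= len;
}

/* Sign-safe check: reject negative inputs explicitly, then compare
   entirely in size_t. */
static bool fits_checked(int pos, int width, size_t len) {
    if (pos < 0 || width < 0)
        return false;
    return (size_t)pos + (size_t)width <= len;
}
```

So `fits_mixed(-16, 8, 4)` is false only because -8 wraps to a huge unsigned value, whereas plain signed arithmetic (`fits_signed`) would accept it; `fits_checked` rejects it on purpose.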

From watson.timothy@REDACTED  Fri Aug 16 11:13:22 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Fri, 16 Aug 2013 10:13:22 +0100
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <520A1DF3.8050708@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
Message-ID: <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com>

On 13 Aug 2013, at 12:52, Lukas Larsson wrote:
> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch. 
> 
> Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :)
> 

Hi Lukas,

Does this issue only affect R16B, or all versions >= R16B?

Cheers,
Tim


From lukas@REDACTED  Fri Aug 16 11:13:45 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Fri, 16 Aug 2013 11:13:45 +0200
Subject: [erlang-bugs] file:pread broken with GCC 4.8
In-Reply-To: <CAHoRzZKrHmrGAjC8NVg=Zd_d9E_NJDb5XYf4xpGJwjjT=3ttHA@mail.gmail.com>
References: <CAHoRzZKrHmrGAjC8NVg=Zd_d9E_NJDb5XYf4xpGJwjjT=3ttHA@mail.gmail.com>
Message-ID: <520DED49.8020603@erlang.org>

Hello Manish,

Thanks for reporting this again and digging into it a little deeper. 
I've created a fix which solves the problem as seen by Tomas and will 
include it in the R16B02 release. I'll be testing the fix over the 
weekend and hopefully it will be visible in maint on github by early 
next week.

Lukas

On 15/08/13 20:43, Manish Singh wrote:
> I've also run into this problem:
>
> http://erlang.org/pipermail/erlang-bugs/2013-July/003674.html
>
> At first I thought it was a gcc bug, but 
> http://gcc.gnu.org/bugs/#report says "if compiling with 
> -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations makes 
> a difference, your code probably is not correct." Compiling 
> efile_drv.c with -fno-aggressive-loop-optimizations makes the problem 
> go away.
>
> [snip]


From lukas@REDACTED  Fri Aug 16 11:16:08 2013
From: lukas@REDACTED (Lukas Larsson)
Date: Fri, 16 Aug 2013 11:16:08 +0200
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com>
Message-ID: <520DEDD8.30301@erlang.org>

It affects both R16B and R16B01.

Lukas

On 16/08/13 11:13, Tim Watson wrote:
> On 13 Aug 2013, at 12:52, Lukas Larsson wrote:
>> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch.
>>
>> Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :)
>>
> Hi Lukas,
>
> Does this issue only affect R16B, or all versions >= R16B?
>
> Cheers,
> Tim
>
>


From watson.timothy@REDACTED  Fri Aug 16 11:16:43 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Fri, 16 Aug 2013 10:16:43 +0100
Subject: [erlang-bugs] efile_drv & async thread key
In-Reply-To: <520DEDD8.30301@erlang.org>
References: <CA+SuFX3ek+JzUjx+3iHORdUkBTcL63OYTjEJ14PPEEPhmgvRCQ@mail.gmail.com>
 <5209E307.6030806@erlang.org> <5209E8AD.5000208@erlang.org>
 <520A1DF3.8050708@erlang.org>
 <9264866B-C09E-40F1-A179-29D8D55B5326@gmail.com> <520DEDD8.30301@erlang.org>
Message-ID: <0DC18C53-8B3E-4A50-A9F0-FE6587AE055F@gmail.com>

Ok, thanks for the confirmation.

Cheers,
Tim

On 16 Aug 2013, at 10:16, Lukas Larsson wrote:

> It affects both R16B and R16B01.
> 
> Lukas
> 
> On 16/08/13 11:13, Tim Watson wrote:
>> On 13 Aug 2013, at 12:52, Lukas Larsson wrote:
>>> And there it is, conclusive proof that I should not be debugging Rickard's code before lunch.
>>> 
>>> Found the issue, will create a fix for it. As a workaround for R16B you can use a prime number as the number of async threads :)
>>> 
>> Hi Lukas,
>> 
>> Does this issue only affect R16B, or all versions >= R16B?
>> 
>> Cheers,
>> Tim
>> 
>> 
> 


From hans.bolinder@REDACTED  Wed Aug 21 10:46:49 2013
From: hans.bolinder@REDACTED (Hans Bolinder)
Date: Wed, 21 Aug 2013 08:46:49 +0000
Subject: [erlang-bugs] A dets big ?
In-Reply-To: <CADfSCeR1epef5+8y6AiaeMD5vyQAO19QRiNkTB6agd9+0q1uiQ@mail.gmail.com>
References: <CADfSCeR1epef5+8y6AiaeMD5vyQAO19QRiNkTB6agd9+0q1uiQ@mail.gmail.com>
Message-ID: <56466BD70414EA48969B4064696CF28C081199@ESESSMB207.ericsson.se>

Hi,

[Manuel Durán Aguete:]

> After upgrading a project from R14B03 to R16B01 I've found that dets
> files are growing constantly after delete operations. In previous
> versions the empty space was reused to allocate new data; after R16B
> it seems that empty space isn't reused.
>
> I've uploaded a test case to github: http://kcy.me/oz1s

Thank you for the excellent bug report. A fix should appear on the
'maint' branch soon.

Best regards,

Hans Bolinder, Erlang/OTP team, Ericsson


From hans.bolinder@REDACTED  Thu Aug 22 08:34:12 2013
From: hans.bolinder@REDACTED (Hans Bolinder)
Date: Thu, 22 Aug 2013 06:34:12 +0000
Subject: [erlang-bugs] dialyzer false positive io_lib:fread
In-Reply-To: <5206BC95.9060902@cs.ntua.gr>
References: <op.wz7tvxelvksmfo@shuttle.squirrel>
 <op.w1k1xcisvksmfo@shuttle.squirrel>,<5206BC95.9060902@cs.ntua.gr>
Message-ID: <56466BD70414EA48969B4064696CF28C0812CF@ESESSMB207.ericsson.se>

Hi,

[Chris King:]
> dialyzer produces a false positive when analyzing io_lib:fread with a ~a
> argument: it believes (erroneously) that the parsed value will be a
> string, when in fact it will be an atom.  This does not occur with
> io:fread, or with io_lib:fread with an integer argument.

[Kostis:]
> The behaviour you are experiencing is a side-effect of the type and spec
> declarations that exist in modules io_lib and io_lib_fread

Thanks for the bug report. I've corrected the specs of io_lib:fread().
The fix should appear on the 'maint' branch soon.

Best regards,

Hans Bolinder, Erlang OTP team, Ericsson


From essen@REDACTED  Thu Aug 22 10:16:12 2013
From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=)
Date: Thu, 22 Aug 2013 10:16:12 +0200
Subject: [erlang-bugs] ssl:negotiated_next_protocol/1 client bug?
Message-ID: <5215C8CC.4010200@ninenines.eu>

Hello,

I can't get ssl:negotiated_next_protocol/1 to work from the client side. 
As a result I can't really know what protocol to use once the connection 
is established.

Example:

1> ssl:start().
ok
2> {ok, S} = ssl:connect("twitter.com", 443, [binary, {active, false},
2>     {client_preferred_next_protocols, client, [<<"spdy/3">>,
2>     <<"http/1.1">>], <<"http/1.1">>}]).
{ok,{sslsocket,{gen_tcp,#Port<0.1088>},<0.52.0>}}
3> ssl:negotiated_next_protocol(S).
{error,next_protocol_not_negotiated}

It says that the protocol hasn't been negotiated. But it actually has
been, as demonstrated below.

4> ssl:send(S, [<<128,3,0,1,1,0,0,81,0,0,0,1,0,0,0,0,0,0>>,
4>     [<<120,187,227,198,167,194,2,101,37,80,122,180,66,164,90,119,215,16,176,72,
4>     49,176,236,203,5,23,144,25,37,37,5,160,244,203,106,5,45,54,184,75,202,51,
4>     75,128,113,163,151,12,46,77,88,173,10,18,193,229,24,163,62,40,65,91,97,
4>     73,220,0,0,0,0,255,255>>]]).
ok
5> ssl:recv(S, 0).
{ok,<<128,3,0,4,0,0,0,12,0,0,0,1,0,0,0,4,0,0,0,100>>}

This is SPDY working just fine.

The same can be done against an Erlang server that uses
ssl:negotiated_next_protocol/1, where it works; it only fails on the
client side.

Bug?

-- 
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu


From robert.virding@REDACTED  Fri Aug 23 12:24:17 2013
From: robert.virding@REDACTED (Robert Virding)
Date: Fri, 23 Aug 2013 11:24:17 +0100 (BST)
Subject: [erlang-bugs] A funny bug
In-Reply-To: <520CDA74.80003@trifork.com>
References: <6AD0BB5D-9A64-4641-9F85-CBE14BDEC39E@rogvall.se>
 <28C886FB-1767-45F7-B2DC-7F796298589A@gmail.com>
 <614DCCF0-DB18-4848-BCBB-B4891F80D30D@rogvall.se>
 <EA747C3E-D61E-4F6A-BCB1-217C75B6016C@gmail.com>
 <266F75A8-3E1A-406F-8877-2477647E6C2B@rogvall.se>
 <856DDC77-AA97-428B-8C68-BC4DE88618C0@gmail.com>
 <7DAE1921-064D-41C5-B9AE-EE4513FDF842@feuerlabs.com>
 <520CDA74.80003@trifork.com>
Message-ID: <2069032907.12270099.1377253457268.JavaMail.zimbra@erlang-solutions.com>

Most definitely safer! An interesting question is what happens if you do a receive in the closure. Do you see or not see the "current" message? Does the original receive see any messages removed by the closure? Etc. If it is possible to do it, then people will do it, irrespective of whether you tell them not to. And they will complain if the undocumented behaviour is changed. :-)

So if a parametrized receive is added, then something like a match spec (MS) is to be preferred to a general closure.

Robert

----- Original Message -----
> From: "Erik Søe Sørensen" <ess@REDACTED>
> To: erlang-bugs@REDACTED
> Sent: Thursday, 15 August, 2013 3:41:08 PM
> Subject: Re: [erlang-bugs] A funny bug
> 
> On 02-08-2013 16:57, Ulf Wiger wrote:
> [snip]
> > So arguably, a way to parameterize receive *should* be available, and
> > *should* be documented. I'm not saying that prim_eval:'receive'/2 is that
> > very thing that should be documented, but it comes close enough that
> > Erlang wizards like Tony should not only be excused for playing around
> > with it, but should be *expected* to. ;-)
> An alternative "parameterized receive" method is this:
> 
> http://polymorphictypist.blogspot.dk/2011/10/dynamic-selective-receive-erlang-hack.html
> 
> (Disclaimer: self plug)
> 
> It takes a compiled match spec, like so:
> 
>      dyn_sel_recv:match_spec_receive(CMS, 1000)
> 
> which is presumably safer than allowing any closure to be called.
> 
> /Erik
> 


From peppe@REDACTED  Fri Aug 23 17:43:35 2013
From: peppe@REDACTED (Peter Andersson)
Date: Fri, 23 Aug 2013 17:43:35 +0200
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
 <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
 <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>
Message-ID: <52178327.4010003@erlang.org>


Hi Tim,

Ok, I have something for you now that will hopefully work. Please test
it as soon as you can and get back to me!

I've modified both Common Test and Test Server and you'll find the
changes in this branch:

  git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl

The idea now is that you start Common Test using a hook like the example
module I've attached (cth_ctrl), e.g:

  ct_run -pa $PWD -logdir ./logs -ct_hooks cth_ctrl -suite dummy_SUITE.erl

This "pauses" Common Test immediately after startup, in the hook init
function, and logging is enabled at that point (which didn't work
before). The example hook spawns a process that calls ct:pal/2 and
error_logger:error_report/1 in a loop to verify this. When you're done
with your startup operations, you call a proceed function to start the
test run. When the tests are done you get paused again, this time in the
hook terminate function, with logging still enabled. When your teardown
operations are finished, you call the proceed function again and Common
Test terminates.

Here's how logging works (which I mean to document properly in the
User's Guide before the upcoming release): All printouts with ct:log/2
or ct:pal/2, or any error/progress reports that happen in the pre-test
phase are saved in a log file which you find a link to on the CT
Framework Log page. Similarly, all printouts/reports that happen in the
post-test phase are saved in the same log file, and you get a link to
this section also on the CT Framework Log page.

When tests run, printouts and reports are saved as usual in the test
case log files, or in the Unexpected I/O Log (for any printouts that
can't be associated with a particular test case).

I hope this solution works for you. Please get back to me with comments
and questions!

Best regards,
Peter

Ericsson AB, Erlang/OTP


Tim Watson wrote:
> Hi Peter,
>
> Ok that's great - thanks for your assistance!
>
> Cheers,
> Tim
>
>
>
> On 14 August 2013 12:36, Peter Andersson
> <peter.e.andersson@REDACTED
> <mailto:peter.e.andersson@REDACTED>> wrote:
>
>
>     Hi Tim,
>
>     Thanks for all the useful info!
>
>     I haven't actually run any tests on this myself, only read some
>     code so far. Obviously the init and terminate hook functions get
>     called before the test server process is even started. In other
>     words, these functions actually execute in that short "evil"
>     window during startup when you can't call pal/2 or log/2. I missed
>     that. :-( Sorry for misleading you!
>
>     Let me dig into this properly and get back to you when I can
>     propose useful (tested!) solutions to your problems!
>
>     Best,
>     Peter
>
>     Ericsson AB, Erlang/OTP
>
>
>
>     On Wed, 14 Aug 2013, Tim Watson wrote:
>
>         On 14 August 2013 12:09, Tim Watson <watson.timothy@REDACTED
>         <mailto:watson.timothy@REDACTED>> wrote:
>
>             When I execute a test run with this code in place, however, I
>             still get the crash, though the io:format/2 notice that I'm
>             starting the ct log appears first:
>
>             Common Test starting (cwd is
>             /home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)
>
>             starting ct log!
>
>
>             ct_util_server got EXIT from <0.61.0>: {noproc,
>               {gen_server,call,
>                [test_server_io,
>                 {print,xxxFrom,unexpected_io,
>                  [[[["<div class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
>                     "\n",
>                     [91,102,114,97,109,101,119,111,
>                      114,107,93,32,119,97,116,99,
>                      104,100,111,103,58,32,110,111,
>                      32,112,114,111,99,115,32,116,
>                      111,32,107,105,108,108,"\n"]],
>                    "\n","</div>"]]},
>                 infinity]}}
>
>
>             So it appears that the assertion that logging will work
>             between the hook's
>             init and terminate callbacks isn't quite working.
>
>
>         Oh and I've tried pausing between the systest_ct_log:start/0 call
>         and the (latter) systest:reset/0 call that triggers the logging,
>         but that didn't make any difference either - e.g., like so:
>
>         init(systest, Opts) ->
>             case application:start(systest, permanent) of
>                 {error, {already_started, systest}} ->
>                     io:format("starting ct log!~n"),
>                     systest_ct_log:start(),
>                     receive
>                         foobar -> ok
>                     after 2000 -> ok
>                     end,
>                     systest:reset();
>                 {error, _Reason}=Err -> Err;
>                 ok -> ok
>             end,
>
>
>   

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cth_ctrl.erl
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130823/64749397/attachment.ksh>

From tuncer.ayaz@REDACTED  Mon Aug 26 13:25:41 2013
From: tuncer.ayaz@REDACTED (Tuncer Ayaz)
Date: Mon, 26 Aug 2013 13:25:41 +0200
Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection
Message-ID: <CAOvwQ4gaOMctrtJesbZr-abDVDk0S29_gjF9bDjzf2fabf+Gyw@mail.gmail.com>

Previously it was just a certain[1] function in rebar.erl which got
mis-indented when you did a whole-buffer indent, but now there's also
a second[2] function which gets mis-indented.

In both cases indenting the function itself separately works, and the
bug happens if you select the whole buffer and indent that with
erlang.el (C-x C-q).

I'm using Emacs 24.3.1 with latest erlang.el from maint.

Is it possible to fix this in the existing indenter?

[1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365
[2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112


From tuncer.ayaz@REDACTED  Mon Aug 26 13:34:09 2013
From: tuncer.ayaz@REDACTED (Tuncer Ayaz)
Date: Mon, 26 Aug 2013 13:34:09 +0200
Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection
In-Reply-To: <CAOvwQ4gaOMctrtJesbZr-abDVDk0S29_gjF9bDjzf2fabf+Gyw@mail.gmail.com>
References: <CAOvwQ4gaOMctrtJesbZr-abDVDk0S29_gjF9bDjzf2fabf+Gyw@mail.gmail.com>
Message-ID: <CAOvwQ4ig6Qd3MMVFJpFbhZsqUggsxZMjyBoG4wUnhupf_Ae9tw@mail.gmail.com>

On Mon, Aug 26, 2013 at 1:25 PM, Tuncer Ayaz wrote:
> Previously it was just a certain[1] function in rebar.erl which got
> mis-indented when you did a whole-buffer indent, but now there's also
> a second[2] function which gets mis-indented.
>
> In both cases indenting the function itself separately works, and the
> bug happens if you select the whole buffer and indent that with
> erlang.el (C-x C-q).

Sorry, that should of course be (C-c C-q) instead.

> I'm using Emacs 24.3.1 with latest erlang.el from maint.
>
> Is it possible to fix this in the existing indenter?
>
> [1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365
> [2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112


From watson.timothy@REDACTED  Tue Aug 27 12:20:06 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Tue, 27 Aug 2013 11:20:06 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <52178327.4010003@erlang.org>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
 <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
 <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>
 <52178327.4010003@erlang.org>
Message-ID: <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com>

On 23 Aug 2013, at 16:43, Peter Andersson wrote:
> 
> Ok, I have something for you now that will hopefully work. Please test
> it as soon as you can and get back to me!
> 

Hi Peter! Thanks for this - I'll get it tested this afternoon.

> I've modified both Common Test and Test Server and you'll find the
> changes in this branch:
> 
>  git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl
> 

[snip]

> Here's how logging works (which I mean to document properly in the
> User's Guide before the upcoming release): All printouts with ct:log/2
> or ct:pal/2, or any error/progress reports that happen in the pre-test
> phase are saved in a log file which you find a link to on the CT
> Framework Log page. Similarly, all printouts/reports that happen in the
> post-test phase, are saved in the same log file, and you get a link to
> this section also on the CT Framework Log page.
> 

Great.

> When tests run, printouts and reports are saved as usual in the test
> case log files, or in the Unexpected I/O Log (for any printouts that
> can't be associated to a particular test case).
> 
> I hope this solution works for you. Please get back to me with comments
> and questions!
> 

Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it.

Cheers,
Tim

From magnus@REDACTED  Wed Aug 28 17:05:29 2013
From: magnus@REDACTED (Magnus Henoch)
Date: Wed, 28 Aug 2013 16:05:29 +0100
Subject: [erlang-bugs] erlang.el mis-indents whole-buffer selection
In-Reply-To: <CAOvwQ4gaOMctrtJesbZr-abDVDk0S29_gjF9bDjzf2fabf+Gyw@mail.gmail.com>
 (Tuncer Ayaz's message of "Mon, 26 Aug 2013 13:25:41 +0200")
References: <CAOvwQ4gaOMctrtJesbZr-abDVDk0S29_gjF9bDjzf2fabf+Gyw@mail.gmail.com>
Message-ID: <m2li3lkg92.fsf@mail.gmail.com>

Tuncer Ayaz <tuncer.ayaz@REDACTED> writes:

> Previously it was just a certain[1] function in rebar.erl which got
> mis-indented when you did a whole-buffer indent, but now there's also
> a second[2] function which gets mis-indented.
>
> In both cases indenting the function itself separately works, and the
> bug happens if you select the whole buffer and indent that with
> erlang.el (C-x C-q).
>
> I'm using Emacs 24.3.1 with latest erlang.el from maint.
>
> Is it possible to fix this in the existing indenter?
>
> [1] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar.erl#L318-L365
> [2] https://github.com/rebar/rebar/blob/620c4b01c/src/rebar_erlc_compiler.erl#L92-L112

It seems like this happens when the code being indented has not yet been
made visible, and thus lazy syntax highlighting has not yet happened.
The syntax table for erlang-mode is unable to handle some combinations
of characters; in particular, when a string ends with a dollar sign (as
in "foo$"), the dollar sign plus the double quote would be treated as a
character constant were it not for some magic regexps in
font-lock-syntactic-keywords.  Manually scrolling through the buffer
before reindenting seems to make the problem go away.
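To make the ambiguity concrete, here is a small hypothetical module (the module and function names are illustrative only). Inside a string, $ is an ordinary character, so "foo$" is a plain four-character string. Outside a string, $" is the character constant for the double quote (34). A syntax table classifies characters one at a time, so it cannot tell which reading applies. That is where the extra regexps come in.

```erlang
-module(dollar_quote).
-export([ends_with_dollar/0, quote_char/0]).

%% Hypothetical example: "foo$" is an ordinary string whose last
%% character is $; the closing double quote really ends the string.
ends_with_dollar() -> "foo$".

%% Outside a string, $" is a character constant: the double quote, 34.
quote_char() -> $".
```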

This is less than satisfactory, of course.  Looking up online help for
font-lock-syntactic-keywords in a modern Emacs gives:

  This variable is obsolete since 24.1;
  use `syntax-propertize-function' instead.

And the NEWS file for Emacs 24.1 contains:

  *** New variable `syntax-propertize-function'.
  This replaces `font-lock-syntactic-keywords' which is now obsolete.
  This allows syntax-table properties to be set independently from font-lock:
  just call syntax-propertize to make sure the text is propertized.
  Together with this new variable come a new hook
  syntax-propertize-extend-region-functions, as well as two helper functions:
  syntax-propertize-via-font-lock to reuse old font-lock-syntactic-keywords
  as-is; and syntax-propertize-rules which provides a new way to specify
  syntactic rules.

This sounds like the right way to solve the problem, though of course
you won't know until you try...

Regards,
Magnus


From glorybox.away@REDACTED  Wed Aug 28 20:30:21 2013
From: glorybox.away@REDACTED (Sergey Sinkovskiy)
Date: Wed, 28 Aug 2013 21:30:21 +0300
Subject: [erlang-bugs] [inets] httpc cookie parsing
Message-ID: <CACjVhxFxZpB5PQxDOjDR4smCOCrdAkH_jL7L2Cch-wEq_XNXRA@mail.gmail.com>

Some servers send an empty Set-Cookie header, which crashes the httpc
handler process with the following stacktrace:

{function_clause,
 [{string,substr,
      [[],1,-1],
      [{file,"string.erl"},{line,207}]},
  {httpc_cookie,parse_set_cookie,2,
      [{file,"httpc_cookie.erl"},{line,347}]},
  {httpc_cookie,'-parse_set_cookies/2-lc$^1/1-1-',2,
      [{file,"httpc_cookie.erl"},{line,339}]},
  {httpc_cookie,cookies,3,
      [{file,"httpc_cookie.erl"},{line,202}]},
  {httpc_handler,handle_cookies,4,
      [{file,"httpc_handler.erl"},{line,1250}]},
  {httpc_handler,handle_response,1,
      [{file,"httpc_handler.erl"},{line,1186}]},
  {gen_server,handle_msg,5,
      [{file,"gen_server.erl"},{line,604}]},
  {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,239}]}]},

The RFC doesn't allow the header to be empty, so strictly speaking this
isn't a bug in inets. Could such headers simply be skipped during parsing
instead of crashing?
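One way to do that, sketched under assumptions: drop Set-Cookie headers whose value is empty (or whitespace-only) before they reach the cookie parser. The module skip_empty_cookies and the function filter_set_cookies/1 are hypothetical and not part of inets. The sketch assumes the headers are available as {Name, Value} string pairs with lowercase names. Where such a filter would hook into httpc_handler or httpc_cookie is left open.

```erlang
-module(skip_empty_cookies).
-export([filter_set_cookies/1]).

%% Hypothetical helper, not part of inets: remove Set-Cookie headers
%% with an empty or whitespace-only value and keep everything else.
%% Assumes Headers is a list of {Name, Value} strings, Name lowercased.
filter_set_cookies(Headers) ->
    [H || {Name, Value} = H <- Headers,
          not (Name =:= "set-cookie" andalso is_blank(Value))].

%% True for "" and for strings made only of spaces, tabs, CR or LF.
is_blank(Value) ->
    lists:all(fun(C) -> lists:member(C, " \t\r\n") end, Value).
```

Given [{"set-cookie",""},{"set-cookie","a=1"}], the filter keeps only the second header. Non-cookie headers pass through untouched.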

-- 
Sergey Sinkovsky

From roberto.aloi@REDACTED  Thu Aug 29 10:22:15 2013
From: roberto.aloi@REDACTED (Roberto Aloi)
Date: Thu, 29 Aug 2013 10:22:15 +0200 (CEST)
Subject: [erlang-bugs] Potential issue with Erlang CT when ct:pal/X is
 called from a config callback module
In-Reply-To: <1044738081.66436.1377764355546.JavaMail.zimbra@erlang-solutions.com>
Message-ID: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com>

Hi all,

I might have encountered a tiny issue with the CT logging facilities in R15B01.
Details here:

https://gist.github.com/robertoaloi/5884093

Is the R15B03 behaviour expected?

Kind regards,

Roberto Aloi
---
Erlang Solutions Ltd.
www.erlang-solutions.com


From watson.timothy@REDACTED  Thu Aug 29 12:31:49 2013
From: watson.timothy@REDACTED (Tim Watson)
Date: Thu, 29 Aug 2013 11:31:49 +0100
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org>
 <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com>
 <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
 <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
 <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>
 <52178327.4010003@erlang.org>
 <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com>
Message-ID: <BCA84A58-1977-4A6C-98EF-1ACFE094AE5F@gmail.com>

Hi Peter,

This works perfectly well for me. In fact, I can even skip the whole cth_ctrl since the pre/post logging appears separately in the HTML logs, which is good enough for my use case. Thanks very much for getting this sorted!

Cheers,
Tim

On 27 Aug 2013, at 11:20, Tim Watson wrote:

> On 23 Aug 2013, at 16:43, Peter Andersson wrote:
>> 
>> Ok, I have something for you now that will hopefully work. Please test
>> it as soon as you can and get back to me!
>> 
> 
> Hi Peter! Thanks for this - I'll get it tested this afternoon.
> 
>> I've modified both Common Test and Test Server and you'll find the
>> changes in this branch:
>> 
>> git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl
>> 
> 
> [snip]
> 
>> Here's how logging works (which I mean to document properly in the
>> User's Guide before the upcoming release): All printouts with ct:log/2
>> or ct:pal/2, or any error/progress reports that happen in the pre-test
>> phase are saved in a log file which you find a link to on the CT
>> Framework Log page. Similarly, all printouts/reports that happen in the
>> post-test phase, are saved in the same log file, and you get a link to
>> this section also on the CT Framework Log page.
>> 
> 
> Great.
> 
>> When tests run, printouts and reports are saved as usual in the test
>> case log files, or in the Unexpected I/O Log (for any printouts that
>> can't be associated to a particular test case).
>> 
>> I hope this solution works for you. Please get back to me with comments
>> and questions!
>> 
> 
> Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it.
> 
> Cheers,
> Tim


From peppe@REDACTED  Thu Aug 29 12:46:52 2013
From: peppe@REDACTED (Peter Andersson)
Date: Thu, 29 Aug 2013 12:46:52 +0200
Subject: [erlang-bugs] Potential issue with Erlang CT when ct:pal/X is
 called from a config callback module
In-Reply-To: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com>
References: <1576174141.66515.1377764535200.JavaMail.zimbra@erlang-solutions.com>
Message-ID: <Pine.LNX.4.64.1308291237580.12971@ancalagon.otp.ericsson.se>


Hi Roberto,

This is caused by modifications to the CT logging system that we
introduced in R15B03. It's not expected behaviour; we regard it as a
bug. It exists in R16B01 as well and has been reported previously. We
have already fixed it for the upcoming R16B02 release, due in September.

Best regards,
Peter

Ericsson AB, Erlang/OTP

On Thu, 29 Aug 2013, Roberto Aloi wrote:

> Hi all,
>
> I might have encountered a tiny issue with the CT logging facilities in R15B01.
> Details here:
>
> https://gist.github.com/robertoaloi/5884093
>
> Is the R15B03 behaviour expected?
>
> Kind regards,
>
> Roberto Aloi
> ---
> Erlang Solutions Ltd.
> www.erlang-solutions.com
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>


From peppe@REDACTED  Fri Aug 30 09:29:52 2013
From: peppe@REDACTED (Peter Andersson)
Date: Fri, 30 Aug 2013 09:29:52 +0200
Subject: [erlang-bugs] common_test + test_server_io errors
In-Reply-To: <BCA84A58-1977-4A6C-98EF-1ACFE094AE5F@gmail.com>
References: <5A095356-16E8-4224-BA03-AA58C30FC28D@gmail.com>
 <51CC4E35.8090408@erlang.org> <3B4CC6D8-4CB5-4236-AEDC-CBAF51A9E7AE@gmail.com>
 <CALhYyxM-56eNnXFco4=hQDmVtfZ3U2ac35R1mPqTwGKfL4FA2w@mail.gmail.com>
 <51DFB044.50302@erlang.org> <B20079A1-AB50-4507-AB58-9492E51232C6@gmail.com>
 <A8261DF1-D16C-4266-A17E-DDDEC7184B7D@gmail.com> <52011A87.5080203@erlang.org>
 <CALhYyxNJcFMj2Gf6h3RqRTvj1jMX35g3p3HhDv15Y4WDmRQpMg@mail.gmail.com>
 <CALhYyxOd4MrJM4Ma9Trou+HBY2_1y-d9GdDSYNEFhz4=RXrPAQ@mail.gmail.com>
 <CALhYyxMVr8vUxxXg_ZzxsLLn1D4sZ1UAsWED1WsRCGsUwgR9Sg@mail.gmail.com>
 <Pine.LNX.4.64.1308141324270.20836@ancalagon.otp.ericsson.se>
 <CALhYyxOta2xndMSsTq-DPyYOr5KHqy0CgGOBKMxJLVb6xoKWwg@mail.gmail.com>
 <52178327.4010003@erlang.org> <49873C2D-69C7-4BA1-85D6-C02CE302C032@gmail.com>
 <BCA84A58-1977-4A6C-98EF-1ACFE094AE5F@gmail.com>
Message-ID: <Pine.LNX.4.64.1308300928030.8430@ancalagon.otp.ericsson.se>


Hi Tim,

That's good news, thanks for letting me know! I will wrap this up then and 
release it with R16B02.

Cheers,
Peter

On Thu, 29 Aug 2013, Tim Watson wrote:

> Hi Peter,
>
> This works perfectly well for me. In fact, I can even skip the whole cth_ctrl since the pre/post logging appears separately in the HTML logs, which is good enough for my use case. Thanks very much for getting this sorted!
>
> Cheers,
> Tim
>
> On 27 Aug 2013, at 11:20, Tim Watson wrote:
>
>> On 23 Aug 2013, at 16:43, Peter Andersson wrote:
>>>
>>> Ok, I have something for you now that will hopefully work. Please test
>>> it as soon as you can and get back to me!
>>>
>>
>> Hi Peter! Thanks for this - I'll get it tested this afternoon.
>>
>>> I've modified both Common Test and Test Server and you'll find the
>>> changes in this branch:
>>>
>>> git://github.com/peppe-erlang/otp.git peppe/common_test/cth_ctrl
>>>
>>
>> [snip]
>>
>>> Here's how logging works (which I mean to document properly in the
>>> User's Guide before the upcoming release): All printouts with ct:log/2
>>> or ct:pal/2, or any error/progress reports that happen in the pre-test
>>> phase are saved in a log file which you find a link to on the CT
>>> Framework Log page. Similarly, all printouts/reports that happen in the
>>> post-test phase, are saved in the same log file, and you get a link to
>>> this section also on the CT Framework Log page.
>>>
>>
>> Great.
>>
>>> When tests run, printouts and reports are saved as usual in the test
>>> case log files, or in the Unexpected I/O Log (for any printouts that
>>> can't be associated to a particular test case).
>>>
>>> I hope this solution works for you. Please get back to me with comments
>>> and questions!
>>>
>>
>> Will do. Thanks again for looking at this. I'll get back to you as soon as I've had a chance to test it.
>>
>> Cheers,
>> Tim
>
>