[erlang-patches] gen_stream intermittent errors

Raimo Niskanen raimo+erlang-patches@REDACTED
Mon Aug 23 09:44:02 CEST 2010


On Sun, Aug 22, 2010 at 08:21:24AM -0700, jay@REDACTED wrote:
> In the 3 failing test cases below, the problem is a timing issue. 
> Internally, the gen_stream has sent a {stop} message to all the children
> workers.  The result of requesting 'proc_info' is a property list of the
> number of procs requested to run, the number still active and the actual
> pids  of the processes.  In all the 3 cases the expected value for active
> is 0.
> 
> The bad match is because the workers have not yet been given a time slice
> to match their message queue and self-stop because of the {stop} message.
> 
> What is the proper way to give other procs a chance to run and consume
> their message queue in a test suite, before requesting the 'proc_info'
> results?

The question is whether the problem should be solved in the test suite
or in the {stop} handshake (there seems to be no handshake today).

Looping over the list orig_procs with is_process_alive/1 has an
inherent race condition, since you cannot know whether a process
is on its way down or not.

Either you write an explicit handshake where the process receiving
{stop} replies, or the process sending {stop} uses erlang:monitor/2
to ensure that the stopped process has actually terminated.
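
A minimal sketch of the monitor variant, assuming the workers simply
exit when they receive {stop}. The module name stop_sync, the function
stop_and_wait/2 and the demo worker are made up for illustration and
are not taken from gen_stream. Note that the timeout applies to each
monitor in turn, not to the whole operation.

```erlang
-module(stop_sync).
-export([stop_and_wait/2, demo/0]).

%% Monitor every worker first, then send {stop}, then wait for one
%% 'DOWN' message per monitor. Returns ok when all workers are gone,
%% or {error, timeout} if some 'DOWN' did not arrive in time.
stop_and_wait(Pids, Timeout) ->
    Refs = [erlang:monitor(process, P) || P <- Pids],
    lists:foreach(fun(P) -> P ! {stop} end, Pids),
    wait_down(Refs, Timeout).

wait_down([], _Timeout) ->
    ok;
wait_down([Ref | Rest], Timeout) ->
    receive
        {'DOWN', Ref, process, _Pid, _Reason} ->
            wait_down(Rest, Timeout)
    after Timeout ->
        %% Drop the remaining monitors and any 'DOWN' already queued.
        lists:foreach(fun(R) -> erlang:demonitor(R, [flush]) end,
                      [Ref | Rest]),
        {error, timeout}
    end.

%% Hypothetical worker, for illustration only: terminates on {stop}.
worker() ->
    receive
        {stop} -> ok
    end.

%% Starts two workers, stops them and reports whether any is alive.
demo() ->
    Pids = [spawn(fun worker/0) || _ <- lists:seq(1, 2)],
    ok = stop_and_wait(Pids, 5000),
    [is_process_alive(P) || P <- Pids].
```

Since a 'DOWN' message is only delivered once the monitored process
has exited (or immediately, if it was already dead), anything that
inspects the workers after stop_and_wait/2 returns ok no longer races
with their shutdown.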

I have not had time to dig into and understand the gen_stream code.
Whether there should be a handshake for {stop}, and whether it should
go through a monitor, depends on whether one gen_stream can be stopped
from several others. It is a design question: how is a
gen_stream state defined, and how can other gen_streams
know the state? Think distributed. If gen_streams are on
different nodes, what happens then?

Solving this problem only in the test suite sounds like sweeping
a real problem under the rug, but I am not certain about it.
You can e.g. use erlang:monitor/2 to observe the processes going
down before calling gen_stream:proc_info/1, or repeat the call
with a delay until it replies correctly.
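
The delay-and-repeat variant could look roughly like this in a test
suite. It is only a sketch: the module retry_until and wait_until/3
are hypothetical helper names, and the check fun is supplied by the
caller.

```erlang
-module(retry_until).
-export([wait_until/3, demo/0]).

%% Calls Check() up to Retries times, sleeping DelayMs milliseconds
%% after each failed attempt. Returns ok as soon as Check() returns
%% true, or {error, timeout} when the attempts are used up.
wait_until(_Check, 0, _DelayMs) ->
    {error, timeout};
wait_until(Check, Retries, DelayMs) when Retries > 0 ->
    case Check() of
        true ->
            ok;
        false ->
            timer:sleep(DelayMs),
            wait_until(Check, Retries - 1, DelayMs)
    end.

%% Illustration with a throwaway process that exits on {stop}.
%% In the suite the check would instead call gen_stream:proc_info/1
%% and, assuming the reply shape seen in the failure reports, test
%% that proplists:get_value(active, Props) =:= 0.
demo() ->
    Pid = spawn(fun() -> receive {stop} -> ok end end),
    Pid ! {stop},
    wait_until(fun() -> not is_process_alive(Pid) end, 50, 10).
```

This only bounds the wait; it does not remove the race in the code
under test, which is why a handshake or monitor in the {stop} path
may still be the better fix.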

> 
> jay
> 
> > The failing test case are:
> >
> > gen_stream_SUITE:block_buffers_file	SLES10_debug, WinXP, Solaris10
> > === location {gen_stream_SUITE,1325}
> > === reason = no match of right hand side value
> >                  {proc_info,[{requested,2},
> >                              {active,2},
> >                              {pids,[<0.22755.2>,<0.22756.2>]}]}
> 
> >   gen_stream_SUITE:num_buffers_file	SLES10_debug, WinXP, Solaris10
> > === location {gen_stream_SUITE,1258}
> > === reason = no match of right hand side value
> >                  {proc_info,[{requested,2},
> >                              {active,2},
> >                              {pids,[<0.22650.2>,<0.22651.2>]}]}
> 
> >   gen_stream_SUITE:num_procs_file	WinXP
> > === location {gen_stream_SUITE,1170}
> > === reason = no match of right hand side value
> >                  {proc_info,[{requested,1},{active,1},{pids,[<0.22480.2>]}]}
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

