[erlang-questions] Changing scheduler behaviour to make tests less deterministic

Wed Dec 27 02:26:12 CET 2006

I have some unit tests which run in a single Erlang node but which are
internally concurrent -- i.e. they test subsystems that spawn multiple
processes which interact with each other.

As the current BEAM scheduler is based on the number of reductions
consumed, my tests are quite deterministic.
The tests are not _entirely_ deterministic as they use my Berkeley DB
driver, which makes system IO calls in private threadpools before
sending replies back to Erlang, so that injects some degree of
unpredictability into 'receive' times. And even that limited degree of
non-determinism has uncovered one or two concurrency-related bugs,
e.g. incorrect handling of out-of-order messages, poor choice of
timeouts on receives, etc.

So to achieve more coverage of concurrency-interactions in the tests,
I want to make a cheap, easy hack to the BEAM scheduler to make it
less ploddingly deterministic on every test run.

One trivial way would be to use "erl -smp enable +S 2" (or more), to
introduce more OS threads running Erlang processes.  But that
introduces some 'relatively uncontrollable' non-determinism (the OS
scheduler), which could indeed find some concurrency-sensitive bugs,
but might make them harder to reproduce (to verify fixes).

I'd prefer to introduce a more controllable source of variation, e.g.
by changing the size of the process quantum (the number of reductions
a process is allowed to execute before it is pre-empted).
Obviously we need to be careful to preserve fairness, avoid thrashing etc.

The first basic test would be to change this to a different constant
for all processes.  The current value is 2000 reductions, so we might
try running tests with values of 500, 1000, 1500, 3000.  (Actually, I
want to put a loop around the test driver and try a large range of
values with fairly small increments.)
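
To illustrate why a different quantum should shake out different
interleavings, here is a toy model in C.  It is not BEAM code:
finish_order() and NPROCS are made-up names, and "work" is just a
reduction count per process.  It runs three processes round-robin and
records the order in which they finish.

```c
#include <assert.h>

enum { NPROCS = 3 };

/*
 * Toy round-robin scheduler (illustration only, not BEAM code).
 * work[i] is the number of reductions process i needs to finish.
 * Each turn a runnable process consumes up to 'quantum' reductions.
 * The finishing order is written to 'order' as letters 'A', 'B', 'C'.
 */
void finish_order(const int work[NPROCS], int quantum, char order[NPROCS + 1])
{
    int left[NPROCS];
    int done = 0;
    int i;

    assert(quantum > 0);            /* a zero quantum would never terminate */

    for (i = 0; i < NPROCS; i++) {
        left[i] = work[i];
        if (left[i] <= 0)           /* nothing to do: finished immediately */
            order[done++] = (char)('A' + i);
    }

    while (done < NPROCS) {
        for (i = 0; i < NPROCS; i++) {
            if (left[i] <= 0)
                continue;           /* already finished */
            left[i] -= quantum;     /* run for one quantum */
            if (left[i] <= 0)
                order[done++] = (char)('A' + i);
        }
    }
    order[done] = '\0';
}
```

With work of {2500, 1800, 600} reductions, a quantum of 2000 gives the
order BCA, 500 gives CBA and 3000 gives ABC.  Each run is repeatable
for a given quantum, which is the property I am after.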

Unfortunately the setting is a compile-time constant in a header file
(see below)

	erts/emulator/beam/erl_vm.h:#define CONTEXT_REDS 2000   /* Swap process out after this number */

I'd really like to avoid building multiple versions of the emulator,
and instead override CONTEXT_REDS from an environment variable at
startup.
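
A minimal sketch of the override I have in mind, as standalone C rather
than a patch.  The names are all hypothetical: the variable name
ERL_CONTEXT_REDS, the global context_reds, and the two functions do not
exist in the emulator.  The upper bound is an assumption too, chosen
only so that expressions like "CONTEXT_REDS * 40" cannot overflow an int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_CONTEXT_REDS 2000

/* Hypothetical global that would replace the compile-time constant. */
int context_reds = DEFAULT_CONTEXT_REDS;

/*
 * Parse 'text' as a positive decimal integer.  Returns 'fallback' if the
 * text is missing, malformed, non-positive, or so large that multiplying
 * it by 40 would overflow an int.
 */
int parse_context_reds(const char *text, int fallback)
{
    char *end;
    long value;

    assert(fallback > 0);

    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;            /* out of range or trailing junk */
    if (value < 1 || value > INT_MAX / 40)
        return fallback;            /* unusable as a quantum */

    return (int)value;
}

/* Would be called once, early in emulator startup (assumed hook). */
void init_context_reds(void)
{
    context_reds = parse_context_reds(getenv("ERL_CONTEXT_REDS"),
                                      DEFAULT_CONTEXT_REDS);
}
```

One thing to watch for with this approach: a global variable cannot be
used in a static initializer or in an #if, so any such use of
CONTEXT_REDS would need to be rewritten.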

The next step might be to make this a per-process value, settable via
spawn_opt() -- like heap_size, or fullsweep_after.  This would give a
different form of process 'prioritization' than the current
high/medium/low, which might even have interesting uses in
production.

But I'd like to try the simple approach first.

My questions:

  1. Does anyone on the BEAM team envisage problems if I change the
#define of CONTEXT_REDS to a reference to a global variable, which I
initialize from an environment variable at some suitably early point?
      (I've done basic research, and #1 looks feasible to me -- see
below.  But I just want to check I'm not missing something subtle.)

  2. Does anyone see a better way to achieve the goal?

  3. Is the 'reductions_per_quantum' parameter for spawn_opt feasible?
 Interesting?  Or too dangerous (risking starvation) to be worth it?

Many thanks,

Chris

CONTEXT_REDS is not used in many places:

	% cd erlang/otp_src_R11B-2
	% grep REDUCTIONS ***/*.h
	erts/emulator/beam/erl_vm.h:#define INPUT_REDUCTIONS (2 * CONTEXT_REDS)

	% grep CONTEXT_REDS ***/*.h
	erts/emulator/beam/bif.h:       (p)->fcalls = -CONTEXT_REDS;            \
	erts/emulator/beam/bif.h:       else if ((p)->fcalls < -CONTEXT_REDS)      \
	erts/emulator/beam/bif.h:           (p)->fcalls = -CONTEXT_REDS;           \
	erts/emulator/beam/erl_vm.h:#define CONTEXT_REDS 2000   /* Swap process out after this number */
	erts/emulator/beam/erl_vm.h:#define INPUT_REDUCTIONS (2 * CONTEXT_REDS)

(The references in bif.h are for macros called BUMP_REDS() and BUMP_ALL_REDS())

CONTEXT_REDS creeps into ETS code (erl_db.c) and even lists:member()
and lists:keysearch(), apparently to limit the amount of work they do
in a single time slice.

	% grep CONTEXT_REDS ***/*.c
	erts/emulator/beam/bif.c:        BIF_RET2(old_value, CONTEXT_REDS);
	erts/emulator/beam/bif.c:    if (reds > CONTEXT_REDS) {
	erts/emulator/beam/bif.c:        reds = CONTEXT_REDS;
	erts/emulator/beam/erl_bif_lists.c:    int max_iter = 10 * CONTEXT_REDS;
	erts/emulator/beam/erl_bif_lists.c:         BIF_RET2(am_true, CONTEXT_REDS - max_iter/10);
	erts/emulator/beam/erl_bif_lists.c:    BIF_RET2(am_false, CONTEXT_REDS - max_iter/10);
	erts/emulator/beam/erl_bif_lists.c:    max_iter = CONTEXT_REDS * 40;
	erts/emulator/beam/erl_bif_lists.c:    int max_iter = 10 * CONTEXT_REDS;
	erts/emulator/beam/erl_db.c:    if (++i > CONTEXT_REDS) {
	erts/emulator/beam/erl_db_tree.c:    int max_iter = CONTEXT_REDS * 10;
	erts/emulator/beam/erl_db_tree.c:    BUMP_REDS(p,CONTEXT_REDS - max_iter / 10);
	erts/emulator/beam/erl_process.c:       calls = CONTEXT_REDS;

The main use is in erl_process.c:

 * schedule() is called from BEAM (process_main()) or HiPE
 * (hipe_mode_switch()) when the current process is to be
 * replaced by a new process. 'calls' is the number of reduction
 * steps the current process consumed.
 * schedule() returns the new process, and the new process'
 * ->fcalls field is initialised with its allowable number of
 * reduction steps.