[erlang-questions] : Subtle behaviour of Erlang scheduler

Wed May 30 01:36:35 CEST 2007

KatolaZ wrote:
> On Tue, May 29, 2007 at 12:11:38PM +0200, Ulf Wiger (TN/EAB) wrote:
>> So, for the purposes of discussion, how would you feel
>> about a change like the one I suggested - to simply disallow
>> on-demand code loading for high priority processes?
>>
>> I would assume that it wouldn't hurt the existing code
>> base, since there are so few processes running on 
>> high priority, and the designers have thought carefully
>> about these aspects...
>>
>> ... or so I thought, until I implemented it.
>>
>> Below is a diff for the error_handler.erl. 
>> It simply checks if the process has priority
>> high or max, and if so doesn't call on the code
>> server.
>>
>> Interestingly, running with this patch causes 
>> global_group to crash at startup with an undef,
>> when calling net_kernel. :-D
>>
>> I fixed that temporarily by moving the call to 
>> process_flag(priority, max) to the end of the init/1
>> function in global_group.erl.
> 
> :-DDD
> 

This is a good example, I think.

The effect, regarding priority, of doing a gen_server:call from a high
prio process to a server running at normal prio is more or less a
temporary decrease of the caller's prio to normal for the duration of
the call.
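
A minimal sketch of that situation, in case it helps. The module name
prio_call_demo and the which_priority request are made up for
illustration; only gen_server, process_flag/2 and process_info/2 are
real OTP calls. The caller raises itself to high, but the work it waits
for is carried out by a server that still runs at normal:

```erlang
-module(prio_call_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical demo: returns {CallerPrio, ServerPrio}.
run() ->
    %% The server is started while the caller is still at its old prio.
    {ok, Server} = gen_server:start_link(?MODULE, [], []),
    Old = process_flag(priority, high),
    {priority, CallerPrio} = process_info(self(), priority),
    %% The caller blocks here until the normal prio server has been
    %% scheduled and has replied.
    ServerPrio = gen_server:call(Server, which_priority),
    process_flag(priority, Old),
    ok = gen_server:stop(Server),
    {CallerPrio, ServerPrio}.

init([]) ->
    {ok, nostate}.

%% The server reports the priority it is running at.
handle_call(which_priority, _From, State) ->
    {priority, Prio} = process_info(self(), priority),
    {reply, Prio, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Called from a process at normal prio, prio_call_demo:run() should
return {high, normal}: the request is served at normal prio no matter
what prio the caller has.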

I don't know that much about global_group, but it is not a problem that
it happens to have max prio when it triggers loading of the net_kernel
code during initialization, assuming that everything else is as it
should be. The initialization begins by raising the prio from normal to
max; then net_kernel:monitor_nodes is called, which apparently caused
the code of net_kernel to be loaded. We could switch those two lines of
code, i.e. make sure that the code of net_kernel is loaded before
raising the prio. The switch would not have much of an effect, though.
If max prio had been important at this point, the process should have
been spawned with prio max (spawn_opt). The place where max prio matters
comes later, after the initialization. A temporary decrease of prio
after initialization may be okay too.
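
The two alternatives could be sketched roughly as follows. This is not
the global_group code; the module prio_init_demo and its function names
are invented for the example, and code:ensure_loaded/1 is used here as
one way of forcing the load before the prio is raised:

```erlang
-module(prio_init_demo).
-export([load_then_raise/0, spawn_at_max/0]).

%% Alternative 1: make sure the code is loaded while still running at
%% the old prio, and only then raise the prio.
%% Returns {OldPrio, PrioWhileRaised}.
load_then_raise() ->
    {module, net_kernel} = code:ensure_loaded(net_kernel),
    Old = process_flag(priority, max),
    {priority, Raised} = process_info(self(), priority),
    %% ... work that really needs max prio would go here ...
    process_flag(priority, Old),
    {Old, Raised}.

%% Alternative 2: if max prio matters from the very first instruction,
%% spawn the process at max prio instead of raising it later.
%% Returns the prio the new process reported, or 'timeout'.
spawn_at_max() ->
    Parent = self(),
    Pid = spawn_opt(fun() ->
                        Parent ! {self(), process_info(self(), priority)}
                    end,
                    [{priority, max}]),
    receive
        {Pid, {priority, Prio}} -> Prio
    after 1000 ->
        timeout
    end.
```

In the first variant the process has max prio only after the module is
known to be loaded, so the call that follows cannot end up waiting on
the code server.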

If it is important that a process effectively has high or max prio at
a certain point of execution, it should not trigger code loading at
that point. It is not automatically a problem if a high or max prio
process causes code loading. It will become a problem if you have a
high or max prio process busy looping for long periods of time, but this
is not how high prio is intended to be used. You will get other
problems too, as Ulf pointed out.

> I perfectly understand what the point is :-) It seems all too easy to
> break existing code, in one way or another, and that is quite normal
> after ten years of development. For this precise reason I thought that
> running the code server with high priority would somehow solve the
> problem, but I agree that filesystem interaction could also be heavy
> and slow...
> 
> So, just for the purpose of discussion, why not think about "virtual"
> synchronisation points for high priority procs? I.e., if a high
> priority task has not been interrupted for X reductions (X being a
> relatively large integer), then goto do_schedule1 anyway, letting
> other high priority processes run....

If I understood you right, you want the scheduling to work as it is
implemented right now.

Regardless of priority, a process is unconditionally scheduled out when
it has consumed (currently) 2000 reductions (since it was last scheduled
in). I.e., a busy loop in a high priority process will not prevent other
high priority processes from running, but it will prevent normal and low
priority processes from running.
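
To make the rule concrete, here is a toy model of it. It is only a
sketch and not the emulator's scheduler: the module sched_model is
invented, only the high and normal levels are modelled, every process is
assumed to be runnable all the time, and a single run queue per level
is assumed. Each process is given as {Name, Prio, Work}, where Work is
the number of reductions it wants to execute:

```erlang
-module(sched_model).
-export([run/1]).

%% Reductions a process may consume before it is scheduled out.
-define(BUDGET, 2000).

%% Procs = [{Name, high | normal, Work}].
%% Returns the time slices in the order they are handed out,
%% as [{Name, ReductionsConsumed}].
run(Procs) ->
    High = [{N, W} || {N, high, W} <- Procs],
    Normal = [{N, W} || {N, normal, W} <- Procs],
    slices(High, Normal).

slices([], []) ->
    [];
%% As long as any high prio process is runnable, pick from that queue.
slices([P | Ps], Normal) ->
    {Slice, Rest} = step(P),
    [Slice | slices(Ps ++ Rest, Normal)];
%% Normal prio processes only run when the high prio queue is empty.
slices([], [P | Ps]) ->
    {Slice, Rest} = step(P),
    [Slice | slices([], Ps ++ Rest)].

%% Run one slice; a process with work left goes to the back of its queue.
step({Name, Work}) when Work > ?BUDGET ->
    {{Name, ?BUDGET}, [{Name, Work - ?BUDGET}]};
step({Name, Work}) ->
    {{Name, Work}, []}.
```

With two high prio processes and one normal prio process,

    sched_model:run([{a, high, 5000}, {n, normal, 1000}, {b, high, 3000}]).

the model gives
[{a,2000},{b,2000},{a,2000},{b,1000},{a,1000},{n,1000}]: a and b take
turns every 2000 reductions, and n does not run until both are done.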

> In this way, code server could
> be put into high-prio queue, without problems for other high-prio
> procs...
> 

Increasing the prio of the code server to high won't be a problem for
other high prio processes.

We don't know *exactly* what the impact will be on the system. In order
to verify that it is safe, we have to thoroughly investigate it. As it
is now, we don't see the need for that, since we do not consider the
behavior a bug. I am not saying that the priority management cannot be
improved; it probably can, but currently it has low prio compared to
other things.

> I think I'm going to test this solution, and let you know if it
> works...
>

Please, do that.

A rule of thumb if you want us to include a new feature: If it is a
small amount of work for us (with implementation, verification,
documentation, etc) it is much more likely to make it into OTP than if
the amount of work for us is large. Note, however, that there will
always be work for us in order to include a new feature, no matter how
much work you have done, and we *cannot* promise if/when a feature
request makes it into OTP. A good motivation for why it is needed is,
of course, also required.

BR,
Rickard Green, Erlang/OTP, Ericsson AB.

> HND
> 
> Enzo
>