High-priority signal for scheduling latency

Thu Jan 22 15:56:32 CET 2004

One issue we have with real-time work is latency: how long
high-priority work waits before it runs once it comes in.

I care because I can't advocate Erlang if I can't
"guarantee" the critical path in critical algorithms.

This is a big problem in non-Erlang systems too, because
preemptive scheduling doesn't help when shared locks are involved:
a high-priority task that needs a lock held by a low-priority task
still has to wait. And shared locks are necessary in our system
(VxWorks, C++). Even where shared locks aren't an issue, as in
Erlang, you still face the problem that preemptive scheduling
solves: getting higher-priority work scheduled immediately. With
a 50 ms reroute budget, this can be very important.

One solution we are considering, and I wondered whether it
would work in Erlang, is to have processes declare their
current priority. Code in critical paths would check whether
higher-priority work has arrived and, if so, wrap up its
operation so the higher-priority work can run.
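A minimal C++ sketch of the per-process priority check (C++ because
that is what the VxWorks system above uses, not Erlang). It assumes,
hypothetically, that priorities are integers where larger means more
urgent, that whoever enqueues work calls `announce_work`, and that
the scheduler calls `clear_pending` once the urgent work has started.
`should_yield` is the check a critical path would call. None of these
names come from an existing library.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical: priority of the most urgent work that has arrived but not
// yet started. 0 means "nothing pending"; larger values are more urgent.
std::atomic<int> g_highest_pending{0};

// Priority of whatever the current thread is doing right now.
thread_local int t_current_priority = 0;

// Called by whoever enqueues new work (e.g. a reroute request).
// Only ever raises the mark, never lowers it.
void announce_work(int priority) {
    int seen = g_highest_pending.load(std::memory_order_relaxed);
    // On CAS failure, `seen` is refreshed and the comparison is re-checked.
    while (priority > seen &&
           !g_highest_pending.compare_exchange_weak(
               seen, priority, std::memory_order_release,
               std::memory_order_relaxed)) {
    }
}

// Called once the urgent work has been picked up. Simplification: this
// assumes one pending urgent item at a time.
void clear_pending() { g_highest_pending.store(0, std::memory_order_release); }

// Checkpoint for critical paths: true means "finish up or fail now".
bool should_yield() {
    return g_highest_pending.load(std::memory_order_acquire) > t_current_priority;
}

// Sets the current thread's priority for a scope and restores it afterwards.
class PriorityScope {
public:
    explicit PriorityScope(int p) : saved_(t_current_priority) {
        assert(p >= 0);  // priorities are non-negative in this sketch
        t_current_priority = p;
    }
    ~PriorityScope() { t_current_priority = saved_; }
    PriorityScope(const PriorityScope&) = delete;
    PriorityScope& operator=(const PriorityScope&) = delete;

private:
    int saved_;
};
```

Because the check is a single atomic load, it should be cheap
enough to call on every pass through a hot loop.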

This is a form of cooperative multitasking. It is not perfectly
deterministic, but it is often good enough; we know this because
we take careful timings. And in the usual case the operation
still completes successfully.

It is not the same as preemptive scheduling, because the operation
is either completed early when possible or failed so the client
can retry. Either way, the lock is released. For the same reason,
it is not like sprinkling in yields: a yield gives up the CPU while
still holding the lock.

A common example is an iterator. The iterator takes a lock to
protect the table, and it iterates over an object that is used
in reroutes and is on the critical path. In the loop body the
object is serialized, which takes a while. If there are many
passes through the iterator, we can block for too long.
The hackish solution was to give up the lock and re-establish
the iterator every time through the loop. Yuck.
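The iterator case might look like this under the new scheme. A
hypothetical `RouteTable` serializes its entries while holding the
lock and checks `should_yield` before each entry. If it has to give
way, it discards the partial output and returns `RetryLater`, and
`std::lock_guard` releases the lock on return, so there is no need
to drop and re-acquire the lock on every pass. Here `should_yield`
is stubbed as a flag. In practice it would be a priority comparison.
`Route`, `RouteTable`, and `Status` are illustrative names only.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real priority check: a flag the caller can flip.
std::atomic<bool> g_higher_priority_pending{false};
bool should_yield() { return g_higher_priority_pending.load(std::memory_order_acquire); }

enum class Status { Ok, RetryLater };

struct Route {
    int id;
    std::string path;
};

class RouteTable {
public:
    void add(Route r) {
        std::lock_guard<std::mutex> guard(mu_);
        routes_.push_back(std::move(r));
    }

    // Serializes every route under the table lock, checking for
    // higher-priority work before each entry. On RetryLater, `out` is
    // left empty (no partial result).
    Status serialize_all(std::string& out) {
        std::lock_guard<std::mutex> guard(mu_);
        out.clear();
        for (const Route& r : routes_) {
            if (should_yield()) {
                out.clear();                // discard the partial work
                return Status::RetryLater;  // guard's destructor unlocks
            }
            out += std::to_string(r.id) + ":" + r.path + ";";
        }
        return Status::Ok;
    }

    // Lets a caller confirm the lock is free (call only from a thread
    // that does not already hold it).
    bool lock_is_free() {
        if (!mu_.try_lock()) return false;
        mu_.unlock();
        return true;
    }

private:
    std::mutex mu_;
    std::vector<Route> routes_;
};
```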

So, in the new world, all work has a priority that is set
globally. At particular points in the code we check whether we
should abandon the work. Many of our operations are already set
up so that they may return no results, so this isn't a big
burden. Sending a retry_later failure would also be acceptable.
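On the client side, a retry_later failure could be handled with a
small wrapper like the one below. `call_with_retry`, the attempt
limit, and the doubling backoff are illustrative assumptions, not
part of the proposal. The point is that the caller, not the lock
holder, decides when to try again, and nothing is held while it
waits.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

enum class Status { Ok, RetryLater, Failed };

// Hypothetical helper: re-issues an operation that was abandoned with
// RetryLater, sleeping between attempts so higher-priority work can run.
// Gives up after max_attempts and reports Failed.
Status call_with_retry(const std::function<Status()>& op,
                       int max_attempts,
                       std::chrono::milliseconds backoff) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        Status s = op();
        if (s != Status::RetryLater) return s;  // Ok or a hard failure
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(backoff);
            backoff *= 2;  // assumed policy: double the wait each time
        }
    }
    return Status::Failed;
}
```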

Thoughts?