[erlang-questions] : Subtle behaviour of Erlang scheduler

Tue May 29 11:02:23 CEST 2007

KatolaZ wrote:
> 
> Well, guys, I thought Erlang was really going to be used in 
> many different application fields, from telecommunication 
> platforms to web apps, from massively concurrent systems to 
> distributed high performance applications.

It _is_ used in telecommunication platforms, web apps,
massively concurrent systems, and distributed
high-performance platforms. It is also used in banking
and invoicing systems.

> I also dreamed about Erlang as a real, affordable, safe and 
> robust alternative to many ugly and still widely-used 
> languages and platforms, such as C#/.net and Java, or at 
> least I understood so, attending the last two Erlang User 
> Conferences in 2005 and 2006.
> I also started to talk about erlang at university courses.....
>
> But the last two mails, which suddenly close the Pandora Vase 
> we opened looking a little deep inside the process scheduler, 
> simply explain that Erlang developers (or at least Ericsson 
> OTP group), has little or no interest on having a real 
> robust, affordable, distributed, secure and real-time platform.

We must have been talking past each other then.
My mail attempted to convey the message that we've 
found OTP to perform excellently in soft-real-time
systems. There have been discussions before about
"fixing" the process priorities in Erlang, but they 
usually end without consensus on what a better
priority system would look like.

To really get to the bottom of how priorities work,
you should also analyse the effects of using 
selective receive. It's a very powerful construct,
but one of the drawbacks is that you make it much
more difficult to reason formally about reachability
and, I would assume, reactivity. This, and the fact
that Erlang tends to be used for extremely complex
soft-real-time systems, makes it very difficult to
predict the actual effects of different process
priority schemes.
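
To make the selective receive point concrete, here is a
minimal sketch. The module and function names are made up
for illustration. The process takes the urgent message
first even though it arrived last, so the order in which
messages are handled no longer follows arrival order,
which is part of what makes formal reasoning harder.

```erlang
-module(selective_demo).
-export([run/0]).

%% Hypothetical demo, not taken from any real system.
%% The mailbox ends up holding three messages in this order:
%% {normal,1}, {normal,2}, {urgent,3}.
run() ->
    self() ! {normal, 1},
    self() ! {normal, 2},
    self() ! {urgent, 3},
    %% Selective receive: only the urgent pattern matches, so the
    %% two normal messages are skipped and stay in the mailbox.
    First = receive {urgent, U} -> {urgent, U} end,
    %% Now pick up whatever normal messages were left behind.
    Rest = drain([]),
    {First, Rest}.

%% Collect all pending {normal, N} messages without blocking.
drain(Acc) ->
    receive
        {normal, N} -> drain([N | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

Calling selective_demo:run() returns {{urgent,3},[1,2]}.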

I tried to highlight this. I did NOT mean to say
that because of this, we do not want to support
priority schemes that would be more suitable in
other applications. Obviously, if an application
works best if all processes run on normal priority,
this is simple enough to accomplish, even if there
are 256 priority levels available.

This should probably be documented, preferably with
a deep discussion on the challenges involved.

(Also, to clarify, Bengt and I are users of Erlang,
and have little say in what the OTP team chooses to
implement. Like other users, we can request
changes and new features. Since we happen to 
represent the largest Erlang-based product line,
we may have a bit more influence than some, but
we don't speak for OTP. Kenneth does.)

> After we discovered a big bug in the scheduler, and
> reported it to the community, the only answer we got
> was: we don't use that scheduler code in OTP,

No, this is not what the answer was.

Richard Green explained that it was not, in fact,
a scheduler bug (and if I understand him correctly,
your example would most likely not have locked up in
an SMP environment with a sufficient number of
scheduler threads).

The problem was an unfortunate interaction between your
process and the code loader. The code loader, sensibly,
runs on normal priority. Its job is to fetch code from
the file system (or possibly from another Erlang node
on the network), and this job shouldn't run as a 
blocking high-priority task - obviously, the file 
system is far too slow, and would potentially block
all work in the VM for a significant amount of time.

(Perhaps one way to address this problem would be 
to have the error handler detect that the calling
process is running on high priority, and then 
simply raise an exception if the code isn't loaded.
This would at least be better than calling on the
code loader, since that obviously has disastrous
consequences. In effect, high priority would mean
that you opt out of interactive code loading, which
I think is also fairly logical.)
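
A rough sketch of what such an error handler could look
like follows. This is only an illustration of the idea,
not something OTP provides: the module name
strict_error_handler and the exception reason are
invented here. It relies on the fact that the error
handler runs in the calling process, and it would have
to be installed per process with
process_flag(error_handler, strict_error_handler). The
handler module itself must of course already be loaded.

```erlang
-module(strict_error_handler).
-export([undefined_function/3, undefined_lambda/3]).

%% Hypothetical replacement for the default error_handler module.
%% Called in the context of the process that made the failing call.
undefined_function(Module, Function, Args) ->
    case process_info(self(), priority) of
        {priority, Prio} when Prio =:= high; Prio =:= max ->
            %% Do not involve the code server at all: fail right away.
            %% The reason term is an invented name for this sketch.
            erlang:error({undefined_at_high_priority,
                          {Module, Function, length(Args)}});
        _ ->
            %% Normal and low priority: default behaviour, which may
            %% ask the code server to load the module.
            error_handler:undefined_function(Module, Function, Args)
    end.

%% Leave the handling of undefined funs to the default handler.
undefined_lambda(Module, Fun, Args) ->
    error_handler:undefined_lambda(Module, Fun, Args).
```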

If you want to run a process on high priority, it 
is advisable to make sure that all the code that this
process needs to execute has been loaded into memory.
This should be documented in big boldface print.
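
As a minimal sketch of that advice, one could load the
needed modules first and only then raise the priority.
The helper below (prio_demo:start_high/2) is a made-up
name, and the caller is assumed to know which modules
the process will touch. code:ensure_loaded/1 and
process_flag(priority, high) are the standard calls.

```erlang
-module(prio_demo).
-export([start_high/2]).

%% Hypothetical helper, not part of OTP.
%% Modules: list of modules the high-priority process will call.
%% Fun: zero-arity fun to run on high priority.
start_high(Modules, Fun) when is_list(Modules), is_function(Fun, 0) ->
    %% Loading happens here, in the caller, on its own priority.
    case preload(Modules) of
        ok ->
            Pid = spawn(fun() ->
                            %% Raise priority only after the code is in memory.
                            process_flag(priority, high),
                            Fun()
                        end),
            {ok, Pid};
        {error, _} = Error ->
            Error
    end.

%% Load each module, stopping at the first one that cannot be loaded.
preload([]) ->
    ok;
preload([M | Rest]) ->
    case code:ensure_loaded(M) of
        {module, M} -> preload(Rest);
        {error, Reason} -> {error, {M, Reason}}
    end.
```

If a module is missing, no high-priority process is
started at all, and the caller gets
{error, {Module, Reason}} back instead.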

Now, Erlang has support for making sure that all the
code is available, and loaded, before applications 
are started. It's called embedded mode, and when 
enabled, it will load all code listed in the boot 
script.

If you want to use Erlang for applications where 
reactivity is of the essence, you should definitely
use embedded code loading.
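
Embedded mode is selected with the "-mode embedded" flag
to erl, together with a boot script that lists the code
to load. As a small sketch, an application could check
at startup that it really is running that way. The
module name mode_check is made up, and code:get_mode/0
is assumed to be available, as it is in current OTP
releases.

```erlang
-module(mode_check).
-export([assert_embedded/0]).

%% Hypothetical startup check, not part of OTP.
%% Returns ok when the code server runs in embedded mode, and an
%% error tuple (invented for this sketch) when code is loaded on demand.
assert_embedded() ->
    case code:get_mode() of
        embedded -> ok;
        interactive -> {error, interactive_code_loading}
    end.
```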

BR,
Ulf W