[erlang-questions] Process scheduling and punishment

Fri Jul 15 12:40:14 CEST 2011

Dear list,

We have a case where a gen_server gets "slow" after it has handled
many messages, even though the work it does stays exactly the same. We
suspect the scheduling of the process changes. I was hoping someone on
the list could shed some light on why this happens and whether there
is any way to avoid it.

When repeatedly running the same test suite, after some time we notice
random parts of the test suite getting two orders of magnitude slower.
The tests query our server over HTTP, and the round-trip time goes from
~1ms to 75-100ms at a very sharp point. After this point, it stays at
the same level until the gen_server is restarted. The CPU usage of the
beam process stays around 5-10%, and etop shows no change.

What happens is basically this:
 * From short-lived processes spawned by misultin we query a single
gen_server, measuring the wall-clock time between the point where we
send the message and the point where we get the reply. From this point
of view, the gen_server starts out fast: most calls take only a couple
of hundred microseconds to complete.
 * Inside the gen_server we do very little work; measured in
wall-clock time, we consistently spend less than 100 microseconds.
 * Around 10 times per second, from the gen_server we send a message
containing roughly 1000 words to a logging process.
 * At the point where the misultin processes start measuring the
gen_server as slow, we still consistently spend less than 100
microseconds inside the gen_server.
 * At this point, we also see messages (no more than one) in the
message queue of the process, which is odd: end to end we are
sequential, so the process has nothing to do but handle these messages.
 * At no point do we see the logging process having messages in its
queue. It uses the same amount of CPU in both states.
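
For reference, the caller-side measurements described above can be
sketched roughly as follows. This is a minimal illustration, not our
actual code: the module name call_timing and both function names are
made up for this example, and it assumes the server is reached through
a plain gen_server:call/2.

```erlang
-module(call_timing).
-export([timed_call/2, queue_len/1]).

%% Hypothetical helper: time a synchronous gen_server call from the
%% caller's side. Returns {MicroSeconds, Reply}.
timed_call(Server, Request) ->
    timer:tc(gen_server, call, [Server, Request]).

%% Hypothetical helper: number of messages waiting in the mailbox of a
%% local process, or undefined if the process is no longer alive.
queue_len(Pid) when is_pid(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end.
```

The time returned by timed_call/2 includes message passing and any
time either process spends waiting to be scheduled, which is why it
can differ so much from the time measured inside the gen_server.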

Is it the case that our gen_server is "punished" for overloading
the logging process? Is there any way to measure whether the VM
considers our logging process to be overloaded? Is there any general
form of "punishment" for very busy processes that might cause
starvation for our gen_server?
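
I do not know of a direct way to ask the VM whether it regards a
process as overloaded, but the counters it exposes can at least be
sampled over time. A minimal sketch, with a made-up module name
(proc_sample) and made-up function names, that could be pointed at
either the logging process or the gen_server:

```erlang
-module(proc_sample).
-export([sample/1, sample_every/3]).

%% Hypothetical helper: one snapshot of scheduling-related counters
%% for a local process. Returns undefined if the process has exited.
sample(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [message_queue_len, reductions, status]).

%% Hypothetical helper: print N snapshots of Pid, one every
%% IntervalMs milliseconds.
sample_every(_Pid, _IntervalMs, 0) ->
    ok;
sample_every(Pid, IntervalMs, N) when N > 0 ->
    io:format("~p~n", [sample(Pid)]),
    timer:sleep(IntervalMs),
    sample_every(Pid, IntervalMs, N - 1).
```

Comparing snapshots taken in the fast state and in the slow state
would show whether the queue length or the reduction count of either
process changes at the point where the calls become slow.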

In our live system we have many of these gen_servers, but the request
rate is much lower and they do very little logging (if any). If it
is the case that our gen_server is punished, what would happen when we
have ten thousand of them? If every server logs at some point in its
life and one server goes crazy and overloads the log process, will
all servers be punished?

Thanks
Knut