[erlang-questions] Process scheduling and punishment

Fri Jul 15 19:46:15 CEST 2011

One way this can happen is if lots of binaries are created faster than the garbage collector can easily reclaim them. Memory consumption would then be higher and the process slower. Generally, the way to deal with that is to handle all the binaries within a short-lived spawned (linked) process, to force (i.e., encourage) more immediate garbage collection. This code forces garbage collection as much as possible: https://github.com/okeuday/CloudI/blob/master/src/lib/unused/src/immediate_gc.erl ; however, only testing would determine whether such an extreme is necessary.
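Below is a minimal sketch of the short-lived process approach. The module and function names (binary_isolation, run_isolated/1) are hypothetical, and this is not the code behind the immediate_gc.erl link. It only shows the general pattern: do the binary-heavy work in a linked worker, send back a small result, and let the worker exit so that its heap and its references to off-heap binaries are released.

```erlang
-module(binary_isolation).
-export([run_isolated/1]).

%% Hypothetical helper (not from CloudI): run Fun in a short-lived linked
%% process and return only its result. When the worker exits, its heap is
%% freed and its references to off-heap (refc) binaries are dropped, so those
%% binaries can be deallocated once no other process refers to them, without
%% waiting for the caller's next garbage collection.
run_isolated(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),                      % tags the reply so it cannot be confused
    spawn_link(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} -> Result
    end.
```

For example, `binary_isolation:run_isolated(fun() -> byte_size(iolist_to_binary(lists:duplicate(1000, <<"chunk">>))) end)` returns 5000, and the caller never holds a reference to the intermediate binary. Returning a large binary would defeat the purpose, because the caller would then keep it alive. Because the worker is linked, a crash inside Fun also takes down the caller. If the caller traps exits, this sketch would instead wait forever, and a monitor-based variant would be safer. Calling erlang:garbage_collect/0 in the long-lived process after heavy binary work is the more direct alternative.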

- Michael

On 07/15/2011 03:40 AM, Knut Nesheim wrote:
> Dear list,
>
> We have a case where a gen_server gets "slow" after it has handled
> many messages, while what it does stays exactly the same. We suspect
> the scheduling of the process changes. I was hoping someone on the
> list could shed some light on why this happens and if there is any way
> to avoid it.
>
> When repeatedly running the same test suite, after some time we notice
> random parts of the test suite getting two orders of magnitude slower.
> The tests query our server over HTTP, and the round-trip times go from
> ~1ms to 75-100ms at a very sharp point. After this point, they stay at
> the same level until the gen_server is restarted. The CPU usage of the
> beam process stays around 5-10% and from etop we see no change.
>
> What happens is basically this:
>  * From short-lived processes spawned by misultin we query a single
> gen_server, measuring wall-clock time from the point where we send
> the message to the point where we get the reply. From this point of
> view, the gen_server starts out fast: most calls take only a couple of
> hundred microseconds to complete.
>  * Inside the gen_server we do very little work, and measuring
> wall-clock time there, we consistently spend less than 100 microseconds.
>  * Around 10 times per second, from the gen_server we send a message
> containing roughly 1000 words to a logging process.
>  * At the point where the misultin processes start measuring the
> gen_server as slow, we still consistently spend less than 100
> microseconds inside it.
>  * At this point, we also see messages (no more than one) in the
> message queue of the process, which is odd because end to end we are
> sequential, so the process has nothing to do but handle these messages.
>  * At no point do we see messages in the logging process's queue.
> It uses the same amount of CPU in both states.
>
> Is it the case that our gen_server is "punished" due to overloading
> the logging process? Is there any way to measure if the VM considers
> our logging process to be overloaded? Is there any general form of
> "punishment" for very busy processes that might cause starvation for
> our gen_server?
>
> In our live system we have many of these gen_servers, but the request
> rate is much lower and they do very little logging (if any). If it
> is the case that our gen_server is punished, what would happen when we
> have ten thousand of them? If every server logs at some point in its
> life and one server goes crazy and overloads the log process, will
> all servers be punished?
>
> Thanks
> Knut
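The measurements described in the quoted message can be reproduced with a small probe module. They are the caller-side round-trip time of a gen_server call and the message-queue length of the gen_server or the logging process. This is a hypothetical sketch. The module and function names (call_probe, timed_call/2, queue_len/1) are illustrative, but timer:tc/3, gen_server:call/2 and erlang:process_info/2 are standard OTP functions. It only observes the symptoms and does not show whether the VM is penalizing any process.

```erlang
-module(call_probe).
-export([timed_call/2, queue_len/1]).

%% Hypothetical helper: wall-clock round trip of gen_server:call/2 in
%% microseconds, measured on the caller's side (from send until the reply
%% arrives). Returns {Micros, Reply}; exits like gen_server:call/2 on failure.
timed_call(Server, Request) ->
    timer:tc(gen_server, call, [Server, Request]).

%% Hypothetical helper: current message-queue length of a local process,
%% given as a pid or a registered name. Returns undefined if it is not alive.
queue_len(Name) when is_atom(Name) ->
    case whereis(Name) of
        undefined -> undefined;
        Pid -> queue_len(Pid)
    end;
queue_len(Pid) when is_pid(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end.
```

Sampling queue_len/1 for both the gen_server and the logging process alongside timed_call/2 would show whether the slowdown coincides with queued messages, as observed above.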