[erlang-questions] How much load can supervisors handle?

Thu Oct 25 22:59:45 CEST 2012

On Thu, Oct 25, 2012 at 8:23 PM, Chris Hicks
<silent_vendetta@REDACTED> wrote:
>> Date: Thu, 25 Oct 2012 20:08:33 +0200
>> Subject: Re: [erlang-questions] How much load can supervisors handle?
>> From: erlang@REDACTED
>> To: silent_vendetta@REDACTED
>> CC: erlang-questions@REDACTED
>
>>
>> On Wed, Oct 24, 2012 at 11:18 PM, Chris Hicks
>> <silent_vendetta@REDACTED> wrote:
>> > Hello folks,
>> >
>> > I'm currently building an application that will have a large number of
>> > workers spinning up and down, with large variations in the number of
>> > workers per second. At the high end it could be as many as 10k workers
>> > spawning per second, some being long-lived while the majority just do
>> > some work and then die, and at the low end just a couple hundred. This
>> > work could also be done with a dynamically sized worker pool, but in
>> > either case my primary question is this: How much load can one
>> > supervisor handle?
>>
>> What do you want the supervisor to do when a process dies?
>>
>>
>> /Joe
>
> TL;DR: Not much
>
> Well, there are a couple of scenarios really. One is a static number of
> supervisors, with the number tuned based on testing of how much load will be
> generated by the system, and in that case the supervisors won't be doing
> anything other than the typical house cleaning demanded by the VM. Second
> scenario is a more dynamic tree of supervisors, which grows/shrinks based on
> current load, and in that case each supervisor would update some values
> (either just by sending a message to a monitoring process or updating a
> table somewhere directly based on the supervisor checking the number of
> children) and other parts of the system would handle distributing load,
> starting/stopping supervisors and migrating workers. A twist on the second
> scenario would be to not have the supervisors do any of the updates
> themselves, but have a process polling the supervisors every X seconds and
> handling the updating itself, in which case the supervisors would be back to
> just doing the housecleaning the VM demands and responding to periodic
> requests about its children.
>

Sounds like you should make your own custom supervisors and not use the
"standard" supervisor. The above suggests that you need a process
management layer.
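
As a rough illustration of what such a layer might look like, here is a
minimal sketch of a manager process that starts workers with
spawn_monitor/1 and only keeps track of which ones are alive. The module
name worker_manager and its functions are hypothetical, the map syntax
assumes OTP 17 or later, and the "never restart" policy is an assumption
made for the example, not a recommendation.

```erlang
-module(worker_manager).
-export([start/0, run/2, count/1, stop/1]).

%% Start the manager process and return its pid.
start() ->
    spawn(fun() -> loop(#{}) end).

%% Run Fun in a new worker that the manager monitors.
run(Manager, Fun) when is_function(Fun, 0) ->
    call(Manager, {run, Fun}).

%% Number of workers the manager currently believes are alive.
count(Manager) ->
    call(Manager, count).

%% Stop the manager. Workers are not linked, so they keep running.
stop(Manager) ->
    call(Manager, stop).

%% Synchronous request; exits if the manager dies before replying.
call(Manager, Request) ->
    Ref = erlang:monitor(process, Manager),
    Manager ! {self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Manager, Reason} ->
            exit({manager_down, Reason})
    end.

%% Workers maps each monitor reference to the pid of its worker.
loop(Workers) ->
    receive
        {From, Ref, {run, Fun}} ->
            {Pid, MRef} = spawn_monitor(Fun),
            From ! {Ref, {ok, Pid}},
            loop(Workers#{MRef => Pid});
        {From, Ref, count} ->
            From ! {Ref, maps:size(Workers)},
            loop(Workers);
        {From, Ref, stop} ->
            From ! {Ref, ok};
        {'DOWN', MRef, process, _Pid, _Reason} ->
            %% A worker exited: forget it. No restart (assumed policy).
            loop(maps:remove(MRef, Workers))
    end.
```

Reporting load to another process, or answering periodic polls, would
then be one more clause in loop/1.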

I'm very reluctant to guess anything about performance in advance - best is
to write the clearest possible code, then run and measure. Things like more
memory or an SSD can have an enormous impact on efficiency. For some
applications the difference between 4G and 8G of memory makes a large
difference - for others no difference at all. It all depends.

When you say "thousands of processes" I have no idea if this means
"thousands of tiny processes with 1K stacks and heaps" or "thousands of
processes with stacks and heaps of tens of MBytes" - the difference between
the two (and between the architectures they call for) is huge.

This is why there is no alternative to "code and measure".
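
A measurement of this kind can be quite small. The sketch below times how
long it takes to spawn N monitored workers and wait for all of them to
exit, using timer:tc/1. The module name spawn_bench and the work/0
placeholder are hypothetical; a real test would replace work/0 with
something that resembles the actual workload, and the numbers will vary
from machine to machine.

```erlang
-module(spawn_bench).
-export([run/1]).

%% Spawn N short-lived monitored workers, wait for all of them to exit,
%% and return {ElapsedMicroseconds, WorkersPerSecond}.
run(N) when is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> spawn_and_wait(N) end),
    {Micros, N * 1000000 / max(Micros, 1)}.

spawn_and_wait(N) ->
    _ = [spawn_monitor(fun work/0) || _ <- lists:seq(1, N)],
    wait(N).

%% Placeholder workload (an assumption); substitute representative work.
work() ->
    lists:sum(lists:seq(1, 100)).

%% Collect one 'DOWN' message per worker. Assumes the calling process
%% has no other monitors active while the benchmark runs.
wait(0) ->
    ok;
wait(N) ->
    receive
        {'DOWN', _Ref, process, _Pid, _Reason} -> wait(N - 1)
    end.
```

Calling spawn_bench:run(10000) from the shell a few times, with different
work/0 bodies and heap sizes, gives a first idea of where the limits are
on a given machine.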

Unfortunately logic cannot be applied:

if P takes time A and Q takes time B
how long does P+Q take?

This is not a science. P+Q should take A+B on a sequential computer,
and max(A,B) on a parallel computer. But this is not the case.

Performance estimation is a black art - the only thing I know is the old
truth that "parsing inputs" is slow.

Cheers

/Joe

> Chris.