[erlang-questions] How much load can supervisors handle?

Joe Armstrong erlang@REDACTED
Fri Oct 26 09:27:58 CEST 2012


On Fri, Oct 26, 2012 at 7:44 AM, Chris Hicks
<silent_vendetta@REDACTED> wrote:
>> Sounds like you should make your own custom supervisors and not use the
>> "standard" supervisor. The above suggests that you need a process
>> management layer.
>>
>> I'm very reluctant to pre-guess anything about performance - best is
>> to write the
>> clearest possible code - run and measure - things like more memory or an
>> SSD have
>> an enormous impact on efficiency. For some applications the difference
>> between 4G and 8G
>> of memory makes a large difference - for others no difference at all.
>> It all depends.
>>
>> When you say "thousands of processes" I have no idea if this means
>> "thousands of tiny processes with 1K stacks and heaps" or "thousands of
>> processes
>> with stacks and heaps of tens of MBytes" - the difference (and the
>> architectures)
>> is huge.
>>
>> This is why there is no alternative to "code and measure".
>>
>> Unfortunately logic cannot be applied:
>>
>> if P takes time A and Q takes time B,
>> how long does P+Q take?
>>
>> This is not a science. P+Q should take A+B on a sequential computer,
>> and max(A,B) on a parallel computer. But this is not the case.
>>
>> Performance estimation is a black art - the only thing I know is the old
>> truth
>> that "parsing inputs" is slow.
>>
>> Cheers
>>
>> /Joe
>
> Joe,
>
> You are, of course, absolutely right about the need for testing, and I've
> set aside a lot of time for it so that I can tune the system as need be. I
> also know better than to omit detailed information and really should have
> included the system parameters. You, and others, have provided the
> information I needed, though, and I certainly have some places to start, as
> I was primarily concerned with some of the higher-level concepts. I just
> don't know enough about the inner workings of the language to anticipate
> some of the performance characteristics, and I wanted a gauge of whether I
> was thinking about the problem correctly.
>
> Chris.
>
> ps. (Yes, this means you can stop reading now :) )
>
> If anyone is at all interested here's what I know about the system:
>
> Not at all unreasonable to expect 20,000 long-lived workers. Long-lived
> means several seconds to infinity, not counting code upgrades, outages,
> and so on. The assumption is that roughly half would be permanent, another
> quarter or so would last a few hours to almost a day at the extreme end,
> and the rest in the seconds-to-minutes range. These workers will have
> stacks/heaps of no more than 5K, with probably 90% having 2K or less.

Good - numbers - I like numbers, we can play with numbers.

You didn't say how long the 20K workers take to arrive, but no matter;
we can do a little experiment anyway ...

I made a simple measurement to get a feeling for this (the code follows ...)

I'll start 20K processes, each process has a 5KB heap,
and lives forever.

They are started by a single process - which is a bottleneck - but I just
want to measure the times and memory.

> test1:test(20000).
{937304,{150075784,7503}}

So it took 0.9 seconds (i.e. about 47 microseconds/process)
and used 150 MB of memory - the space per process is 7503 bytes.

This was done on a dual-core 2.53 GHz machine with 4 GB of memory.

(it might be an idea to hibernate the sleeping processes here - I
haven't tried this)
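
A hypothetical sketch of that (not measured): erlang:hibernate/3 discards
the call stack, shrinks the heap, and re-enters the process through an
exported function when the next message arrives. Assuming it lives in the
test1 module below (export_all makes wake/0 callable):

child_hibernating() ->
    %% same ~5KB allocation as child/0, then hibernate instead of
    %% blocking in a plain receive
    list_to_binary(lists:duplicate(625, 42)),
    erlang:hibernate(?MODULE, wake, []).

wake() ->
    %% runs when the first message arrives after hibernation
    receive
	_Msg -> true
    end.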

You could parallelize the master routine to reduce this 0.9 seconds -
since it's a dual core you might get a factor of 2. How many masters
you'd need requires tweaking; say 2 times the number of cores. On a
dual core try 3, 4, 5 ... masters.
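
A hypothetical sketch of that fan-out - the names par_test/2 and
start_children/1 are mine, not part of the measured module below. M masters
each start N div M children, and the parent waits for M acknowledgements:

par_test(N, M) ->
    Self = self(),
    Each = N div M,
    [spawn(fun() -> start_children(Each), Self ! done end)
     || _ <- lists:seq(1, M)],
    [receive done -> ok end || _ <- lists:seq(1, M)],
    ok.

start_children(0) -> ok;
start_children(K) ->
    spawn(fun child/0),
    start_children(K-1).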

I strongly suspect that these figures tell you nothing and that the
real problems will be in the TCP overheads - assuming you keep a load of
sockets open. This is more difficult to measure - you need a cluster to
load the server to the point where it breaks.
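
That said, for a crude feel of the per-socket cost from a single node,
something like this opens K idle connections (the host and port are
placeholders, and this is a sketch only - real load needs real traffic):

open_sockets(0, Acc) ->
    Acc;
open_sockets(K, Acc) ->
    %% "localhost"/8080 are placeholders - point this at your server
    {ok, S} = gen_tcp:connect("localhost", 8080,
                              [binary, {active, false}]),
    open_sockets(K-1, [S|Acc]).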

Here's the program I wrote for these measurements.

It gives no more than a ball-park figure to start playing with
but might give a vague indication of what to expect

Cheers

/Joe

-module(test1).

-compile(export_all).

test(N) ->
    timer:tc(?MODULE,test0,[N]).

test0(N) ->
    %% returns {TotalBytesUsed, BytesPerProcess}
    M1 = proplists:get_value(total, erlang:memory()),
    Pid = spawn(fun master/0),
    loop(N, Pid),
    M2 = proplists:get_value(total, erlang:memory()),
    {M2-M1, (M2-M1) div N}.

loop(0,_) -> true;
loop(N, Pid) ->
    rpc(Pid, start_child),
    loop(N-1, Pid).

rpc(Pid, M) ->
    Pid ! {self(), M},
    receive
	{Pid, Reply} ->
	    Reply
    end.

master() ->
    receive
	{From, start_child} ->
	    %% the pid is unused; the underscore avoids a compiler warning
	    _Pid = start_child(),
	    From ! {self(), ack},
	    master()
    end.

start_child() ->
    spawn(fun child/0).

child() ->
    %% build ~5KB of heap:
    %% one list cell = 8 bytes on 32-bit Erlang, so 625 cells = 5000 bytes
    list_to_binary(lists:duplicate(625, 42)),
    receive
    after infinity ->
	    true
    end.




>
> It's also not unreasonable to expect short-lived (microseconds) workers
> numbering from several hundred, bursting up to 20,000. The vast majority of
> the time, peak load would be sustained for up to several hours, with
> extreme instances lasting for weeks on end. In these last cases, however,
> there would still be some variation during the day, so while not
> technically 'peaked' the whole time, the system load at the low points
> would be much higher than is expected the majority of the time. These will
> have stacks/heaps which will vary quite a bit. Since trying to estimate
> user input patterns without specific data on an already-built system (I
> don't even have infrastructure for it yet, obviously) is pointless, I just
> don't have anything solid to go on. However, my gut tells me that the
> majority will remain under 1K, while a still significant minority will need
> 2K-5K, in increasingly smaller numbers the higher you go, but probably not
> on a linear scale.
>
> Outside of a few supporting applications such as os_mon or a TCP/HTTP
> server, the vast majority of the work will be done by one worker type,
> hence the requirement for one type of supervisor handling up to 40K of the
> same type of worker with a lot of churn (see the supervisor sketch after
> this message). This is also going to facilitate, at some point, the
> distribution of the workers across multiple physical nodes with an eye to
> linear scalability and redundancy. There's more, but that is, I think, all
> the primary requirements for the system that are going to shape the way
> most of it is structured.
>
> If anyone has any thoughts - to tell me I'm crazy, that they know of some
> resources, or that they just wasted two minutes of their life reading that
> and want them back (Hey, I told you that you could stop reading! I don't do
> refunds.) - feel free.
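
For reference, the standard OTP shape for "one supervisor, many identical
workers with churn" is a simple_one_for_one supervisor, where all children
share a single child spec and are started dynamically. A minimal sketch -
the module and worker names (worker_sup, my_worker) are illustrative, not
from this thread:

-module(worker_sup).
-behaviour(supervisor).
-export([start_link/0, start_worker/1, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% dynamically start one more identical worker - cheap, suits churn
start_worker(Args) ->
    supervisor:start_child(?MODULE, [Args]).

init([]) ->
    %% my_worker is a placeholder callback module with a start_link/1
    {ok, {{simple_one_for_one, 10, 1},
          [{worker, {my_worker, start_link, []},
            temporary, brutal_kill, worker, [my_worker]}]}}.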


