Performance of selective receive

Sean Hinde sean.hinde@REDACTED
Sun Nov 13 18:31:29 CET 2005


Alternative proposal:

Never, but never use async messages into a blocking server from the
outside world. The way to avoid this is to have the socket process
(or whatever it is) make a synchronous call into the gen_server
process. You then have to make the gen_server itself non-blocking.
The neatest way is to spawn a new process to make the blocking call
to the backend, and use the gen_server:reply/2 mechanism to reply
later. You could also use a pool of backend processes and reject
requests if you hit "congestion".

See http://www.trapexit.org/docs/howto/simple_non_blocking_pattern.html
for one solution.
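
Roughly, the non-blocking part looks like this. This is a minimal
sketch only: the module and function names (front_server, request/1)
are made up for illustration, and timeouts and crash handling are
left out.

    -module(front_server).
    -behaviour(gen_server).
    -export([start_link/0, request/1]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% The socket process calls in synchronously, so it is
    %% flow-controlled by this server.
    request(Req) ->
        gen_server:call(?MODULE, {request, Req}).

    init([]) ->
        {ok, undefined}.

    handle_call({request, Req}, From, State) ->
        %% Do the potentially slow backend call in a throwaway
        %% process and answer the waiting client later with
        %% gen_server:reply/2.  The gen_server itself never blocks.
        spawn_link(fun() ->
                           Reply = gen_server:call(backend, Req),
                           gen_server:reply(From, Reply)
                   end),
        {noreply, State}.

    handle_cast(_Msg, State) -> {noreply, State}.
    handle_info(_Info, State) -> {noreply, State}.
    terminate(_Reason, _State) -> ok.
    code_change(_OldVsn, State, _Extra) -> {ok, State}.

If the reply also has to update the server state (as in the example
below), the worker must send the new state back to the gen_server,
e.g. as a message handled in handle_info/2.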

Sean

On 13 Nov 2005, at 16:50, Pascal Brisset wrote:

> Erlang's selective receive semantics definitely has advantages
> w.r.t. the FIFO communications found in other models.
> But it also comes with a performance penalty which has bitten us
> once in a production system.  Ulf Wiger's presentation at EUC
> last week reminded me of it.  Consider the following scenario:
>
>
> A server process is receiving requests at a constant rate.  For
> each request, the server calls a backend service synchronously:
>
>     server(State) ->
>         receive Msg ->
>             NewState = gen_server:call(backend, {State,Msg}),
>             server(NewState)
>         end.
>
> The system is dimensioned so that the CPU load is low (say 10 %).
> Now at some point in time, the backend service takes one second
> longer than usual to process one particular request.
> You'd expect that some requests will be delayed (by no more than
> one second) and that quality of service will return to normal
> within two seconds, since there is so much spare capacity.
>
> Instead, the following can happen: During the one second outage,
> requests accumulate in the message queue of the server process.
> Subsequent gen_server calls take more CPU time than usual because
> they have to scan the whole message queue to extract replies.
> As a result, more messages accumulate, and so on.
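
Roughly, that is because gen_server:call/2 ends in a selective
receive along these lines (a simplified sketch; the real code is
gen:wait_resp_mon/3, mentioned below):

    call(Server, Request) ->
        Ref = erlang:monitor(process, Server),
        Server ! {'$gen_call', {self(), Ref}, Request},
        receive
            {Ref, Reply} ->
                %% must skip every client request already queued
                %% ahead of the reply tagged with Ref
                erlang:demonitor(Ref),
                Reply;
            {'DOWN', Ref, _, _, Reason} ->
                exit(Reason)
        end.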
>
> snowball.erl (attached) simulates all this.  It slowly increases
> the CPU load to 10 %.  Then it pauses the backend for one second,
> and you can see the load rise to 100 % and remain there, although
> the throughput has fallen dramatically.
>
>
> Here are several ways to avoid this scenario:
>
> - When measuring the actual capacity of a system, always simulate
>   all sorts of random delays and jitters everywhere.  But in the
>   example above, this would mean taking a huge safety margin.
>
> - Explicitly detect the overload and reject requests when it occurs.
>   This is the standard way of handling overloads.  But it's hard to
>   justify losing calls on a system that is running at 10 % CPU load.
>   Plus, the overload develops so quickly that you need to flush the
>   whole message queue in order to return to normal reasonably fast.
>
> - Add a proxy process dedicated to buffering requests from clients
>   and making sure the message queue of the server remains small.
>   This was suggested to me at the erlounge.  It is probably the
>   best solution (see the sketch after this list), but it complicates
>   process naming and supervision.  And programmers just shouldn't
>   have to wonder whether each server needs a proxy or not.
>
> - Modify Erlang to support multiple message queues per process.
>   Essentially, this is what the buffering proxy above emulates.
>
> - Optimize the implementation of selective receive in the emulator.
>   E.g. each message queue could have some kind of cache of the latest
>   X receive patterns which have failed against the first Y messages
>   of the queue.  This is hard to implement in a useful way because
>   of patterns with bound variables (see gen:wait_resp_mon/3).
>
> - Modify Erlang to add a "blocking send" primitive, or emulate it
>   with rpc:call(process_info(Server,message_queue_len)), so that
>   flow-control propagates end-to-end (ultimately, to some socket
>   or file I/O).  But bounding queues may introduce deadlocks.
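
The buffering proxy could look roughly like this. This is an
illustrative sketch only (invented names, no overload handling); the
point is that the proxy keeps the server's own queue at most one
message deep, so the server's selective receives never scan a long
queue:

    proxy_start(Server) ->
        spawn_link(fun() -> proxy_loop(Server, queue:new(), ready) end).

    proxy_loop(Server, Buf, ready) ->
        case queue:out(Buf) of
            {{value, Msg}, Buf1} ->
                Server ! {self(), Msg},           % forward one request
                proxy_loop(Server, Buf1, busy);
            {empty, _} ->
                receive
                    Msg -> proxy_loop(Server, queue:in(Msg, Buf), ready)
                end
        end;
    proxy_loop(Server, Buf, busy) ->
        receive
            {Server, done} ->                     % server wants the next one
                proxy_loop(Server, Buf, ready);
            Msg ->
                proxy_loop(Server, queue:in(Msg, Buf), busy)
        end.

    %% The server from the example then becomes:
    server(Proxy, State) ->
        receive
            {Proxy, Msg} ->
                NewState = gen_server:call(backend, {State, Msg}),
                Proxy ! {self(), done},
                server(Proxy, NewState)
        end.

Overload detection and rejection then reduces to checking
queue:len(Buf) inside the proxy.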
>
> Any other ideas ?
>
> -- Pascal
>
>
> %% Demonstration of the snowball effect triggered by a transient
> %% overload and selective receive on a long message queue.
> %% Assumes /proc/stat in linux-2.6.3 format.
>
> %% Increase InitialPeriod if your system is too slow.
>
> -module(snowball).
> -compile(export_all).
>
> run() ->
>     {ok,_} = gen_server:start_link({local,backend}, ?MODULE, 0, []),
>     Server = spawn_link(?MODULE, server, [undefined]),
>     spawn_link(?MODULE, monitor, [Server, get_ticks(), 0]),
>     InitialPeriod = 500,
>     Client = spawn_link(?MODULE, client, [Server, InitialPeriod]),
>     receive after 2000 -> ok end,
>     incr_load(Server, Client, get_ticks()),
>     receive after 10000 -> ok end,
>     gen_server:call(backend, pause),
>     receive after 60000 -> ok end,
>     erlang:halt().
>
> incr_load(Server, Client, {PrevBusy,PrevTotal}) ->
>     receive after 10000 -> ok end,
>     {Busy,Total} = Ticks = get_ticks(),
>     Load = (Busy-PrevBusy) * 100 div (Total-PrevTotal),
>     io:format("Average load was ~w %.~n", [Load]),
>     if Load < 10 -> Client ! speedup, incr_load(Server, Client, Ticks);
>        true -> io:format("~nModerate load reached.~n~n")
>     end.
>
> monitor(Server, {PrevBusy,PrevTotal}, PrevCount) ->
>     receive after 1000 -> ok end,
>     {_, MQL} = process_info(Server, message_queue_len),
>     {Busy,Total} = Ticks = get_ticks(),
>     Count = gen_server:call(backend, read),
>     Load = (Busy-PrevBusy) * 100 div (Total-PrevTotal),
>     io:format("queue_length=~w  load=~w %  rate=~w/s~n",
> 	      [MQL, Load, Count-PrevCount]),
>     monitor(Server, Ticks, Count).
>
> get_ticks() ->
>     {ok, F} = file:open("/proc/stat", [read]),
>     case io:fread(F, '', "~*s ~d ~d ~d ~d") of
> 	{ok, [T1,T2,T3,T4]} -> file:close(F), {T1+T2+T3, T1+T2+T3+T4};
> 	_ -> exit(unsupported_proc_stat_format)
>     end.
>
> utime() ->
>     {MS,S,US} = now(),
>     (MS*1000000+S)*1000000 + US.
>
> %%%% Client.
>
> client(Server, Period) ->
>     io:format("~nClient sending ~w messages/s.~n~n", [1000000 div Period]),
>     client(Server, Period, utime()).
>
> client(Server, Period, Next) ->
>     Wait = lists:max([0, Next-utime()]) div 1000,
>     receive speedup -> client(Server, Period*4 div 5)
>     after Wait -> Server ! "hello", client(Server, Period, Next+Period)
>     end.
>
> %%%% Server.
>
> server(State) ->
>     receive Msg ->
> 	    NewState = gen_server:call(backend, {State,Msg}),
> 	    server(NewState)
>     end.
>
> %%%% Backend.
>
> init(State) ->
>     {ok, State}.
>
> handle_call(pause, _From, State) ->
>     io:format("~nBackend pausing for 1 second.~n~n"),
>     receive after 1000 -> ok end,
>     {reply, ok, State};
> handle_call(read, _From, State) ->
>     {reply, State, State};
> handle_call({S,_Msg}, _From, State) ->
>     {reply, S, State+1}.
>
> %%%%
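
For the record, one way to run the simulation (assuming the module is
saved as snowball.erl on a Linux machine, since get_ticks/0 reads
/proc/stat):

    erlc snowball.erl
    erl -noshell -pa . -s snowball run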



