Performance of selective receive
Sean Hinde
sean.hinde@REDACTED
Sun Nov 13 18:31:29 CET 2005
Alternative proposal:
Never, but never, send async messages into a blocking server from the
outside world. The way to avoid this is to have the socket process
(or whatever it is) make a synchronous call into the gen_server
process. You then have to make sure the gen_server itself does not
block. The neatest way is to spawn a new process to make the blocking
call to the backend, and use the gen_server:reply/2 mechanism to reply
later. You could also use a pool of backend processes and reject
requests if you hit "congestion".
See http://www.trapexit.org/docs/howto/simple_non_blocking_pattern.html
for one solution.
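A rough sketch of that pattern (module name and details are mine, not
taken from the howto; it assumes a registered `backend` process as in
your example). The gen_server never blocks: each request is handed to
a short-lived worker which makes the slow backend call and replies on
the server's behalf.

```erlang
-module(nb_server).
-behaviour(gen_server).
-export([start_link/0, request/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% The socket process calls this synchronously, so back-pressure
%% propagates to the outside world instead of piling up in a mailbox.
request(Msg) ->
    gen_server:call(?MODULE, {request, Msg}, infinity).

init([]) ->
    {ok, nostate}.

handle_call({request, Msg}, From, State) ->
    %% Spawn a worker to do the blocking backend call, and reply
    %% later with gen_server:reply/2 so this process stays free.
    spawn_link(fun() ->
                       Reply = gen_server:call(backend, Msg, infinity),
                       gen_server:reply(From, Reply)
               end),
    {noreply, State};   %% no reply here; the worker replies
handle_call(_Other, _From, State) ->
    {reply, {error, badcall}, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_Old, State, _Extra) -> {ok, State}.
```

Because the server's own mailbox never grows, its calls (and the
workers' calls) never have to scan a long queue for replies.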
Sean
On 13 Nov 2005, at 16:50, Pascal Brisset wrote:
> Erlang's selective receive semantics definitely has advantages
> w.r.t. the FIFO communications found in other models.
> But it also comes with a performance penalty which has bitten us
> once in a production system. Ulf Wiger's presentation at EUC
> last week reminded me of it. Consider the following scenario:
>
>
> A server process is receiving requests at a constant rate. For
> each request, the server calls a backend service synchronously:
>
> server(State) ->
>     receive Msg ->
>         NewState = gen_server:call(backend, {State,Msg}),
>         server(NewState)
>     end.
>
> The system is dimensioned so that the CPU load is low (say 10 %).
> Now at some point in time, the backend service takes one second
> longer than usual to process one particular request.
> You'd expect that some requests will be delayed (by no more than
> one second) and that quality of service will return to normal
> within two seconds, since there is so much spare capacity.
>
> Instead, the following can happen: During the one second outage,
> requests accumulate in the message queue of the server process.
> Subsequent gen_server calls take more CPU time than usual because
> they have to scan the whole message queue to extract replies.
> As a result, more messages accumulate, and so on.
>
> snowball.erl (attached) simulates all this. It slowly increases
> the CPU load to 10 %. Then it pauses the backend for one second,
> and you can see the load rise to 100 % and remain there, although
> the throughput has fallen dramatically.
>
>
> Here are several ways to avoid this scenario:
>
> - When measuring the actual capacity of a system, always simulate
> all sorts of random delays and jitters everywhere. But in the
> example above, this would mean taking a huge security margin.
>
> - Explicitly detect the overload and reject requests when it occurs.
> This is the standard way of handling overloads. But it's hard to
> justify losing calls on a system that is running at 10 % CPU load.
> Plus, the overload develops so quickly that you need to flush the
> whole message queue in order to return to normal reasonably fast.
>
> - Add a proxy process dedicated to buffering requests from clients
> and making sure the message queue of the server remains small.
> This was suggested to me at the erlounge. It is probably the
> best solution, but it complicates process naming and supervision.
> And programmers just shouldn't have to wonder whether each server
> needs a proxy or not.
>
> - Modify Erlang to support multiple message queues per process.
> Essentially, this is what the buffering proxy above emulates.
>
> - Optimize the implementation of selective receive in the emulator.
> E.g. each message queue could have some kind of cache of the latest
> X receive patterns which have failed against the first Y messages
> of the queue. This is hard to implement in a useful way because
> of patterns with bound variables (see gen:wait_resp_mon/3).
>
> - Modify Erlang to add a "blocking send" primitive, or emulate it
> with rpc:call(process_info(Server,message_queue_len)), so that
> flow-control propagates end-to-end (ultimately, to some socket
> or file I/O). But bounding queues may introduce deadlocks.
>
> Any other ideas ?
>
> -- Pascal
>
>
> %% Demonstration of the snowball effect triggered by a transient
> %% overload and selective receive on a long message queue.
> %% Assumes /proc/stat in linux-2.6.3 format.
>
> %% Increase InitialPeriod if your system is too slow.
>
> -module(snowball).
> -compile(export_all).
>
> run() ->
>     {ok,_} = gen_server:start_link({local,backend}, ?MODULE, 0, []),
>     Server = spawn_link(?MODULE, server, [undefined]),
>     spawn_link(?MODULE, monitor, [Server, get_ticks(), 0]),
>     InitialPeriod = 500,
>     Client = spawn_link(?MODULE, client, [Server, InitialPeriod]),
>     receive after 2000 -> ok end,
>     incr_load(Server, Client, get_ticks()),
>     receive after 10000 -> ok end,
>     gen_server:call(backend, pause),
>     receive after 60000 -> ok end,
>     erlang:halt().
>
> incr_load(Server, Client, {PrevBusy,PrevTotal}) ->
>     receive after 10000 -> ok end,
>     {Busy,Total} = Ticks = get_ticks(),
>     Load = (Busy-PrevBusy) * 100 div (Total-PrevTotal),
>     io:format("Average load was ~w %.~n", [Load]),
>     if Load < 10 -> Client ! speedup, incr_load(Server, Client, Ticks);
>        true -> io:format("~nModerate load reached.~n~n")
>     end.
>
> monitor(Server, {PrevBusy,PrevTotal}, PrevCount) ->
>     receive after 1000 -> ok end,
>     {_, MQL} = process_info(Server, message_queue_len),
>     {Busy,Total} = Ticks = get_ticks(),
>     Count = gen_server:call(backend, read),
>     Load = (Busy-PrevBusy) * 100 div (Total-PrevTotal),
>     io:format("queue_length=~w load=~w % rate=~w/s~n",
>               [MQL, Load, Count-PrevCount]),
>     monitor(Server, Ticks, Count).
>
> get_ticks() ->
>     {ok, F} = file:open("/proc/stat", [read]),
>     case io:fread(F, '', "~*s ~d ~d ~d ~d") of
>         {ok, [T1,T2,T3,T4]} -> file:close(F), {T1+T2+T3, T1+T2+T3+T4};
>         _ -> exit(unsupported_proc_stat_format)
>     end.
>
> utime() ->
>     {MS,S,US} = now(),
>     (MS*1000000+S)*1000000 + US.
>
> %%%% Client.
>
> client(Server, Period) ->
>     io:format("~nClient sending ~w messages/s.~n~n",
>               [1000000 div Period]),
>     client(Server, Period, utime()).
>
> client(Server, Period, Next) ->
>     Wait = lists:max([0, Next-utime()]) div 1000,
>     receive speedup -> client(Server, Period*4 div 5)
>     after Wait -> Server ! "hello", client(Server, Period, Next+Period)
>     end.
>
> %%%% Server.
>
> server(State) ->
>     receive Msg ->
>         NewState = gen_server:call(backend, {State,Msg}),
>         server(NewState)
>     end.
>
> %%%% Backend.
>
> init(State) ->
>     {ok, State}.
>
> handle_call(pause, _From, State) ->
>     io:format("~nBackend pausing for 1 second.~n~n"),
>     receive after 1000 -> ok end,
>     {reply, ok, State};
> handle_call(read, _From, State) ->
>     {reply, State, State};
> handle_call({S,_Msg}, _From, State) ->
>     {reply, S, State+1}.
>
> %%%%
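For what it's worth, the buffering proxy you mention can be as small
as this (an illustrative sketch, names are mine, not a drop-in
implementation). The proxy's plain receive is FIFO and cheap, and at
most one request at a time reaches the server, so the server's
gen_server:call to the backend only ever scans a short queue. (The
proxy's own wait for the `done` ack is still a selective receive over
its buffered messages, but that is a single cheap match per request
rather than a full call round-trip.)

```erlang
-module(buffer_proxy).
-export([start_link/1, init/1]).

%% Clients send requests to the proxy instead of the server.  The
%% long queue lives in the proxy's mailbox; the server's mailbox
%% holds at most one request, so its selective receives stay cheap.
start_link(Server) ->
    spawn_link(?MODULE, init, [Server]).

init(Server) ->
    loop(Server).

loop(Server) ->
    receive
        Msg ->
            Server ! {self(), Msg},
            receive
                {Server, done} -> ok   %% wait until the server is free
            end,
            loop(Server)
    end.
```

The server would have to send `{self(), done}` back to the proxy after
each gen_server:call to the backend returns. As you say, it works,
but it complicates naming and supervision.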
More information about the erlang-questions mailing list