Performance of selective receive

Pascal Brisset pascal.brisset@REDACTED
Sun Nov 13 17:50:46 CET 2005


Erlang's selective receive semantics definitely has advantages
compared to the FIFO-only communication found in other models.
But it also comes with a performance penalty, which has bitten us
once in a production system.  Ulf Wiger's presentation at EUC
last week reminded me of it.  Consider the following scenario:


A server process is receiving requests at a constant rate.  For
each request, the server calls a backend service synchronously:

    %% Serve requests one at a time: each incoming request triggers
    %% a synchronous call to the backend service.
    server(State) ->
        receive Msg ->
            NewState = gen_server:call(backend, {State, Msg}),
            server(NewState)
        end.

The system is dimensioned so that the CPU load is low (say 10 %).
Now at some point in time, the backend service takes one second
longer than usual to process one particular request.
You'd expect that some requests will be delayed (by no more than
one second) and that quality of service will return to normal
within two seconds, since there is so much spare capacity: at 10 %
load, one second's worth of queued requests amounts to only about
0.1 s of CPU work.

Instead, the following can happen: during the one-second outage,
requests accumulate in the message queue of the server process.
Subsequent gen_server calls take more CPU time than usual, because
each one has to scan the whole message queue to extract its reply.
As a result, more messages accumulate, and so on.
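
To see why the calls get slower, look at the caller's side of
gen_server:call: it sends the request and then does a selective
receive matching only the reply tagged with a fresh reference, so
every other message in the mailbox -- here, all the client requests
queued up behind -- is scanned and skipped first.  Roughly (a
simplified sketch of what gen:wait_resp_mon/3 does, not the actual
OTP code):

    call(Server, Request, Timeout) ->
        MRef = erlang:monitor(process, Server),
        Server ! {'$gen_call', {self(), MRef}, Request},
        %% Selective receive on MRef: every message already waiting
        %% in the caller's mailbox is scanned and skipped before the
        %% reply is found, so each call costs O(message queue length).
        receive
            {MRef, Reply} ->
                erlang:demonitor(MRef, [flush]),
                Reply;
            {'DOWN', MRef, _, _, Reason} ->
                exit(Reason)
        after Timeout ->
            erlang:demonitor(MRef, [flush]),
            exit(timeout)
        end.

So the cost of serving one request grows with the length of the
backlog, which is what turns a one-second hiccup into a snowball.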

snowball.erl (attached) simulates all this.  It slowly increases
the CPU load to 10 %.  Then it pauses the backend for one second,
and you can see the load rise to 100 % and remain there, although
the throughput has fallen dramatically.


Here are several ways to avoid this scenario:

- When measuring the actual capacity of a system, always simulate
  all sorts of random delays and jitter everywhere.  But in the
  example above, this would mean taking a huge safety margin.

- Explicitly detect the overload and reject requests when it occurs.
  This is the standard way of handling overloads.  But it's hard to
  justify losing calls on a system that is running at 10 % CPU load.
  Plus, the overload develops so quickly that you need to flush the
  whole message queue in order to return to normal reasonably fast
  (a small sketch follows after this list).

- Add a proxy process dedicated to buffering requests from clients
  and making sure the message queue of the server remains small
  (sketched after this list).  This was suggested to me at the
  erlounge.  It is probably the best solution, but it complicates
  process naming and supervision.  And programmers just shouldn't
  have to wonder whether each server needs a proxy or not.

- Modify Erlang to support multiple message queues per process.
  Essentially, this is what the buffering proxy above emulates.

- Optimize the implementation of selective receive in the emulator.
  E.g. each message queue could have some kind of cache of the latest
  X receive patterns which have failed against the first Y messages
  of the queue.  This is hard to implement in a useful way because
  of patterns with bound variables (see gen:wait_resp_mon/3).

- Modify Erlang to add a "blocking send" primitive, or emulate it
  by having the sender check process_info(Server, message_queue_len)
  before sending (via rpc:call/4 if the server is remote), so that
  flow control propagates end-to-end (ultimately, to some socket or
  file I/O); a rough sketch follows after this list.  But bounding
  queues may introduce deadlocks.

Any other ideas?

-- Pascal


Attachment: snowball.erl
<http://erlang.org/pipermail/erlang-questions/attachments/20051113/1976d61d/attachment.ksh>