[erlang-questions] On selective receive (Re: eep: multiple patterns)

Mon Jun 2 02:09:28 CEST 2008

I wrote:

 > >Whenever you have disjoint receive statements, you need to
 > >take care that there is a technique for emptying unexpected
 > >messages.

Edward Fine accidentally replied only to me directly:

 > Is this a good place to use the catch-all, or is there a better
 > technique? I ask this as a newcomer to Erlang.

(This posting also gives an alternative example to Valentin's
priority problem suggestion)

Consider a case where you are doing a scatter / gather algorithm
to spread processing across nodes or across different processing
algorithms.  To make it concrete, suppose we have a database
with 5 different tables and we need to collect information from each
table to assemble into a single view to the user.

The standard approach is to use the DB capability to join the tables.
This introduces a single point access problem since the database
server is doing all the work while the initiating process waits.

Instead we put each of the tables in a different DB, flat file or ets  
table.
Then we create a process for each one that provides caching and
an access interface using messages.  They may end up on the same
machine or on 5 different machines, but we will get parallelism on
the I/O and possibly on the cache and assembly processing (if there
are multiple cores or multiple machines in the case of cache and
assembly).

What does the code look like?  [Assume getQueries(UserId) generates
a list of queries that are related to the database information we would
like to display and that the length of this list matches the number of
DB processes we have. ]

doUserQuery(UserId) ->
    Queries = getQueries(UserId),
    QueryRef = make_ref(),
    [Pid ! {getData, QueryRef, UserId, Query} || {Pid, Query} <-  
lists:zip(DbPids, Queries)],
    Responses = collect_responses(QueryRef),
    display_db_info(Responses),
    erlang:send_after(1000, self(), {cleanup, QueryRef}).

This is a pretty hokey approach -- you would want something better
than a 1 second delay to tell you whether to eliminate old messages
from the queue, but it is a concrete example to describe why you
would want to use selective receive and what to do to make sure it
doesn't cause you a problem.

collectResponses(QueryRef) ->
    collectResponses(QueryRef, []).

collectResponses(QueryRef, Responses) ->
    receive
       {responseData, QueryRef, _UserId, Results} ->
            collectResponses(QueryRef, [Responses | Results])
    after 100 -> Responses
    end.

Again, my hokey example collects results as long as they are present
or no new ones show up for 100 milliseconds.

What we have so far is a single request message sent to 5 processes
and a function which implements selective receive to collect only the
messages that are in response to the initial request from a variety of
responders (hopefully all, but not if some are slow to respond).

What happens if we have a slow responding database, but it does
actually produce results after 1/2 second.  It was too slow to be  
collected
but it puts messages on the queue anyway.  If we have no mechanism
to clear them, they will build up and cause things to gradually slow  
down.

So at some higher level we need the following code:

main() ->
    receive
        %% Throw away late arriving results from a previous request
        {cleanup, QueryRef} -> dumpOldResults(QueryRef);
        {userRequest, UserId} -> doUserQuery(UserId)
    end,
    main().

dumpOldQueryResults(QueryRef) ->
     receive
         {responseData, QueryRef, _UserId, _Results} ->
              dumpOldQueryResults(QueryRef)
     after 0 -> ok
     end.

In the main function, we give priority to cleaning up old messages.
This will keep the queue short, however, it ensures a full queue
scan for every user request.  As long as the queue is short, that
won't hurt us.  Dumping old messages just cycles as fast as it can
accepting messages that have our unique token and ignoring the
rest of the data in the message.  If there are no clean up messages
remaining, we than accept a new user request (which will necessarily
cause the message queue to grow for a short period) and display the
results.

What did we see?  Selective receive used in 3 different ways:

1) To collect the results of a request (a two-way session conversation)
2) To handle self notifications for maintenance + user requests
3) To handle old messages from an expired session

It turns out the {cleanup, QueryRef} message is not necessary in
the above example and we can just consume all {responseData, ...}
messages inside main(), but it depends on how new requests are
placed on the queue and whether timing allows two requests to
be interleaved in the results set (you don't want to remove all the
responseData for a pending request that has not had time to collect
results yet).  Structuring as above gave more explicit different uses
of selective receive.

The problem remaining in the code above is that there is no
"catch all" clause.  Do we worry about that?  It depends on how the
system evolves.  If you interface to a known protocol and you have
covered all the messages supported via selective receive, then
you could do without a catch all.  If your system is evolving or there
are other processes or programmers who might inject new message
types, you need a catch all in the main/0 function (although you have
to be careful not consume something that should stay on the queue).

I have not tried this code, nor have I typed it into a erl prompt, so I
can't guarantee it even compiles.  Mostly it should give you ideas
about ways to use selective receive.

What if we didn't have selective receive?  I see two choices:

1) Start a thread and open a new socket to the databases for each
user request.  Maintain the conversations as independent channels.

2) Create a hash table of messages received related to each request.
This requires managing the conversation correlations yourself.

Both of these approaches are much more code than selective receive
requires and the complexity of concepts does not increase, so selective
receive is a better approach and a useful feature of erlang.

Is there a better way to manage the conversations rather than the whole
cleanup back channel messaging?

If you can spawn a new process for each request, the responses will
go to privately owned message queues.  When enough responses, or
enough time has passed, the newly spawned request process returns
its results and terminates.  Any messages stuck on the queue are
eliminated.  Any future messages are silently discarded since there is
no process to receive them.  If the backend DB process were monitoring
the request process, it could even interrupt its response to discard the
results rather than waiting for processing to complete and pass them
on to a non-existent process.

With erlang, there are many architectural choices when you consider
the uses of messaging and selective receive.

jay