[erlang-questions] Message Receive Semantics (was eep:MultiplePatterns)

Sun Jun 8 23:11:44 CEST 2008

Valentin wrote:

> Back in 2000, when I just started with Erlang
>

Ok, you've got 2 years on me so your insights may be more grounded
than mine.

> (had extensive commercial experience in procedural languages,
> though), one of the first problems I tried to solve had to do with
> priority handling. So I wrote something very ugly, but something
> that reflected a paradigm I had at the time -- I remember writing
> something like this:
>
> loop_high( 0 ) -> loop_normal( 3 );
> loop_high( N ) ->
>  receive
>   {high, Msg} ->
>         process_msg( Msg ),
>         loop_high( N-1 )
>  after 0 -> receive
>        {normal, Msg}->
>             process_msg( Msg ),
>             loop_high( N )
>

This is the first revelation that a beginner needs to understand.
It sometimes takes a while to realize that the after clause of a
receive can be another receive (and that the semantics of that
are different than two successive receives).
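
A minimal sketch of that difference, not Valentin's actual code: the
module and function names are made up, and I have added a {high, Msg}
clause to the inner receive so a late high message is not missed while
blocked. The inner receive only runs when no {high, _} message is
already queued, whereas two successive receives would always run both.

```erlang
-module(nested_receive).
-export([next/0, demo/0]).

%% Take a {high, _} message if one is already in the mailbox.
%% Otherwise fall through to the 'after 0' branch and block until
%% a message of either kind arrives.
next() ->
    receive
        {high, Msg} -> {high, Msg}
    after 0 ->
        receive
            {high, Msg} -> {high, Msg};
            {normal, Msg} -> {normal, Msg}
        end
    end.

%% The normal message is sent first, but the high one is returned first.
demo() ->
    self() ! {normal, a},
    self() ! {high, b},
    First = next(),
    Second = next(),
    [First, Second].
```

Calling nested_receive:demo() from a fresh process should return the
high message ahead of the normal one even though it was sent second.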

> loop_low( 0 ) -> loop_high( 5 );
> loop_low( N ) -> ...
> ...
> ...
>

The next revelation is that this is polling.  You will consume
CPU time when there are no messages available.  This may
or may not matter to your application.  As you say below, it
feels grungy and it doesn't look like a very erlangish approach.

>
> The above code would do the trick; however, ugly as it is, it
> would eventually trigger refactoring to the tune of:
>
> loop_high( 0 ) -> loop_normal( 3 );
> loop_high( N ) ->
>   receive
>     {high, Msg} ->
>       process_msg( Msg ),
>       loop_high( N-1 );
>     {normal, Msg} ->
>       process_msg( Msg ),
>       loop_high( N );
>     {low, Msg} ->
>       process_msg( Msg ),
>       loop_high( N );
>     ANY ->
>       case process_msg( ANY ) of
>         stop -> exit( normal );
>         _    -> loop_high( N )
>       end
>   after 1000 -> loop_high( N )
>   end.
>

Are you sure the above works?  Suppose you have
a Low message followed by a High message.
Won't the Low message get handled first?

The same applies to the other clauses.  I also
don't see the purpose of the 1000 timeout above,
but again you end up with polling.
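
To make the ordering concern concrete, here is a small sketch (module
and function names are hypothetical). A single receive with several
patterns takes the oldest message that matches any clause, so the
order of the clauses does not give {high, _} any preference.

```erlang
-module(arrival_order).
-export([take/0, demo/0]).

%% One receive, three patterns: the first message in the mailbox that
%% matches ANY clause is taken, regardless of clause order.
take() ->
    receive
        {high, Msg} -> {high, Msg};
        {normal, Msg} -> {normal, Msg};
        {low, Msg} -> {low, Msg}
    end.

%% Low is sent before high, and low is also handled before high.
demo() ->
    self() ! {low, first},
    self() ! {high, second},
    A = take(),
    B = take(),
    [A, B].
```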

>
> What I'm trying to say, I guess -- it does not really matter if one
> is a novice or not. We all know how to think, and only through
> thinking can we arrive at a point where we can articulate the
> "right" kind of questions. Nothing wrong in making a few mistakes
> in the process.
>

I may have mis-stated my objection by phrasing it as a challenge to
beginners.  I conflated two issues: 1) advanced message handling
requires one to become fluent and familiar with simpler message
handling first, and 2) priority messaging is a pathological case for
Erlang's sequential mailbox approach.

#1 is true of any new feature of any new language; however, I would
bet the majority of programmers new to Erlang have never used another
language that has as simple a mechanism for message handling, and
therefore are likely to be unfamiliar with the various options that
receive enables.  The same would not be true of lists or hash tables.

#2 is a result of the sequential, and wisely chosen simple, nature of
the receive mechanism.  I don't believe it is possible for a single
process to implement priority without polling, unless the message
queue is unloaded and maintained using standard data structures.
That's what I mean by pathological -- normal use of the receive
statement leads to bad results, and handling things efficiently takes
unorthodox means.  That said, the choice the language designers made
makes a lot of other cases very easy to implement and makes it very
easy to change your architecture.  Overall it is a win, but always be
aware of the worst-case scenario so you can avoid it.
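
A rough sketch of what "unloading the message queue" could look like,
under my own assumptions: the mailbox is drained into three queues
from the standard queue module, one per priority tag, and the next
message is then picked from the highest non-empty queue.  The module
name, function names and the {High, Normal, Low} tuple layout are all
invented for illustration.

```erlang
-module(drain_prio).
-export([new/0, drain/1, next/1, demo/0]).

%% State is a {High, Normal, Low} tuple of queue:queue() values.
new() -> {queue:new(), queue:new(), queue:new()}.

%% Move everything currently in the mailbox into the queues.
%% 'after 0' means this never blocks: it returns once the mailbox
%% holds no more tagged messages.
drain({H, N, L} = Qs) ->
    receive
        {high, Msg} -> drain({queue:in(Msg, H), N, L});
        {normal, Msg} -> drain({H, queue:in(Msg, N), L});
        {low, Msg} -> drain({H, N, queue:in(Msg, L)})
    after 0 ->
        Qs
    end.

%% Return the oldest message from the highest-priority non-empty
%% queue together with the updated state, or 'empty'.
next({H, N, L}) ->
    case queue:out(H) of
        {{value, Msg}, H1} -> {{high, Msg}, {H1, N, L}};
        {empty, _} ->
            case queue:out(N) of
                {{value, Msg}, N1} -> {{normal, Msg}, {H, N1, L}};
                {empty, _} ->
                    case queue:out(L) of
                        {{value, Msg}, L1} -> {{low, Msg}, {H, N, L1}};
                        {empty, _} -> empty
                    end
            end
    end.

%% Messages arrive low, normal, high and come out high, normal, low.
demo() ->
    self() ! {low, a},
    self() ! {normal, b},
    self() ! {high, c},
    Qs0 = drain(new()),
    {M1, Qs1} = next(Qs0),
    {M2, Qs2} = next(Qs1),
    {M3, Qs3} = next(Qs2),
    empty = next(Qs3),
    [M1, M2, M3].
```

A real loop built on this would do a plain blocking receive whenever
next/1 returns empty, so it would not have to poll while idle.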

(My general recommendation on priority messaging is to avoid it by
changing the architecture.  Often it is just a side effect of the
previous techniques for handling messages.  The lightweight nature
of Erlang processes allows alternative approaches to standard
priorities.)

Suppose the requirements change to add one more level of priority.
It ripples through all the receive statements.  Not a desirable outcome.

>
> BTW, how easy one may make a mistake depends largely on how the  
> problem was stated. If one says:
>
>
>
> 3) Always handle messages in the following order:
>     a) 5 messages of {reply, high, Msg}
>     b) 3 messages of {reply, normal, Msg}
>     c) 1 message of {reply, low, Msg}
>     d) 'EXIT' message
>
>
>
> one should not be surprised if the resulting code blocks until 5
> high priority messages have been processed. However, if one
> indicates that preference should be given to high priority
> messages, then normal, followed by low priority messages, utilising
> a ratio of 5:3:1, and stipulates that should there be no higher
> priority message, a lower one should be processed -- one may close a
> semantic gap, and thus prevent a novice from making a mistake. All
> of that in far better English, of course ;-).
>
>

Well, I did botch the explanation, but it seemed a rather arbitrary
requirement to start with.  Did you really mean 5:3:1, or was that an
easy way to approximate priority because you can't easily do it with
a normal receive?  Wouldn't you really want to process as many High
priority messages as quickly as possible, and then normal and low as
you get a chance, maybe in parallel?  Why not let the Erlang process
time-slicing manage it for you, using the Erlang process priority to
give preference to high, normal and low messages in that order?

Since there is one message queue and one process, you can't use
any of the built-in process time-sharing or parallelism.  Even if you
unload the mailbox and put items in a priority queue, you will still
have one process to handle them.  The only way that relies on the
VM semantics of fair scheduling is to use a router process to receive
items and to forward them to one of 3 different processes for handling.
Using that approach also allows the flexibility to add new priorities
easily (one new pattern and one new process), although it is still a
little tricky to manage the balance of high vs normal vs low if you
don't just want the VM to treat them equally -- there are two choices:
1) use process priorities and 2) send a series of messages followed by
a request to ack them so that the router controls the flow by waiting
for acks before adding messages to the process queues.

Doing it with a router process is the most flexible (it allows both
selective receive and unloading the mailbox to a priority queue,
plus new priority levels have less impact).  It is also arguably much
easier to understand the resulting code.  It is unfortunately a forced
solution to simplify the complexity caused by serial queues.
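
For what it's worth, a bare-bones sketch of the router idea, with
choice 1 (process priorities) bolted on.  Everything named here
(prio_router, start/1, the Handler fun, the stop message) is an
assumption for illustration, and it uses the maps module, which is
much newer than this thread.  The ack-based flow control of choice 2
is left out.

```erlang
-module(prio_router).
-export([start/1, router/1, worker/2, demo/0]).

%% Start one worker per priority level plus a router in front of them.
%% Handler is a caller-supplied fun(Level, Msg).
start(Handler) ->
    Workers = maps:from_list(
                [{Level, spawn_link(?MODULE, worker, [Level, Handler])}
                 || Level <- [high, normal, low]]),
    spawn_link(?MODULE, router, [Workers]).

%% The router only forwards: it takes messages in arrival order and
%% hands each one to the worker registered for its tag.
router(Workers) ->
    receive
        stop ->
            lists:foreach(fun(W) -> W ! stop end, maps:values(Workers));
        {Level, Msg} ->
            case maps:find(Level, Workers) of
                {ok, Pid} -> Pid ! {Level, Msg};
                error -> ok  % unknown tag: silently dropped in this sketch
            end,
            router(Workers)
    end.

%% Each worker sets its own scheduler priority. The atoms high, normal
%% and low happen to be valid levels for process_flag(priority, _).
worker(Level, Handler) ->
    process_flag(priority, Level),
    worker_loop(Level, Handler).

worker_loop(Level, Handler) ->
    receive
        stop -> ok;
        {Level, Msg} ->
            Handler(Level, Msg),
            worker_loop(Level, Handler)
    end.

%% Handling order across workers is up to the scheduler, so the
%% results are sorted before being returned.
demo() ->
    Parent = self(),
    Router = start(fun(Level, Msg) -> Parent ! {handled, Level, Msg} end),
    Router ! {low, a},
    Router ! {high, b},
    Got = [receive {handled, L, M} -> {L, M} after 1000 -> timeout end
           || _ <- [1, 2]],
    Router ! stop,
    lists:sort(Got).
```

Adding a priority level here means one more atom in the list in
start/1, which is the "less impact" point above.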

jay