[erlang-questions] Ideas for a new Erlang

Fri Jun 27 06:18:47 CEST 2008

On 27 Jun 2008, at 7:18 am, Sven-Olof Nyström wrote:
>> 	nystrom_receive()	% default channel
>> 	nystrom_receive(Timeout)
>> 	nystrom_receive_from(Channel)
>> 	nystrom_receive_from(Channel, Timeout)
>> 	nystrom_receive_from_any(Channel_Set)
>> 	nystrom_receive_from_any(Channel_Set, Timeout)
>
> True. But they are all simpler than selective receive.

True, but misleading.  EACH of them is simpler than
selective receive.  Taken *together*, ALL of them are
as complex; there are just too many separated facts to
keep track of.  This has always been my gripe about spawn;
there are just too many spawn variants, making it hard
to think about them.  It is ridiculous that I have to
write
	spawn(fun () -> «stuff» end)
rather than a plain
	fork «stuff» end
with some more syntax for options.

And you have to ask WHAT is simpler.  A hammer is a lot
simpler in itself than a nail gun, but if you have a lot
of nails to put in, it's a lot easier to USE a nail gun
than a hammer.

Smalltalk has had a SharedQueue class since 1980 or
earlier.  Given that, I can implement an owner-restricted
receiving "channel" in about 10 minutes.  It's easy to
BUILD.  But for actually writing anything that has
anything but the most trivial protocol, give me Erlang's
receive any day, as being easier to USE.

>> But it gets worse.
>>
>>     if channels are "objects", they can go anywhere,
>>     BUT THEY DON'T GET THERE WITHOUT BEING CARRIED!
>
> In Erlang today, if one wants to keep track of communication with a
> particular process one needs to carry the pid.

Yes, but with 'channels', you have to carry MORE things.
Instead of putting complexity in one place (the 'receive'),
this spreads complexity all around the program.

>
>>
>> You have to explicitly pass them around, especially through loops.
>> So to handle the simple bounded buffer you would find
>> yourself writing stuff like this:
>>
>> 	buffer(Status, Contents, GetChan, PutChan) ->
>> 	   Channels = case Status
>> 			of full  -> [GetChan]
>> 			 ; empty -> [PutChan]
>> 			 ; _     -> [GetChan,PutChan]
>> 		      end,
>> 	   case nystrom_receive_from_any(Channels)
>> 	     of {GetChan,Who} ->
>> 		{Status1,Contents1,Msg} = pop(Contents),
>> 		Who ! Msg
>> 	      ; {PutChan,Msg} ->
>> 		{Status1,Contents1} = add(Contents, Msg)
>> 	   end,
>> 	   buffer(Status1, Contents1, GetChan, PutChan).
>
> I've commented on your bounded_buffer example in my previous mail. I'm
> not sure what you are doing here. It might help if you could show the
> example using selective receive.

The mail you say you have commented on included the
selective receive version first.  It's an absolutely standard
bounded buffer that accepts 'get' requests only when the buffer
is not empty and 'put' requests only when the buffer is not full.
Because there are two kinds of requests, under your scheme there
have to be two channels.
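
For readers who do not have the earlier mail to hand, here is a
minimal sketch of that kind of buffer written with selective
receive.  It is not the code from that mail; the module name, the
message shapes ({put,Who,Item} and {get,Who}), and the helper names
put_item/2 and get_item/1 are all assumptions made up for this
illustration.

```erlang
-module(bounded_buffer).
-export([start/1, put_item/2, get_item/1, demo/0]).

%% Start a buffer process that holds at most Max items.
start(Max) when is_integer(Max), Max >= 1 ->
    spawn(fun () -> loop(queue:new(), 0, Max) end).

%% Blocking client calls: each one waits for the buffer's reply.
put_item(Buf, Item) ->
    Buf ! {put, self(), Item},
    receive {Buf, ok} -> ok end.

get_item(Buf) ->
    Buf ! {get, self()},
    receive {Buf, {item, Item}} -> Item end.

%% N is the number of items currently held in the queue Q.
%% The guards do the selecting: a 'put' is accepted only when the
%% buffer is not full, a 'get' only when it is not empty.  A request
%% that cannot be served yet simply stays in the mailbox.
loop(Q, N, Max) ->
    receive
        {put, Who, Item} when N < Max ->
            Who ! {self(), ok},
            loop(queue:in(Item, Q), N + 1, Max);
        {get, Who} when N > 0 ->
            {{value, Item}, Q1} = queue:out(Q),
            Who ! {self(), {item, Item}},
            loop(Q1, N - 1, Max)
    end.

%% A 'get' sent to an empty buffer waits until a 'put' arrives.
demo() ->
    Buf = start(2),
    Parent = self(),
    spawn(fun () -> Parent ! {got, get_item(Buf)} end),
    ok = put_item(Buf, a),
    receive {got, X} -> X end.
```

Note that both kinds of request arrive in the one mailbox and the
process carries no channels around its loop; the state of the buffer
alone decides which messages are eligible.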
>
>
>>
>> That's assuming that multireceive returns a {Channel,Message}
>> pair.  Another approach would be to pass a list of {Channel,
>> Handler} pairs, when the code would look like
>>
>> 	buffer(Status, Contents, GetChan, PutChan) ->
>> 	    GetHandler = fun (Who) ->
>> 		{Status1,Contents1,Msg} = pop(Contents),
>> 		Who ! Msg,
>> 		buffer(Status1, Contents1, GetChan, PutChan)
>> 	    end,
>> 	    PutHandler = fun (Msg) ->
>> 		{Status1,Contents1} = add(Contents, Msg),
>> 		buffer(Status1, Contents1, GetChan, PutChan)
>> 	    end,
>> 	    nystrom_receive_from_any(
>> 		case Status
>> 		  of full  -> [{GetChan,GetHandler}]
>> 		   ; empty -> [{PutChan,PutHandler}]
>> 		   ; _     -> [{GetChan,GetHandler},
>> 			       {PutChan,PutHandler}]
>> 		end).
>>
>> How this is in any way simpler than a selective receive
>> entirely escapes me.
>
> This example escapes me, too.

That's extremely odd, because it is a simple question:
If a process wants at some point to receive from any one of
several channels, HOW DOES IT KNOW WHICH ONE IT GOT?

It won't do to say "receive from THE channel and do a case
test on the result" because the whole point is that we
*need* to receive selectively:  if you accept a 'get'
request when there is nothing in the buffer, you have
got to do something with it, and sending it back to
yourself is obviously the wrong thing to do.  So we
have to say
	- if the buffer is empty,
	  the only thing I will accept is a 'put';
	- if the buffer is full,
	  the only thing I will accept is a 'get';
	- otherwise, I will accept either,
	  but I need to know which one I got.

So how do you know?  Does the multi-receive function
     - accept a list of channels and
       + return the INDEX of the chosen channel,
         after which you have to call ne_receive on
	that yourself
       + return the CHANNEL that it chose,
	after which you have to call ne_receive on
	that yourself
       + return the MESSAGE it accepted,
	without any indication of which channel it
	came from (this would be a bad idea because
	you would have to redundantly tag messages
	sent to particular channels)
       + return a {Message,Index} pair?
       + return a {Message,Channel} pair?
       + do something else?
     - accept a list of {Channel,Handler} pairs and
       + return Handler(Channel)?
       + return Handler(Message)?
       + return Handler(Channel, Message) so that
	the same Handler can be used with more than
	one channel
       + do something else?

Whichever one you pick, somebody will find it inconvenient
and implement one of the others on top of it, and then the
code will be harder to comprehend than code using a plain
old 'receive'.
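
To make that layering concrete, here is a rough sketch under loudly
stated assumptions.  Nothing in it is part of the proposal under
discussion: a "channel" is modelled as a plain reference, a message
on a channel as a {chan,Channel,Message} tuple in the ordinary
mailbox, and the names receive_from_any/1 and
receive_with_handlers/1 are invented for the illustration.  The
first function returns a {Channel,Message} pair; the second is the
{Channel,Handler} variant that somebody would build on top of it.

```erlang
-module(multi_receive).
-export([receive_from_any/1, receive_with_handlers/1, demo/0]).

%% Toy model (an assumption, not anyone's real API): wait for a
%% message on any channel in Chans and return {Channel, Message}.
%% is_map_key/2 is a guard BIF from OTP 21 onwards.
receive_from_any(Chans) ->
    Set = maps:from_list([{C, true} || C <- Chans]),
    receive
        {chan, Chan, Msg} when is_map_key(Chan, Set) ->
            {Chan, Msg}
    end.

%% The "other" interface, layered on the first: take a list of
%% {Channel, Handler} pairs and return Handler(Message).
receive_with_handlers(Pairs) ->
    Chans = [C || {C, _Handler} <- Pairs],
    {Chan, Msg} = receive_from_any(Chans),
    {Chan, Handler} = lists:keyfind(Chan, 1, Pairs),
    Handler(Msg).

demo() ->
    GetChan = make_ref(),
    PutChan = make_ref(),
    %% The 'get' arrives first, but an empty buffer must not take it.
    self() ! {chan, GetChan, requester},
    self() ! {chan, PutChan, item},
    %% "Empty" state: listen on PutChan only; the 'get' stays queued.
    {PutChan, item} = receive_from_any([PutChan]),
    %% "Neither full nor empty": listen on both, handler style.
    receive_with_handlers([{GetChan, fun (Who) -> {get_from, Who} end},
                           {PutChan, fun (Msg) -> {put, Msg} end}]).
```

The reader now has two multi-receive interfaces to keep in mind, and
the toy model itself needed a guarded 'receive' to be written at all.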

Basically, you are creating complexity-in-USE in order to
address a problem which it turns out you do not solve at all.
I just don't see the point.

> Let me just say that I find your bounded buffer example profoundly
> unconvincing. The buffer may be bounded, but there is nothing to
> prevent another process from filling the mailbox with "put" messages.

That's not *MY* problem.  That's *YOUR* problem.
There is nothing in *YOUR* proposal to prevent that.
>
>
>
>> By the way, recall that the problem that this "simplification"
>> is supposed to solve is this: "it is considered bad style to
>> leave too many messages in the mailbox".  Let me quote from the
>> Concurrent ML documentation for the Mailbox structure:
>
> Well, that was never the main problem. Selective receive is complex
> and unnecessary.

I flatly deny both claims.

If you understand Erlang pattern matching, selective receive is
no more complex to *understand* than your ne_receive, and it is
simpler to *use*.  In fact, I would argue that it is precisely
the 'receive' construct in Erlang that makes Erlang such a joy
to use.   If I wanted extreme terseness coupled with strong
type checking, I know where to find Concurrent Haskell.  If I
wanted flexibility in constructing complex synchronisation
schemes coupled with strong type checking and a truly amazing
module system, I know where to find Concurrent ML.  In many ways
I find Erlang syntax clumsy and verbose these days.  (Don't point
to my Prolog book and say "it's rather like Prolog".  I was never
a fan of Prolog *syntax*.)  But 'receive' makes up for it all.

As for necessity, I have pointed out that Ada and Occam have
essentially similar constructs, and for that matter, so does CSP.
Take a look at section 5.4.4 of the Occam 2.1 manual.  (I am
quite aware of the differences between the Occam ALT CASE
construct and Erlang's receive.  I am drawing attention to the
similarities.)

> That was my main point. Programmers sometimes manage
> to get into problems with it, but it was always my impression that
> there were strategies to avoid these problems. I thought however, that
> the fact that one had to make certain precautions to avoid getting
> into trouble strengthened the argument against selective receive.

But your proposal does absolutely nothing to make mailboxes
easier or safer to use.  It removes the convenience and clarity
and gives nothing in return.