[erlang-questions] Ideas for a new Erlang

Thu Jun 26 01:33:25 CEST 2008

On Tue, 2008-06-24 at 16:23 +0200, Sven-Olof Nystr|m wrote:
> Hello everyone, 
> 
> I have written down some thoughts and ideas on the future development
> of Erlang.
> 
> Main topics:
> 
>  - some notes on implementation and backward compatibility
> 
>  - an alternative to Erlang's selective receive

In terms of Erlang's selective receive, it's possible to get around the
O(N) look up issue w/o a language change (albeit with a slight change to
the VM) if you restrict which patterns you use.

(Review of the selective receive problem)
The problem is that given code like:
  receive
     some_term -> some_value
  end.
each time this receive is hit, the whole message queue is scanned. If
there are N unmatched messages pending, then this take O(N) each time
this bit of code is hit.

However, if the receive pattern is static (i.e. it does not include an
already bound variable), then these duplicate scans would be redundant
-- we already know that if a message is not matched once, it won't match
the same pattern in the future. The next time we hit the same receive,
we'd only need to scan messages that have been added since we last did
the same selective receive. (See below about dealing with non-static
patterns).

In this way, each message would be patterned matched at most once per
selective receive, and thus, regardless of the size of the message
queue, selective receives would become an amoritized O(1) task.

The current message queue is implemented in the VM by a set of 3
pointers: "first", "save", "last". "first" points to the first item in
the queue, "last" points to the place to add the next item, and "save"
is a temp pointer used to traverse the message queue.

A way to implement O(1) selective receives  would be to add an
(initially empty) list of saved {id,pointer} pairs to each message
queue. When a selective receive finishes scans through the message
queue, it stores a pointer to the next message would have processed
(keyed to something that uniquely identifies this receive). When the
same selective receives occurs again, it just needs to start at that
saved location (instead of at the "first" message in the list). As an
optimization, once the message queue becomes empty, this list can be
wiped out.

This adds one pointer's worth of space to each process. Very little
overhead is added for the normal case where all receives have a default
clause.  

This is a sketch of how the VM would need to be modified to have
amortized O(1) selective receives.

Currently a selective receive such as 
  func() ->
   receive
     {high_priority, Message} -> Message
     after 0 -> no_message
   end.
is implemented in bytecode such as:
  {label,14}.
    {loop_rec,{f,16},{x,0}}.
    {test,is_tuple,{f,15},[{x,0}]}.
    {test,test_arity,{f,15},[{x,0},2]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,15},[{x,1},{atom,high_priority}]}.
    remove_message.
    {move,{x,2},{x,0}}.
    return.
  {label,15}.
    {loop_rec_end,{f,14}}.
  {label,16}.
    timeout.
    {move,{atom,no_message},{x,0}}.

Here the "loop_rec" opcode returns the message pointed to by the "save"
pointer (which is normally the same as "first").
If the message queue is empty, it jumps to the "timeout" opcode which
resets the "save" pointer to "first".
If the message queue is not empty, the message is pattern matched
against {high_priority,_}. 
  If it doesn't match, it calls the opcode "loop_rec_end" when sets
"save" to "save->next", and jumps back to the "loop_rec" opcode.
  If it does match, it returns the second item of the tuple, and calls
the "remove_message" opcode which removes the message from the queue and
also sets "save" to "first".

Suppose we introduce two new opcodes:
restore_queue_pos
save_queue_pos

save_queue_pos stores the current value of "save" indexed against some
unique key.
restore_queue_pos returns either the value saved by save_queue_pos
(possibly modified by remove_message) or "first" if there was no saved
value.
The remove_message opcode needs be modified to update any saved queues
positions (in case they point to the message that is being deleted) or
remove them all if the queue becomes empty.

The VM code for the same selected receive using these new opcodes:

    {restore_queue_pos,XXX}.
  {label,14}.
    {loop_rec,{f,16},{x,0}}.
    {test,is_tuple,{f,15},[{x,0}]}.
    {test,test_arity,{f,15},[{x,0},2]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,15},[{x,1},{atom,high_priority}]}.
    {save_queue_pos,XXX}.
    remove_message.
    {move,{x,2},{x,0}}.
    return.
  {label,15}.
    {loop_rec_end,{f,14}}.
  {label,16}.
    timeout.
    {move,{atom,ok},{x,0}}.
In the above, XXX represents a value unique to the selective receive and
is generated an unspecified way.

Care would need to be taken for dynamic code reloading. E.g. when a
module is reloaded, the VM might want to reset all the values previously
stored by save_queue_pos. 

Static vs. Non-Static selective receives

This technique will fail for receives that contain bound variables. e.g.
  Ref = make_ref(),
  Pid ! {command, Ref},
  receive
     {some_rather_unique_response_atom, Ref} -> ...
     after Timeout -> ...
  end
This can not use the above technique since each time this runs, the
receive is matching against a different reference.

It's possible to re-write the code to:
func1(Pid) ->
   Ref = make_ref(),
   Pid !  {command,Ref},
   func2(Ref,Pid).

func2(Ref1,Pid) ->
   % This is now a static pattern since Ref2 is unbound
   receive
      {some_rather_unique_response_atom, Ref2} ->
          if
            Ref1 == Ref2 -> 
               ...;
            true ->
               % We drop messages the we wouldn't have before 
               func2(Ref1,Pid)
          end
      after Timeout -> ...
   end.

Because we now drop certain messages, this is not exactly semantically
the same as the first version. However, in general,  if a message does
match this pattern, it's most likely a response to a previous call on
which we timed out. The approriate action in that case would be to drop
the message.

> 
>  - a simple language mechanism to allow function in-lining across
>    module boundaries
> 
>  - a new mechanism for introducing local variables with a more cleanly
>    defined semantics
> 
>  - a mini-language to allow the efficent implementation of low-level
>    algorithms
> 
> The paper can be downloaded here:
> <http://user.it.uu.se/~svenolof/new-erlang.pdf>
> 
> 
> 
> Sven-Olof Nyström
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions