[erlang-questions] Ideas for a new Erlang

Matthew Reilly matthew.reilly@REDACTED
Thu Jun 26 03:08:19 CEST 2008


On Wed, 2008-06-25 at 16:33 -0700, Matthew Reilly wrote:
> > On Tue, 2008-06-24 at 16:23 +0200, Sven-Olof Nyström wrote:
> > Hello everyone, 
> > 
> > I have written down some thoughts and ideas on the future development
> > of Erlang.
> > 
> > Main topics:
> > 
> >  - some notes on implementation and backward compatibility
> > 
> >  - an alternative to Erlang's selective receive
> 
> In terms of Erlang's selective receive, it's possible to get around the
> O(N) lookup issue w/o a language change (albeit with a slight change to
> the VM) if you restrict which patterns you use.
> 
> (Review of the selective receive problem)
> The problem is that given code like:
>   receive
>      some_term -> some_value
>   end.
> each time this receive is hit, the whole message queue is scanned. If
> there are N unmatched messages pending, then this takes O(N) time each
> time this bit of code is hit.
> 
> However, if the receive pattern is static (i.e. it does not include an
> already bound variable), then these duplicate scans are redundant --
> we already know that if a message fails to match the pattern once, it
> will never match that same pattern in the future. The next time we hit
> the same receive, we'd only need to scan messages that have arrived
> since we last did the same selective receive. (See below about dealing
> with non-static patterns).
> 
> In this way, each message would be pattern matched at most once per
> selective receive, and thus, regardless of the size of the message
> queue, selective receives would become an amortized O(1) task.
> 
> The current message queue is implemented in the VM by a set of 3
> pointers: "first", "save", "last". "first" points to the first item in
> the queue, "last" points to the place to add the next item, and "save"
> is a temp pointer used to traverse the message queue.
> 
> A way to implement O(1) selective receives would be to add an
> (initially empty) list of saved {id,pointer} pairs to each message
> queue. When a selective receive finishes scanning through the message
> queue, it stores a pointer to the next message it would have processed
> (keyed to something that uniquely identifies this receive). When the
> same selective receive occurs again, it just needs to start at that
> saved location (instead of at the "first" message in the list). As an
> optimization, once the message queue becomes empty, this list can be
> wiped out.
> 
> This adds one {id,pointer} pair's worth of space per saved receive to
> each process. Very little overhead is added for the normal case where
> all receives have a default clause.
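To make this concrete, here's a rough C sketch of the queue plus the proposed saved-position list. All names here (MsgQueue, SavedPos, etc.) are made up for illustration, not the actual VM source; positions are stored as the address of a next-link, so "where to resume scanning" survives later appends:

```c
#include <stdlib.h>

/* One message in the per-process queue (payload omitted). */
typedef struct Msg { struct Msg *next; } Msg;

/* One saved scan position per selective-receive site, keyed by a
 * value that uniquely identifies that receive. */
typedef struct SavedPos {
    struct SavedPos *next;
    unsigned key;
    Msg **pos;            /* address of the link to resume scanning at */
} SavedPos;

typedef struct {
    Msg *first;           /* "first": head of the queue */
    Msg **last;           /* "last": where the next message is appended */
    Msg **save;           /* "save": scan position of the current receive */
    SavedPos *saved;      /* new: saved positions, initially empty */
} MsgQueue;

/* restore_queue_pos: resume at the position saved for `key`,
 * or at "first" if this receive has no saved position yet. */
void restore_queue_pos(MsgQueue *q, unsigned key) {
    q->save = &q->first;
    for (SavedPos *p = q->saved; p != NULL; p = p->next)
        if (p->key == key) { q->save = p->pos; break; }
}

/* save_queue_pos: remember the current scan position for `key`. */
void save_queue_pos(MsgQueue *q, unsigned key) {
    for (SavedPos *p = q->saved; p != NULL; p = p->next)
        if (p->key == key) { p->pos = q->save; return; }
    SavedPos *p = malloc(sizeof *p);
    p->next = q->saved;
    p->key  = key;
    p->pos  = q->save;
    q->saved = p;
}
```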
> 
> This is a sketch of how the VM would need to be modified to have
> amortized O(1) selective receives.
> 
> Currently a selective receive such as 
>   func() ->
>    receive
>      {high_priority, Message} -> Message
>      after 0 -> no_message
>    end.
> is implemented in bytecode such as:
>   {label,14}.
>     {loop_rec,{f,16},{x,0}}.
>     {test,is_tuple,{f,15},[{x,0}]}.
>     {test,test_arity,{f,15},[{x,0},2]}.
>     {get_tuple_element,{x,0},0,{x,1}}.
>     {get_tuple_element,{x,0},1,{x,2}}.
>     {test,is_eq_exact,{f,15},[{x,1},{atom,high_priority}]}.
>     remove_message.
>     {move,{x,2},{x,0}}.
>     return.
>   {label,15}.
>     {loop_rec_end,{f,14}}.
>   {label,16}.
>     timeout.
>     {move,{atom,no_message},{x,0}}.
> 
> Here the "loop_rec" opcode returns the message pointed to by the "save"
> pointer (which is normally the same as "first").
> If the message queue is empty, it jumps to the "timeout" opcode which
> resets the "save" pointer to "first".
> If the message queue is not empty, the message is pattern matched
> against {high_priority,_}. 
>   If it doesn't match, it calls the opcode "loop_rec_end", which sets
> "save" to "save->next", and jumps back to the "loop_rec" opcode.
>   If it does match, it returns the second item of the tuple, and calls
> the "remove_message" opcode which removes the message from the queue and
> also sets "save" to "first".
> 
> Suppose we introduce two new opcodes:
> restore_queue_pos
> save_queue_pos
> 
> save_queue_pos stores the current value of "save" indexed against some
> unique key.
> restore_queue_pos returns either the value saved by save_queue_pos
> (possibly modified by remove_message) or "first" if there was no saved
> value.
> The remove_message opcode needs to be modified to update any saved
> queue positions (in case they point to the message that is being
> deleted), or to remove them all if the queue becomes empty.
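Continuing in the same illustrative C terms (hypothetical names, not the real VM code), the modified remove_message might look like this -- since positions are addresses of next-links, only a position that lives inside the removed message needs retargeting:

```c
#include <stdlib.h>

typedef struct Msg { struct Msg *next; } Msg;

typedef struct SavedPos {
    struct SavedPos *next;
    unsigned key;
    Msg **pos;            /* address of the link to resume scanning at */
} SavedPos;

typedef struct {
    Msg *first;
    Msg **last;
    Msg **save;
    SavedPos *saved;
} MsgQueue;

/* remove_message, extended as described: unlink `m` (assumed present),
 * retarget the tail pointer and any saved position that pointed into
 * the removed message, reset "save" to "first" as today, and wipe all
 * saved positions once the queue becomes empty. */
void remove_message(MsgQueue *q, Msg *m) {
    Msg **link = &q->first;
    while (*link != m)
        link = &(*link)->next;
    *link = m->next;                      /* unlink m */

    if (q->last == &m->next)
        q->last = link;                   /* m was the tail */
    for (SavedPos *p = q->saved; p != NULL; p = p->next)
        if (p->pos == &m->next)
            p->pos = link;                /* position pointed into m */

    q->save = &q->first;                  /* as today: reset to "first" */

    if (q->first == NULL) {               /* optimization: queue drained */
        while (q->saved != NULL) {
            SavedPos *dead = q->saved;
            q->saved = dead->next;
            free(dead);
        }
    }
}
```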
> 
> The VM code for the same selective receive using these new opcodes:
> 
>     {restore_queue_pos,XXX}.
>   {label,14}.
>     {loop_rec,{f,16},{x,0}}.
>     {test,is_tuple,{f,15},[{x,0}]}.
>     {test,test_arity,{f,15},[{x,0},2]}.
>     {get_tuple_element,{x,0},0,{x,1}}.
>     {get_tuple_element,{x,0},1,{x,2}}.
>     {test,is_eq_exact,{f,15},[{x,1},{atom,high_priority}]}.
>     {save_queue_pos,XXX}.
>     remove_message.
>     {move,{x,2},{x,0}}.
>     return.
>   {label,15}.
>     {loop_rec_end,{f,14}}.
>   {label,16}.
>     timeout.
>     {move,{atom,no_message},{x,0}}.
> In the above, XXX represents a value unique to the selective receive and
> is generated in an unspecified way.
> 
> Care would need to be taken for dynamic code reloading. E.g. when a
> module is reloaded, the VM might want to reset all the values previously
> stored by save_queue_pos. 
> 
> Static vs. Non-Static selective receives

> This technique will fail for receives that contain bound variables. e.g.
>   Ref = make_ref(),
>   Pid ! {command, Ref},
>   receive
>      {some_rather_unique_response_atom, Ref} -> ...
>      after Timeout -> ...
>   end
> This cannot use the above technique, since each time this runs, the
> receive is matching against a different reference.
> 
> It's possible to re-write the code to:
> func1(Pid) ->
>    Ref = make_ref(),
>    Pid !  {command,Ref},
>    func2(Ref,Pid).
> 
> func2(Ref1,Pid) ->
>    % This is now a static pattern since Ref2 is unbound
>    receive
>       {some_rather_unique_response_atom, Ref2} ->
>           if
>             Ref1 == Ref2 -> 
>                ...;
>             true ->
>                % We drop messages that we wouldn't have before
>                func2(Ref1,Pid)
>           end
>       after Timeout -> ...
>    end.
>        


Actually, there's a better way of handling that particular case. One of
the most common selective receive patterns is one that has a bound
variable containing a freshly generated reference. Since, semantically,
references are supposed to be unique, when doing the receive, we can
ignore all the messages that existed in our queue before the reference
was created (there is no proper way that any of those messages could
contain the reference).

Note: it's technically possible for an existing message to contain a
reference that has not yet been created (e.g. if the reference was
stored in mnesia and a node restarted), but we wouldn't want to match
those anyway.

This could be implemented with another new opcode "last_queue_pos" which
sets "save" to "last".
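In the same illustrative C terms as before (made-up names, not the real VM source), the opcode is essentially one assignment, which is what makes it cheap -- and because "last" is the address where the next message will be linked in, the scan automatically picks up anything sent after the opcode ran:

```c
#include <stddef.h>

typedef struct Msg { struct Msg *next; } Msg;

typedef struct {
    Msg *first;     /* head of the queue */
    Msg **last;     /* where the next message will be linked in */
    Msg **save;     /* scan position of the current receive */
} MsgQueue;

/* Append a message at the tail (what a send does). */
void enqueue(MsgQueue *q, Msg *m) {
    m->next = NULL;
    *q->last = m;
    q->last = &m->next;
}

/* last_queue_pos: start the scan at the current tail. Every message
 * already queued predates the fresh reference, so it is skipped. */
void last_queue_pos(MsgQueue *q) {
    q->save = q->last;
}
```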

For example, this code:
  func(Pid) ->
     MonRef = erlang:monitor(process,Pid),
     Ref = make_ref(),
     Pid ! {request, Ref},
     receive
         {reply, Ref} -> got_response;
         {'DOWN',MonRef,_,_,_} -> pid_is_gone
         after 1000 -> timeout
     end.
could be implemented as this bytecode:

{function, func, 1, 18}.
  {label,17}.
    {func_info,{atom,small},{atom,func},1}.
  {label,18}.
    {allocate_zero,2,1}.
    {move,{x,0},{x,1}}.
    {move,{atom,process},{x,0}}.
    {move,{x,1},{y,1}}.
    {'%live',2}.
    last_queue_pos.
    {call_ext,2,{extfunc,erlang,monitor,2}}.
    {move,{x,0},{y,0}}.
    {'%live',0}.
    {call_ext,0,{extfunc,erlang,make_ref,0}}.
    {test_heap,3,1}.
    {put_tuple,2,{x,1}}.
    {put,{atom,request}}.
    {put,{x,0}}.
    {move,{x,0},{x,2}}.
    {move,{y,1},{x,0}}.
    {move,{x,2},{y,1}}.
    {'%live',2}.
    send.
  {label,19}.
    {loop_rec,{f,23},{x,0}}.
    {test,is_tuple,{f,22},[{x,0}]}.
    {select_tuple_arity,{x,0},{f,22},{list,[2,{f,20},5,{f,21}]}}.
  {label,20}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,22},[{x,1},{atom,reply}]}.
    {test,is_eq_exact,{f,22},[{x,2},{y,1}]}.
    remove_message.
    {move,{atom,got_response},{x,0}}.
    {deallocate,2}.
    return.
  {label,21}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,22},[{x,1},{atom,'DOWN'}]}.
    {test,is_eq_exact,{f,22},[{x,2},{y,0}]}.
    remove_message.
    {move,{atom,pid_is_gone},{x,0}}.
    {deallocate,2}.
    return.
  {label,22}.
    {loop_rec_end,{f,19}}.
  {label,23}.
    {wait_timeout,{f,19},{integer,1000}}.
    timeout.
    {move,{atom,timeout},{x,0}}.
    {deallocate,2}.
    return.


The compiler just needs to know which BIFs return new references (e.g.
erlang:monitor/2 and make_ref/0), and detect if the results of these
functions are used in *each* branch of a receive. If so, it could do a
last_queue_pos before the BIF that returns the reference, so that the
receive would start with the first message that arrived after the
last_queue_pos was called.




> Because we now drop certain messages, this is not exactly semantically
> the same as the first version. However, in general, if a message does
> match this pattern, it's most likely a response to a previous call on
> which we timed out. The appropriate action in that case would be to drop
> the message.
> 
> 
> 
> > 
> >  - a simple language mechanism to allow function in-lining across
> >    module boundaries
> > 
> >  - a new mechanism for introducing local variables with a more cleanly
> >    defined semantics
> > 
> >  - a mini-language to allow the efficent implementation of low-level
> >    algorithms
> > 
> > The paper can be downloaded here:
> > <http://user.it.uu.se/~svenolof/new-erlang.pdf>
> > 
> > 
> > 
> > Sven-Olof Nyström
> > 
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions
> 



