scala (was checksum for distributed debugging?)

Fri Dec 9 00:39:42 CET 2005

[quote="Bob.Smart@REDACTED"]
 P.S. On a completely different matter, I'd be interested in
 comments on the Scala language (http://scala.epfl.ch/) by
 Erlang experts. It can be used in a fairly pure functional
 style using val declarations, and it seems to support
 Erlang-style message exchange (see chapter 3 of "Scala by
 Example" http://scala.epfl.ch/docu/files/ScalaByExample.pdf).
 Far be it from me to suggest that Erlang might not last
 forever, but if not this seems a possible migration path. 
[/quote]

( I've been trying to answer this a couple of times now, 
  but my answer always becomes too long and confusing.
  But since I spent the time writing it I'm posting it anyway. 
  If you have the time to read half of it you have too much
  time on your hands. )

In 2003, after seven years of hacking Erlang, I joined 
Prof. Odersky at EPFL and spent a year and a half
working in the Scala project.  

As both Ulf Wiger and Ke Han note, the biggest problem
with the Scala concurrency model is that it is based on
the underlying model of the VM it is running on.
I will try to go a bit deeper into this problem.

In his thesis Joe Armstrong lists a number of requirements on
a programming language and its libraries. In short, these are:
  R1) Concurrency, 
  R2) Error encapsulation, 
  R3) Fault detection, 
  R4) Fault identification, 
  R5) Code upgrade, and 
  R6) Stable storage.

According to Joe these requirements are fulfilled by Erlang
in the following way: 
  R1--by Erlang's processes, 
  R2--processes are designed as units of errors, 
  R2--processes fail if functions are called with the wrong arguments, 
  R3 and R4--when a process fails the reason for failure is
       broadcast to linked processes, 
  R5--by hot code loading, and 
  R6--by the Erlang libraries dets and mnesia. 

There are some key elements in Erlang that, according
to Joe, make it easier (or indeed possible) to build
fault-tolerant software systems. These are:
  K1 -- Processes,
  K2 -- Asynchronous message passing, 
  K3 -- Process links, 
  K4 -- No shared memory,
  K5 -- Code updates, 
  K6 -- and Lightweight processes.

Most of this can be implemented in Scala today, and
what is left might actually be fixed some time in the
future.

 K1 Processes & K6 Lightweight processes.
 ===========================
Concurrency in Scala is provided by the underlying
backend, at the moment either the JVM or the .NET CLR, which is assumed
to provide a thread model compatible with the thread model of Java.

This gives Scala concurrency "for free" and the possibility to 
use Java or .NET libraries that use the native thread model.

A disadvantage is that threads in most Java implementations
are not very lightweight, and starting new threads as well as
context switching between them can be relatively slow. 
But there is nothing in the Scala language definition that
requires a heavyweight thread implementation, and there are some
experimental Java implementations that actually have very
lightweight processes.

Scala is extensible in many ways which makes it possible
to get a language within the language that looks very much
like Erlang. (We will use imports, def parameters, quoting,
and higher order functions to achieve this.)

Given an implementation of processes in the class Process, 
we can in Scala define the spawn function as:

package scala.concurrent;
object Process {
  def spawn(def body:unit):Process = {
    val p = new Process(body);
    p.start();
    p;
  }
}

Here the def parameter body:unit takes a closure of the
type () => unit.  One nice aspect of the def parameter is that
the compiler automatically infers that a closure is needed at 
all call sites of spawn, making it unnecessary for the programmer
to write code to create the closure.  We can now create a new
process by just calling spawn with the code that the new process
should start executing. 

Assuming we have a Server object with a method 
 loop:Int=>Unit, 
we can create a process like this:
 import scala.concurrent.Process.spawn;
 ...
 val pid = spawn(Server.loop(0));
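
For readers trying this with a present-day Scala compiler: as far as I
know, the def parameter shown here corresponds to what is now written
as a by-name parameter, body: => Unit. A minimal sketch under that
assumption, using a plain java.lang.Thread instead of the Process class
(ProcessSketch is a made-up name, not part of any library):

```scala
// Sketch in present-day Scala syntax; ProcessSketch is a hypothetical name.
object ProcessSketch {
  // `body: => Unit` is a by-name parameter: the argument expression is not
  // evaluated at the call site, only when `body` is referenced.
  def spawn(body: => Unit): Thread = {
    val t = new Thread(() => body) // body is evaluated on the new thread
    t.start()
    t
  }
}
```

A call such as ProcessSketch.spawn { Server.loop(0) } then runs the loop
on the new thread, without the caller writing the closure by hand.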

So far we have assumed an implementation of the class Process, 
let us now look at how this can be implemented in Scala.
All we need to do is to extend the Thread class and implement
the abstract run method of Thread:

class Process(def body:unit) extends Thread {
    override def run() = body;
}

Now we can start a process, but how do we stop it? Just as in
Erlang, a process stops when it has no more code to execute, i.e.,
when the code in body reaches its end. Sometimes we would like to
kill the process prematurely. In Erlang this is done by calling the
BIF exit(p:pid, reason:Term); in Scala we can implement exit
ourselves. We implement it both in the Process class and in the
Process object in order to get an Erlang-like syntax:

object Process {
  def spawn(def body:unit):Process = {
    val p = new Process(body);
    p.start();
    p;
  }
  def exit(p:Process,reason:AnyRef) =
    p.exit(reason);
}

class Process(def body:unit) extends Thread {
  private var exitReason:AnyRef = null;
  override def run() = {
    try {body}
    catch {
      case e:java.lang.InterruptedException =>
        exitReason.match {
          case null =>
            Console.println("Process exited abnormally " + e);
          case _ => 
            Console.println("Process exited with reason: " + exitReason);
        }
    }
  }

  def exit(reason:AnyRef):unit = {
    exitReason = reason;
    interrupt();
  }
}

Processes in Erlang can get their own pid by calling the BIF
self(). This can easily be simulated in the Process class
by the method: def self = this;

We have one small problem though: in the example above we started
the process by calling Server.loop(0), but the object Server
does not inherit from the class Process, and hence the method 
self is not available in the code of loop. 
We can fix this by implementing self in the Process object:

  def self:Process = {
    if (Thread.currentThread().isInstanceOf[Process]) 
      Thread.currentThread().asInstanceOf[Process]
    else error("Self called outside a process");
  }
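
The same lookup can be sketched in present-day Scala with a pattern
match on the current thread instead of isInstanceOf/asInstanceOf.
Proc is a made-up stand-in for the Process class above, and the
choice of exception is my own assumption:

```scala
// Hypothetical names: Proc stands in for the post's Process class.
class Proc(body: => Unit) extends Thread {
  override def run(): Unit = body
}

object Proc {
  def spawn(body: => Unit): Proc = {
    val p = new Proc(body)
    p.start()
    p
  }

  // The current process is simply the current thread, provided it is a Proc.
  def self: Proc = Thread.currentThread() match {
    case p: Proc => p
    case _ =>
      throw new IllegalStateException("self called outside a process")
  }
}
```

Code that does not inherit from Proc, like Server.loop, can then
call Proc.self as long as it runs on a thread created by Proc.spawn.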

  K2 Asynchronous message passing
  ======================

So far our processes can only be created and execute code,
which is not too bad in itself, but in order to get Erlang-like
processes we need to give them the ability to
communicate. In Erlang, processes communicate through 
asynchronous message passing implemented with mailboxes.
We can implement the same mechanism in Scala, although here
we will go one step further and first implement a more general
mailbox which can be read by several processes.

On page 138 of the document "Scala by Example" 
(http://scala.epfl.ch/docu/files/ScalaByExample.pdf)
you can find an implementation of Erlang-like mailboxes 
with the following signature: 
 class MailBox {
   def send(msg: Any): unit;
   def receive[a](f: PartialFunction[Any, a]): a;
   def receiveWithin[a](msec: long)(f: PartialFunction[Any, a]): a;
 }

There is a special message TIMEOUT which is used to signal a
time-out, implemented as:
 case class TIMEOUT;

The receive method first checks whether the message processor
function f can be applied to a message that has already been sent
but that was not yet consumed. If yes, the thread continues
immediately by applying f to the message. Otherwise, a new
receiver is created and linked into the receivers list,
and the thread waits for a notification on this receiver.
Once the thread is woken up again, it continues by applying
f to the message that was stored in the receiver.

The mailbox class also offers a method receiveWithin
which blocks for only a specified maximal amount of time.  If no
message is received within the specified time interval (given in
milliseconds), the message processor argument f will be unblocked
with the special TIMEOUT message. 
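
To make the behaviour concrete, here is a simplified sketch in
present-day Scala. It is not the implementation from "Scala by
Example": instead of a receivers list it uses a single lock with
wait/notifyAll, and SimpleMailBox and the case object TIMEOUT are
names I made up for this sketch.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified mailbox: one lock, no receivers list.
case object TIMEOUT

class SimpleMailBox {
  // Messages that have been sent but not yet consumed, oldest first.
  private val sent = ListBuffer.empty[Any]

  def send(msg: Any): Unit = synchronized {
    sent += msg
    notifyAll() // wake up every thread blocked in receive
  }

  // Remove and return the oldest pending message that f is defined for.
  private def take[A](f: PartialFunction[Any, A]): Option[Any] = {
    val i = sent.indexWhere((m: Any) => f.isDefinedAt(m))
    if (i >= 0) Some(sent.remove(i)) else None
  }

  def receive[A](f: PartialFunction[Any, A]): A = {
    val msg = synchronized {
      var m = take(f)
      while (m.isEmpty) { wait(); m = take(f) }
      m.get
    }
    f(msg) // apply the handler outside the lock
  }

  // Like receive, but hands TIMEOUT to f if nothing matches within msec.
  // f must handle TIMEOUT, otherwise a MatchError is thrown.
  def receiveWithin[A](msec: Long)(f: PartialFunction[Any, A]): A = {
    val deadline = System.currentTimeMillis() + msec
    val msg = synchronized {
      var m = take(f)
      var left = deadline - System.currentTimeMillis()
      while (m.isEmpty && left > 0) {
        wait(left)
        m = take(f)
        left = deadline - System.currentTimeMillis()
      }
      m.getOrElse(TIMEOUT)
    }
    f(msg)
  }
}
```

As in Erlang, receive is selective: a message that f does not match
stays in the mailbox for a later receive.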

With an implementation of mailboxes we can now add mailboxes to
our Process class by mixing in MailBox in Process, and we can make
the syntax more Erlang-like if we want by defining the method !:

 class Process(def body:unit) extends Thread with MailBox {
   def !(msg:Message) = send(msg);
   ...

In order to be able to do send and receive in code that does
not inherit from Process we supply some methods in the Process
object:

 object Process {
    def send(p:Process,msg:Message) =
	p.send(msg);
    def receive[a](f: PartialFunction[Message, a]): a = 
	self.receive(f);

    def receiveWithin[a](msec: long)(f: PartialFunction[Message, a]):a =
        self.receiveWithin(msec)(f);
  ...

We can also get named processes by, as in Erlang, using a name server:

object NameServer {
  val names = new scala.collection.mutable.HashMap[Symbol, Process];

  def register(name: Symbol, proc: Process) = {
    if (names.contains(name)) error("Name:" + name 
                                                   + " already registered");
    names += name -> proc;
  }	

  def unregister(name: Symbol) = {
    if (names.contains(name)) 
      names -= name;
    else 
      error("Name:" + name + " not registered");
  }

  def whereis(name: Symbol): Option[Process] = 
    names.get(name);

  def send(name: Symbol, msg: Message) =
    names(name).send(msg);

  def view(name: Symbol): Process = names(name);

}

Then we can just write code like
  register('myServer, spawn(Server.loop(0)));
  'myServer ! Tuple2('myMessage, self);

 K3 Process links
 ==========
One of the most important aspects of Erlang is the ability to link
processes together. When a process is linked to another process it
will send a signal to the other process when it dies. This makes it
possible to monitor the failure of processes and to implement
supervision trees where a supervisor process monitors worker processes
and can restart them if they fail.

In Erlang a process can be linked to its father (creator) by using
the spawn_link BIF when spawning a new process. It is also possible to
link to another process at a later time by calling the link BIF.
To implement this in Scala we have to add a list of links to the
Process class and provide the link methods, as well as signal a
failure to all linked processes. 
We can now see the complete Process class in all its g(l)ory:

class Process(def body:unit) extends Thread with MailBox {
    private var exitReason:AnyRef = null;
    private var links:List[Process] = Nil;	
    override def run() = {
	try {body;signal('Normal)}
	catch {
	    case _:java.lang.InterruptedException =>
    	      signal(exitReason);
	    case (exitSignal) => 
	      signal(exitSignal);
	}
    }

    private def signal(s:Message) = {
	links.foreach((p:Process) => p.send(Tuple3('EXIT,this,s)));
    }

    def !(msg:Message) = send(msg);

    def link(p:Process) = links = p::links;
    def unlink(p:Process) = 
      links = links.remove((p2) => p == p2);

    def spawn_link(def body:unit) = {
	val p = new Process(body);
	p.link(this);
	p.start();
	p
    }

    def self = this;

    def exit(reason:AnyRef):unit = {
	exitReason = reason;
	interrupt();
    }
}

In Erlang links actually work slightly differently: when a process
(p1) dies and that process is linked to another process (p2), then p2
will also be killed unless it has the flag trap_exit set to
true. With this behavior a whole set of processes that are linked can
all be killed by killing just one of the processes. Often this is the
desired behavior, if the set of processes are dependent upon
each other then there is no reason for any of the processes to continue
executing if one of them dies. 

When the process flag trap_exit is set to true, the linked process,
p2, will not be killed; instead it will receive a message about the
cause of the death of p1. This is the behaviour of the Scala code
above.

It would not be hard to mimic the Erlang behavior in Scala by
adding the trap_exit flag and testing it in the signal method to
choose between send and exit.
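
A rough sketch of that idea in present-day Scala follows. It is a
self-contained toy, not the Process class from this post: LinkedProc,
Exit and trapExit are names I made up, the mailbox is a plain
java.util.concurrent.LinkedBlockingQueue, and links are one-directional
as in the code above. The one Erlang rule I have added is that a
normal exit never kills a linked process.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical sketch: LinkedProc and Exit are illustrative names.
final case class Exit(from: LinkedProc, reason: Any)

class LinkedProc(body: LinkedProc => Unit) extends Thread {
  val mailbox = new LinkedBlockingQueue[Any]()
  @volatile var trapExit = false
  @volatile private var exitReason: Any = null
  @volatile private var links: List[LinkedProc] = Nil

  def link(p: LinkedProc): Unit = synchronized { links = p :: links }

  // Kill this process: record the reason, then interrupt its thread.
  def exit(reason: Any): Unit = {
    exitReason = reason
    interrupt()
  }

  // Tell every linked process that we died. A trapping process gets an
  // Exit message; a non-trapping one is killed, unless we exited normally.
  private def signal(reason: Any): Unit =
    links.foreach { p =>
      if (p.trapExit) p.mailbox.put(Exit(this, reason))
      else if (reason != "normal") p.exit(reason)
    }

  override def run(): Unit =
    try {
      body(this)
      signal("normal")
    } catch {
      case _: InterruptedException => signal(exitReason)
      case e: Throwable            => signal(e)
    }
}
```

Since exit relies on Thread.interrupt, it only takes effect once the
target is blocked in (or next enters) an interruptible call such as
mailbox.take(); a process that never blocks will not be killed.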

  K5 Code updates
  ===========
Currently there is no real support for code updates in Scala,
but this might not be as big a problem as one might suspect.
The hot code replacement is a cool feature of Erlang but
it has some shortcomings. There is no automatic way to
update or even detect changes in data structures.

Therefore it seems like many real systems go through
special upgrade states and conversion functions, often 
even using redundant hardware for upgrades. By just following
some conventions you can actually upgrade running Scala
systems in a similar fashion.

  K4 No shared memory
  ==============
Now, here is the big difference between Scala and Erlang.
Since Scala uses the underlying concurrency model of
Java or .NET, data sent as messages will be shared
between processes.

If you (like Joe) believe that sharing is evil, there are
four possible solutions to the problem of shared data.

 1. Make send copying. 
     This requires a copy or serialize method in all messages,
     which could be implemented in two ways in Scala.
     a) Add an abstract copy method to the top-level class AnyRef,
         requiring all classes to implement this method.
     b) Have a special Message class that has a copy method,
         requiring all classes that need to be sent to implement
         this method.
     The problem with this solution is that the system would have
     to somehow ensure that a deep copy is performed. An open
     problem at the moment.

 2. The Erlang way
     No updateable structures. 
     You could for example implement all Erlang terms in Scala
     and define the type Message as Term. 
     A simple solution which would make Scala processes as 
     powerful as Erlang processes, but you would lose OO for
     all messages.

 3. A new type of type analysis that can determine that there are
     no mutable structures in objects used as messages. 
     Another open problem.

 4. Head in the sand.
     Just ask the programmer to not send mutable data,
     (unless he knows what he is doing).
     This is the current approach of Scala (and Java).
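
As an illustration of solution 1, one way to get a deep copy on the
JVM is to push the message through Java serialization. This is only a
sketch in present-day Scala under that assumption: CopyingSend and
deepCopy are made-up names, it works only for messages that are
java.io.Serializable all the way down, and it is far more expensive
than passing a reference.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical helper; not part of Scala or of the code in this post.
object CopyingSend {
  // Deep-copy a message by serializing it to bytes and reading it back.
  def deepCopy[A <: java.io.Serializable](msg: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(msg)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A]
    finally in.close()
  }
}
```

A copying send would then be something like p.send(deepCopy(msg)),
so that later mutation by the sender is invisible to the receiver.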

I will not present a conclusion here; instead I hope to inspire
some debate.

Anyway... Scala is a really cool language and I encourage
you all to try it out, you just might like it as much as I do. 
Still, though, I am back in the Erlang world now, and I'm
loving every minute of it.

/Erik Happi Stenman

PS.
  Sorry for the badly formatted code, I'm using the trap-exit
  forum and it seems like code either looks bad in the forum
  or in the email version... or perhaps both.
  Any suggestions are welcome.