Erlang hints from an OO junkie

Richard A. O'Keefe ok@REDACTED
Fri Aug 13 05:54:20 CEST 2004


"Vlad Balin" <vlad@REDACTED> wrote:
	Yes, in my terms - difference is that in one case you're using
	"imperative" ADT, in other case "functionally clean", without side effects.
	You suggested to perform high-level design in imperative style, taking
	the natural concurrency in consideration. I understand what you saying,
	just noted that there's much common with OO approach in terms of Alan key.
	
I'm getting a bit tired of seeing Alan Kay's name spelled with an "e".
It's KAY, not KEY.

For what it's worth, Smalltalk, including the Squeak and Croquet systems
on which Alan Kay has most recently worked, does *not* really handle
concurrency via OO.  Processes are objects, and
    pid := [.....block of code....] fork.
is the way to make one.  This is NOT integrated with message sending in any
interesting way; all Process objects have the same protocol, which has more
in common with Java threads than one might expect.  Smalltalk Processes
communicate via shared mutable objects protected using Semaphore objects;
perhaps the simplest kind of shared mutable object to use is a SharedQueue
which is pretty much a classic unbounded buffer.

So a Smalltalk collection of Processes reading messages from SharedQueues
and sending messages to other processes via SharedQueues could look a whole
lot like an Erlang collection of Processes.

Another important point is that if you combine
 - processes which do computations using
 - a pure functional programming language
 - and message sends to other processes
then the potential for things that look like mutable variables is
inescapable.  This happens in CCS, CSP, and the (few) related formalisms
that I am familiar with:  once you can do

    var := receive <store,X> then var(X)

    var(X) := receive <store,Y> then var(Y)
           or receive <fetch,Pid> then send <value,X> to Pid then var(X)

then you have something practically indistinguishable from an imperative
variable.  If you want a solution that doesn't involve something that
looks like mutable state encapsulated in processes (even though it isn't
technically mutable), then you have to ban processes.
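
A minimal sketch of that "variable as a process" idea, written in Python
only so that it can be run as-is. Threads and queue.Queue stand in for
processes and mailboxes, and rebinding the local name x stands in for the
tail call var(Y). The name cell, the "stop" message, and the choice to drop
a fetch that arrives before any store are assumptions made for this
illustration; they are not part of the pseudo-code above.

```python
import queue
import threading

_UNSET = object()  # marks the initial "var" state, before any store


def cell(mailbox):
    """Serve store/fetch messages; the held value lives only in this loop."""
    x = _UNSET
    while True:
        msg = mailbox.get()
        if msg[0] == "store":
            x = msg[1]                      # plays the role of var(Y)
        elif msg[0] == "fetch" and x is not _UNSET:
            msg[1].put(("value", x))        # send <value,X> to the asker
        elif msg[0] == "stop":              # hypothetical, for clean shutdown
            return
        # any other message (e.g. a fetch before a store) is dropped here


mailbox = queue.Queue()
reply = queue.Queue()
threading.Thread(target=cell, args=(mailbox,), daemon=True).start()

mailbox.put(("store", 1))
mailbox.put(("fetch", reply))
first = reply.get(timeout=1)

mailbox.put(("store", 2))
mailbox.put(("fetch", reply))
second = reply.get(timeout=1)

mailbox.put(("stop",))
```

From the outside, first and second differ although nothing the clients can
see was ever assigned to: the "variable" is just the argument of whatever
state the process is currently in.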

This *DOESN'T* look like OO.  It looks like CCS or CSP or mu calculus
or FCP or GHC or whatever.

That doesn't mean you can't implement OO in such a scheme.
I once had an MSc student who designed a highly concurrent OO language
and implemented it by mapping it onto a version of ML with CSP-ish
extensions.  (Was the name of that ML version LCS?  Something like that.)
But OO certainly doesn't feel *natural* in this model.

There are some important differences:

- Unlike the majority of OO approaches, as fossilised in UML,
  inheritance is *not* a core concept in Erlang designs.

- Unlike the majority of OO approaches, as fossilised in UML,
  an Erlang process does not have a fixed number of state variables.
  An Erlang process can have any (fixed) number of "states", with
  transitions between states effected by tail calls, and each state
  can have a different number of arguments.  This means we never have
  to deal with "oh, this variable isn't defined in this state" problems.

  In particular, the entire protocol of a process can shift from state
  to state; this is as if an 'object' could completely change its class.
  (Smalltalk does in fact allow this via #become: but Ada, C++, Java,
  and other such languages do not.)  When you have this ability, it is
  amazingly useful on *rare* occasions, but it makes a nonsense of UML.

  A particularly interesting example of this concerns hot loading.
  A process could be sent an 'upgrade' message which would cause it
  to switch over to a function in another module which would convert
  its old state to a new form and keep running in the new module,
  _without_ losing its identity.  Something rather like this actually
  happens in Smalltalk when you change the definition of a class that
  has existing objects, or when you restore a "pickled" object from an
  ImageSegment and it finds that its class has changed.  This kind of
  thing can be important in real systems, but is unthinkable in UML.

- Unlike the majority of OO approaches, an Erlang application cannot
  hide state *changes* deep inside some possibly distant method call
  buried in another class.
  *Any* state change must be effected by a tail call to a 'state'
  function, in which *all* of the state must be explicitly passed.

- By making a non-tail call to a 'state' function, it is easy for an
  Erlang "object" to temporarily shift to a new state (which might even
  be of a different 'class') and then shift back to its original state.
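
The points above about 'state' functions can be sketched roughly as
follows. This is Python rather than Erlang, and Python does not eliminate
tail calls, so each state function returns the next state and its
arguments and a small driver loop (run) performs the "tail call". The
states idle, connected and transferring, and the message shapes, are
hypothetical names invented for this example.

```python
def idle(msg):
    # This state carries no data at all.
    if msg[0] == "connect":
        return connected, (msg[1],)
    return idle, ()


def connected(msg, peer):
    # This state carries exactly one piece of data.
    if msg[0] == "data":
        return transferring, (peer, len(msg[1]))
    if msg[0] == "close":
        return idle, ()
    return connected, (peer,)


def transferring(msg, peer, received):
    # This state carries two; 'received' simply does not exist elsewhere.
    if msg[0] == "data":
        return transferring, (peer, received + len(msg[1]))
    if msg[0] == "close":
        return idle, ()
    return transferring, (peer, received)


def run(messages):
    """Drive the state functions; every transition passes all state explicitly."""
    state, args = idle, ()
    trace = [(state.__name__, args)]
    for msg in messages:
        state, args = state(msg, *args)
        trace.append((state.__name__, args))
    return trace


trace = run([("connect", "alice"), ("data", b"abc"), ("data", b"de"), ("close",)])
```

Each state accepts its own set of messages and its own number of
arguments, so there is never a variable that is "not defined in this
state", and every change of state is visible at the point where the next
state function is named.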

My point here is that if you

	Just look at it from another side.
	
then a whole lot of things which are easy to do in Erlang become
practically impossible to think of.

	It's not natural to send a message to a string, right, but
	there's just no another way in Smalltalk to represent the
	string.  The language is conceptually simple (which is good),
	and we just paying for this simplicity.

As someone who has been using Smalltalk a lot, I would say that it
*is* natural to send a message to a String.  In fact, the protocol
of String includes
	String allSelectors size
    =>  829
methods in Squeak 3.6.  Since Smalltalk Strings are mutable, it's not
clear how else you could handle such strings in an OO language.

There is an incredibly important fact about Smalltalk which actually
makes it much closer to the "1000 functions acting on one data structure"
approach that Joe approves of than to the crippling "strict encapsulation" 
that Java enforces (although Java in fact isn't terribly good at encapsulation;
once they added the reflection methods encapsulation went away and hid in a
corner crying).  (Come to think of it, "829 methods acting on one data
structure" is pretty close to "1000 functions ...")

In Smalltalk, ANYBODY can add a method to ANY class.  There is, in Smalltalk,
no such thing as a "sealed" built-in class.

If I want a new method, let's say #numberOfRuns, that makes sense for any
sequence, I just add it

    SequenceableCollection>>
    numberOfRuns
      |n x|
      n := 0.
      self do: [:each |      
          (n = 0 or: [each ~= x]) ifTrue: [
              n := n + 1.
              x := each]].
      ^n

Now I can ask 'Mississippi' numberOfRuns (the answer is 8).  In fact,
since SequenceableCollection has 64 descendants in the version of Squeak
Smalltalk I've tried, I've added this function to 64 data types including
arrays, bitmaps, sound buffers, strings, and various kinds of packed
arrays of primitive types.  Inheritance really can have its uses.
(Also it can have its dangers; Semaphore inherits from SequenceableCollection
and I'm not sure that this really makes much sense.)
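
For comparison, a rough Python counterpart of #numberOfRuns. Python does
not let you add methods to built-in types such as str, so this sketch is a
free function over any iterable rather than a method on a common
superclass; the name number_of_runs is chosen here for illustration.

```python
from itertools import groupby


def number_of_runs(seq):
    """Count maximal runs of equal adjacent elements in any iterable."""
    # groupby starts a new group each time the element changes.
    return sum(1 for _ in groupby(seq))


runs_in_word = number_of_runs("Mississippi")
runs_in_list = number_of_runs([1, 1, 2, 2, 2, 1])
runs_in_empty = number_of_runs(b"")
```

The same definition serves strings, lists, byte strings and so on, which
is the effect the Smalltalk version gets through inheritance.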

The thing that makes it natural to send messages to Strings is that
you can add NEW String methods, as I just illustrated above.  You are
not limited to the set of methods that the original designer thought of.

Oddly enough, despite being the second famous "classic" OO language
(the other famous "classic" OO language is Simula 67), Smalltalk really
doesn't fit very comfortably into the UML straitjacket.

I note, for example, that when I loaded the "Magma" object-oriented
database package into Squeak, it thrust its tentacles all over the place,
so that various kinds of built-in objects would know how to save themselves
into a DB and restore themselves thence.  This is in some ways the very
reverse of encapsulation, but it is necessary if you are to be able to
install such incredibly useful facilities without having them built into
the language and compiler.

(Yes, there is an Erlang analogue.  Mnesia can handle -records precisely
*because* -records are NOT encapsulated.)

	From the other hand, string object from STL (C++) is an
	excellent example of OO practice.

(a) While C++ strings _are_ part of the C++ standard, they are _not_
    part of the STL.  I have a copy of the 1998 STL Programmer's Guide
    from SGI online, and I can assure readers that 'string' is not there.

(b) C++ strings are widely regarded as a disastrously bad design.
    They have so many problems that I don't know where to begin, but
    you will find plenty of discussion of the topic on the web.

(c) The funny thing about the Standard Template Library is that
    IT IS NOT OBJECT-ORIENTED.  It's all about *templates*, as the name
    says, not about *objects*.  What it depends on is NOT OO in any
    way shape or form but higher-order functions and overloading.
    You can, and Stepanov in fact *did*, produce something very similar
    for Ada 83, which was the last version of Ada not to support OO.

    Typed functional languages without a trace of persistent identity,
    mutable state, or inheritance, can and do support conceptually very
    similar libraries.

	Suppose you want to capitalize letters in the string.  If you
	going to find (sic.) corresponding method in std::string, you are
	wrong.  You will not find it.  Because string expose only an
	iterator interface, like generic container, which doesn't depend
	on string implementation.  _This_ is the right way to represent
	the string object.
	
C++ strings do _not_ "expose only an iterator interface".  A quick check
with an AWK script looking at /opt/SUNWspro/WS6U2/include/CC/Cstd/string
found over 100 public types and methods.  Some of those are the iterator
interface, but the majority of them are not.  There's assignment,
comparison, concatenation, in-place appending, searching, all sorts of
stuff done directly by basic_string<> and _not_ via the STL at all.

	What you have to do, is to apply _generic_ algorithm "transform" (working on
	all container types!) and use scalar function toupper( char ).
	Beautiful. Natural. Clever. And string is still an object.
	
I have The Talking Moose on my MacOS X box.  Only a few minutes ago it popped
up and said 'for every problem there is a solution which is simple, neat, and
wrong'.

Even in ISO Latin 1, converting a string to upper case may yield a result
which is NOT THE SAME SIZE as the string you started with.  In particular,
any char toupper(char) function *has* to give incorrect results for at least
one character in ISO Latin 1.  For Unicode, which Erlang is supposed to be
moving to, any attempt to do case conversion one character at a time is
doomed, DOOMED, D O O M E D, I tell you (runs away cackling into the
distance as the monster stalks out of the crypt door).
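
A small sketch of the size problem, using Python's str.upper, which
applies Unicode's full case mappings. The two sample characters (sharp s,
U+00DF, and y with diaeresis, U+00FF) are my choice of illustration; the
helper fits_latin1 and the per-character emulation per_char are
hypothetical names, the latter imitating what a one-character-in,
one-character-out toupper is forced to do.

```python
sharp_s = "Stra\u00dfe"            # "Strasse" spelled with a sharp s
upper = sharp_s.upper()            # whole-string conversion

y_diaeresis = "\u00ff"             # a Latin-1 lower-case letter
upper_y = y_diaeresis.upper()      # its capital is U+0178


def fits_latin1(s):
    """Return True if every character of s is encodable in ISO Latin 1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# A char -> char function can only return one character, so it has to
# leave the sharp s alone (or return something wrong).
per_char = "".join(c.upper() if len(c.upper()) == 1 else c for c in sharp_s)
```

Here upper is one character longer than sharp_s, upper_y cannot be
represented in ISO Latin 1 at all, and per_char still contains a
lower-case letter.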

I'm actually trying in my spare time to write a "static" Smalltalk compiler
and library, and I've been staring down the barrel of string handling far
too long.  Unicode is just plain nasty, and everything you _think_ you know
about string handling is probably wrong.  Unicode handling is so nasty that
encapsulating it and *not* letting people get their hands on raw Unicodes is
probably the best way to preserve our sanity.

Oh yeah, did I mention that case conversion is locale-sensitive?

D  O  O  M  E  D  !



