[erlang-questions] Re: Shared/Hybrid Heap

Thu Nov 4 23:41:58 CET 2010

On 5/11/2010, at 12:52 AM, Paulo Sérgio Almeida wrote:

> Hi Robert,
> 
> there is a point on which I do not agree; when you say:
> 
> On 11/4/10 10:46 AM, Robert Virding wrote:
> 
>> - I think that having immutable data even within only one process is a big win. Having mutable data gives you all the problems of global data. Immutable data makes it much easier to keep track of what is happening with your data, which is always a Big Win.
> 
> I would say that having mutable data gives us SOME problems, but much LESS than when we have global mutable data.

The difficulty is with aliasing.  Data don't have to be global to be aliased.

It *is* possible to have a language that looks and feels like an imperative
language but which is largely free of such problems.  The original design of
Euclid springs to mind.  Quoting "Early Experiences with Euclid":

	In Euclid it is illegal to have two identifiers that both name the
	same variable in a scope.  This means, for example, that it is
	illegal to call a procedure with a variable actual parameter
	which is also imported by the procedure, or for any two variable
	actual parameters to a procedure to specify the same variable.
	The imports clause makes it easy for the compiler to check for non-aliasing.

	The non-aliasing rule effectively eliminates a large class of
	particularly insidious programming errors.  A common example is the
	accidental passing of an element of an array to a procedure which
	changes the same array directly.  In this situation the value of
	the actual parameter changes unexpectedly as a result of an
	assignment to an element of the global array.

	This kind of bug can be very costly in debugging time, as we
	discovered early in the compiler project.  In the bootstrap and
	testing phase of the project, a particularly puzzling bug was
	encountered.  An entry in the compiler's symbol table was being
	changed without ever being assigned to (as was verified by a
	trace).  This problem, which was due to an accidental parameter
	alias, took almost a week of programmer effort to find.  The
	preliminary version of the compiler which we were then using
	did not implement non-alias checking.  Had it been implemented,
	the compiler would have detected the potential problem when the
	program was first compiled and thereby saved us a full week of
	programmer time.

Note that this was nothing to do with *globality* of data, just the
combination of mutability and sharing.

Now here's the cute thing.  There was a Xerox paper that showed that
the constraints on aliasing that made Euclid safe meant that it could
be read as syntactic sugar for a functional language!

> I (and others) think there is a place for a middle-ground where processes give us locality, allowing a divid-and-conquer of the global problem into simple parts, where the data in a given process is manageable, but where if the algorithm calls for mutable data then we should be able to use it. All the nasty problems of shared memory concurrency (from the programmer's point of view) have already disappeared when we have only per-process mutable data as opposed to global mutable data.

The nasty problems of shared memory *concurrency* disappear, true;
the nasty problems of shared memory *mutability* do not.

If you want to add mutable data to an Erlang-like language, you had
better have type checking and no-alias enforcement as good as Euclid had.
You might well end up with an excellent language, superb for its job,
but it would not (and should not) be Erlang any more.

> The idea would be to be able to use the many proven algorithms that were created for the imperative sequential world inside each process, and let the process concept a la Erlang deal with the concurrency aspects.

Come to think of it, this is not entirely unlike what Occam was (and
thanks to free implementations like KROC, still is).

Why not start from Occam and see what you can add in the way of trees
and pattern matching?