[erlang-questions] Re: Shared/Hybrid Heap

Fri Nov 5 12:21:58 CET 2010

On 11/4/10 10:41 PM, Richard O'Keefe wrote:
>
> On 5/11/2010, at 12:52 AM, Paulo Sérgio Almeida wrote:
>
>> Hi Robert,
>>
>> there is a point on which I do not agree; when you say:
>>
>> On 11/4/10 10:46 AM, Robert Virding wrote:
>>
>>> - I think that having immutable data even within only one process is a big win. Having mutable data gives you all the problems of global data. Immutable data makes it much easier to keep track of what is happening with your data, which is always a Big Win.
>>
>> I would say that having mutable data gives us SOME problems, but much LESS than when we have global mutable data.
>
> The difficulty is with aliasing.  Data don't have to be global to be aliased.

Yes, aliasing is the bane of imperative languages. But ... (see below)

>
> It *is* possible to have a language that looks and feels like an imperative
> language but which is largely free of such problems.  The original design of
> Euclid springs to mind.  Quoting "Early Experiences with Euclid":

If only it was that easy. In those times, the worry was about aliasing 
involving variables (e.g. a parameter of a function and a global 
variable). Dynamic allocation was not pervasive as now. With the present 
use of objects and passing references, things become much worse. Many 
people (including myself) have tried to do something about it. There are 
experimental proposals, like ownership types, confined types, etc. But 
we have not won the battle yet. Mainstream languages lack proper support 
to deal with aliasing.

>
> 	In Euclid it is illegal to have two identifiers that both name the
> 	same variable in a scope.  This means, for example, that it is
> 	illegal to call a procedure with a variable actual parameter
> 	which is also imported by the procedure, or for any two variable
> 	actual parameters to a procedure to specify the same variable.
> 	The imports clause makes it easy for the compiler to check for non-aliasing.
>
> 	The non-aliasing rule effectively eliminates a large class of
> 	particularly insidious programming errors.  A common example is the
> 	accidental passing of an element of an array to a procedure which
> 	changes the same array directly.  In this situation the value of
> 	the actual parameter changes unexpectedly as a result of an
> 	assignment to an element of the global array.
>
> 	This kind of bug can be very costly in debugging time, as we
> 	discovered early in the compiler project.  In the bootstrap and
> 	testing phase of the project, a particularly puzzling bug was
> 	encountered.  An entry in the compiler's symbol table was being
> 	changed without ever being assigned to (as was verified by a
> 	trace).  This problem, which was due to an accidental parameter
> 	alias, took almost a week of programmer effort to find.  The
> 	preliminary version of the compiler which we were then using
> 	did not implement non-alias checking.  Had it been implemented,
> 	the compiler would have detected the potential problem when the
> 	program was first compiled and thereby saved us a full week of
> 	programmer time.
>
> Note that this was nothing to do with *globality* of data, just the
> combination of mutability and sharing.

Yes ...

>
> Now here's the cute thing.  There was a Xerox paper that showed that
> the constraints on aliasing that made Euclid safe meant that it could
> be read as syntactic sugar for a functional language!
>
>> I (and others) think there is a place for a middle-ground where processes give us locality, allowing a divid-and-conquer of the global problem into simple parts, where the data in a given process is manageable, but where if the algorithm calls for mutable data then we should be able to use it. All the nasty problems of shared memory concurrency (from the programmer's point of view) have already disappeared when we have only per-process mutable data as opposed to global mutable data.
>
> The nasty problems of shared memory *concurrency* disappear, true;

This is what I meant.

> the nasty problems of shared memory *mutability* do not.

They do not disappear. But they are more confined. With no global 
sharing, each process acts like a course-grained active object that 
TRULY encapsulates its state from the rest of the system, unlike what we 
have now with global mutable data sharing in OO languages.

We can then reason about those mutability problems looking at each 
process separately. Even if we do not need to explore multi-core 
capabilities, the structuring of a system as a bunch of processes will 
already be helpful from a software engineering view point, to tackle 
complexity.

>
> If you want to add mutable data to an Erlang-like language, you had
> better have type checking and no-alias enforcement as good as Euclid had.

That would be overly restrictive. What I meant would be to be able to 
program like, e.g., in Python, within each process, and being able to 
have aliasing, for good or for worse.

But, as I said, I was not trying to advocate adding mutability to Erlang 
now; just to say that actors + local mutability is a useful combination 
in the design space, between the extremes in one direction (Erlang) or 
the other (the typical OO language).

> You might well end up with an excellent language, superb for its job,
> but it would not (and should not) be Erlang any more.
>
>> The idea would be to be able to use the many proven algorithms that were created for the imperative sequential world inside each process, and let the process concept a la Erlang deal with the concurrency aspects.
>
> Come to think of it, this is not entirely unlike what Occam was (and
> thanks to free implementations like KROC, still is).
>
> Why not start from Occam and see what you can add in the way of trees
> and pattern matching?
>

Don´t know how Occam is nowadays. When I used it 20 years ago to build a 
ray-tracer, almost all the pain was due to the lack of dynamic memory 
allocation.

Regards,
Paulo