OO vs. Erlang

Wed Mar 12 08:31:24 CET 2003

I had to do some driving this weekend, so I got a lot of
thinking done.  Unfortunately it would take me days to
post what I have been thinking about so it will have to
come out over the next week or so little by little.

Paraphrasing Chris:

 > imperative and functional
 >    = destructive and derivational,
 >  or = persistent and emphemeral
 >  or = updatable and single-assignment

 > In OO, objects are often attributed with "identity", and
 > this directly relates to ephemeral vs persistent
 > (identity implies persistence.)

 > Erlang can encapsulate ephemeral data (the
 > function arguments) within a persistententity (the server)

When I mentioned Okasaki's term "persistence" it means
something very different than the same term applied to
OO instances.  I suppose it is unfortunate that he chose
that term because it has such a history as a term for storing
objects in a database across invocations of a process.
In erlang:

#state{a = 1, b = 2}  = State1.
State2 = State1#{a = 3}.

creates two versions of a #state object.  The
data structure is called "persistent" because it _cannot_
be modified and so must always be copied leaving both
versions accessible _simultaneously_.  In C++ or OO
languages, modifying the instance automatically changes
all references to it to give only the new value of the state.
One has to explicitly copy the object if the old version is
needed, something we are taught to avoid doing unnecessarily.
It is not "persistent" because a single change destroys
the old version.

 > I'm not actually convinced that undo does become so
 > much easier, because anything should be able to query
 > the value of the textbox at any time - the textbox is persistent.

See above.  In an X-windows or other implementation, the
textbox is not persistent in the Functional sense because it
is destructively modified thus changing its state.  The undo
information in X-windows is not associated with the textbox
but must be cached somewhere else and the internal state
of the textbox requested to be replaced (I believe that is
how it is done).

 > If it comes down to how I'd code it, the textbox would be
 > a server, which would still have to keep a list of all previous
 > values the textbox had, as its state - it would be nice to be
 > able to leverage the recursive nature of functions to implicitly
 > store those old values, but I don't think it could be done if
 > that textbox could be arbitrarily addressed by other processes

The code would work like this:

textbox:new() -> #charset{ version = [], data = [], redo = [] }

add_char(Char, #charset{ version = OldVsns, data = String }) ->
     NewString = [ Char | String ],
     #charset{ version = [ NewString | OldVsns ], data = NewString }.

del_char(#charset{ version = OldVsns, data = [ Char | Rest ] }) ->
     #charset{ version = [ Rest | OldVsns ], data = Rest};

del_char(#charset{ data = [] } = TextBox) ->
     beep(),
     TextBox.

ins_char(Char, Pos, #charset{ version = OldVsns, data = String }) ->
     InsPos = length(String) - Pos,
     NewString = lists:sublist(String, InsPos) ++ [ Char ] ++ 
lists:nthtail(InsPos, String),
     #charset{ version = [ NewString | OldVsns ], data = NewString }.

undo(#charset{ version = [ CurrVsn | OldVsns ], data = String, redo = Redo 
}) ->
     case OldVsns of
         [ Prev | _Older ] ->
             #charset{ version = OldVsns, data = Prev, redo = [ CurrVsn | 
Redo ] };
         [] ->
             #charset{ version = [], data = [], redo = [ CurrVsn | Redo ] }
     end;
undo(#charset{ version = [] } = TextBox) ->
     beep(),
     TextBox.

redo(#charset{ version = OldVsns, data = String, redo = [ Redo | Rest] }) ->
     #charset{ version = [ Redo | OldVsns ], data = Redo, redo = Rest };
redo(#charset{ redo = [] } = TextBox) ->
     beep(),
     TextBox.

[Warning: the above code is untested and typed in for the
first time in this message, but is approximately what you want.
Try it out and then print out the version field with one
entry per line to see the entire typo-prone history of the textbox.]

First note that paying attention to the internals without regard
to the display leads to an efficient and understandable data
structure that is completely unexpected when compared to the
imperative approach used by X-windows, et al.  The string in
the textbox is maintained as a _reverse_ list of characters,
because typically characters are added or removed one at a
time.  You would need to add functions for cut and paste, but
they look the same except for working on sublists at a time.

Second, the code uses the nature of functional data structure
versions that comes automatically to allow undo and redo.  The
internal implementation shares sublists to provide the most
efficient memory utilization that records all versions of the string
ever created.  (I would replace ins_char with a call to a function
that creates the inserted list recursively in an efficient way
rather than using append as I did above for clarity.)  The undo
and redo should still work regardless of the type of edit performed
(even removing every other letter in one shot or some other
strange transmogrification).

To do this in an imperative language you would have to introduce
lists and explicitly copy the string every time (and I'll bet you
wouldn't have stored it backwards because you probably pre-
allocated the memory and used a pointer -- it wouldn't even occur
to you that backwards storage might be beneficial) so I would
wager the imperative implementation would be less efficient.  On
top of that you would have to make new instances with different
length pre-allocated buffers for each textbox you wanted, or else
make it really big and hope you never overrun it (probably the
most commonly occurring and most insecure bug in all of software)
or some complicated realloc refinement would have to be
introduced (which of course would triple the memory use
because you would realloc at double size, copy memory and
then free).

In erlang, manipulating every version of the data structure that
ever existed is more natural and fairly straight-forward.  The
textbox behaviour above could be in a separate process so no
other code could get at the internals other than using the message
interface (strong encapsulation). If a pointer to a particular
version of the textbox internals were obtained outside the
process, it couldn't be changed in a way that would affect the
textbox or undo / redo.  Also you get concurrent execution with
all the other textboxes (think of scrolling stock tickers for different
stock market exchanges instead of text type-in boxes).

In fact, other processes could arbitrarily ask for _any_ version
of the textbox internals if you provided an message interface to
do so.

jay