Thoughts on laziness (long)

Fri Oct 20 02:49:34 CEST 2000

On Thu, Oct 19, 2000 at 03:33:55PM +0200, matthias@REDACTED
wrote:

> Writing data structures in functional languages leads to
> embarassment: almost no matter what you do, it's usually slower and
> often also has worse big-O behaviour than the equivalent in a
> language which allows destructive update.

> One solution is to come up with elegant but complicated data
> structures such as Okasaki's. Many of the more attractive ones
> require lazy semantics. If you go down that road, you can brag on
> comp.lang.functional that your functional language now also supports
> monadical-recursive-slowdown-higher-order-transmogrification. You
> also scare off many potential users.

Well, I don't think the situation is quite as bad as that.

> Another solution is to just provide hashtables a la ETS. Nowhere
> near as elegant, but (a) it doesn't require turning the language's
> semantics upside down and (b) it's reasonable to assume it's
> faster*.

Efficient dictionaries are a very important feature, and hash tables
seem to be the only way to get them.  And for that reason Erlang has
made the right choice in throwing purity to the wind and providing ets
tables and their like.

And I agree that the quest for purity, in whole or part, must be for
something more than bragging rights on c.l.f.  I think what purity we
provide must be there for some good reason, and the case seems to be
the ease of understanding and maintaining functional code over
imperative code.  This argument is based on intangible factors, but in
my experience it is very true.

<snip timing data>

One thing your comparison did not point out is this data is only
relevant in cases where you use the data in a single-threaded manner.
By "single-threaded" here I don't mean to imply concurrency, but
instead threads of data flow.  Here is an example:

This code uses a queue in a single-threaded manner:

Queue1 = processQueue(Queue),
Queue2 = processQueueAgain(Queue1),
Queue2.

This code uses the queue in a multi-threaded manner:

Queue1 = processQueue(Queue),
Queue2 = processQueueDifferently(Queue),
{Queue1, Queue2}.

With functional data structures, the second example does what you
would expect.  And, if neither function call has side effects, then
the order that they're called doesn't matter either -- not that this
is a very big deal, but when maintaining code it is nice not to have
to worry about just where to place a computation to be between the
correct side effects.  This is easier with functional structures.

You get none of the advantages with ets tables.

So, except in cases where the performance cost is prohibitive, I would
prefer to stick with functional data structures.  Even if in you
initial design you only use a structure in a single data flow thread,
you may later decide to add, say, multi-level undo, or some other
operation which is trivial to add to a functional structure.

As with everything, there are tradeoffs. (I'm allowed one blatant
truism per post :)

Now back to the comments on laziness.  In the single-threaded example
above, the stdlib Erlang queues will do fine, giving good amortized
behavior -- every so often they'll take a bit of time to reverse a
list; the cost spreads out.  However, as soon as you switch to the
multi-threaded example, this fails, and you can lose the O(N)
behavior, and in pathological cases end up with O(N^2) (this is from
memory, Okasaki provides detailed analysis).  In these cases laziness
would prove useful.

-- Jeffrey Straszheim              |  A sufficiently advanced
-- Systems Engineer, Programmer    |  regular expression is
-- http://www.shadow.net/~stimuli  |  indistinguishable from
-- stimuli AT shadow DOT net       |  magic