Thoughts on laziness (long)

Wed Oct 18 01:47:39 CEST 2000

Hi,

Given all the recent discussion on lazy lists, I thought I'd make some
comments.  The availability of true call-by-need laziness is something
that I think Erlang sorely lacks; in fact, I think it is the one
useful mechanism that it most sorely lacks.

First off, lazy lists are actually rather familiar objects to us
programmers, even if we don't know it.  For example, almost every
language out there implements a rather specialized version of laziness
called a "file stream".  Think about it, a file stream is really a
list of characters (or bytes) which are read lazily from the
filesystem as needed.

This "lazy list as stream" motif is well established.  In the ML
community, where lists are by default strict, laziness is provided
through optional modules and called, you guessed it, "streams", and it
does seem a useful terminology in languages, such as Erlang, where
lists are by default strict.

Now, some may think that adding laziness to the language is merely a
matter of providing syntactic sugar on top of what can now be done
with Funs, but this is not so.  While a standard module for lazy lists
as well as, perhaps, lazy list comprehension would be welcome, they
don't answer the fundamental problem: Erlang's Funs only allow
call-by-name style laziness, not the more powerful call-by-need.

Call-by-name refers to a style of processing where the next element of
a structure is passed as a closure (read function) instead of a data
element.  When the value is needed the closure is "forced" and the
value thus obtained.  For a stream, the function will return not only
the next value, but also the next closure.  A problem arises, however,
when you force a closure twice, as you might if you pass the stream
around to different functions, which may iterate over it multiple
times.  Each time a computation starts at the head of the list and
iterates over it, the computation of the lists values must be
performed anew.  With a strict list the computation is only performed
once.

For some algorithms, of course, the data only needs to be processed
once, and call-by-name laziness is fine.  If you are using an
algorithm that accesses the list multiple times (in part of whole)
then strictness would serve better.

There is another approach to laziness, however, which is used in
Haskell and available in some versions of ML (and quite a few other
systems), known as "call-by-need" laziness.  In call-by-need laziness,
a lazy closure is called a "suspension", and is only evaluated once.
Then the value is remembered, and if that part of the stream is
visited again, the previously used value is returned.  With
call-by-need laziness you are guaranteed that the computation will be
performed at most once.  So, call-by-need guarantees the same
asomtoptic time efficiency as strict evaluation (although there is a
constant time factor slowdown compared to strict evaluation).

Call-by-need modifies values in the heap when a suspension is forced,
and this may seem to violate functional purity.  However, if you are
thinking at the proper level of abstraction you will realize that this
is not so.  If you think of what is immutable as being the *value* of
the suspension, the fact that is is computed later rather than sooner
in no way "modifies" the value, even though behind the scenes pointers
in the heap are being changed.

That being said, the fact that call-by-need laziness does update the
heap proves quite useful.  In traditional strict languages,
amortization methods prove impossible in algorithm analysis, as spread
out modifications made by expensive operations over the course of
several cheap operations.  This sort of thing is not possible in a
strict language, but there are tricks that can be done via the
call-by-need mechanism to achieve it.  I won't provide much detail
here, but Chris Okasaki's _Purely Functional Data Structures_
discusses the topic at length.  There exist implementations of queues,
for instance, that have much better amortized behavior than what is
possible in Erlang, particularly in cases where more than one
computation iterates across the queue.

Note that laziness does not always get better heap behavior than
strictness.  Depending on how you structure your list iteration, it is
possible that pointers to the head of the list still exist during its
entire lifetime.  If this is so, then the garbage collector may not
clean up the already processed nodes, even if they will not be visited
again.  Special compiler optimizations are sometime required to
mitigate against this.  Also, in many cases the ideal heap behavior is
achieved by combining strictness and laziness.  Haskell compilers, for
instance, do extensive "strictness analysis" to find areas where
strictness can be added to a computation to improve memory usage.
This is the source of the claim that laziness can lead to bizzair heap
behavior.  It can, but in languages that are by default strict it is
less of a problem with the few specifically lazy structures.  And
remember, with a lazy stream the worse case scenario is the same
(within a constant factor) as a strict list: the entire list being
created on the heap.

Minus a constant factor, lazy streams will not have worse heap
behavior than strict lists, and they might be very much better.

Call-by-need laziness seems like a real winner.  However, it would
require compiler support to be used in Erlang, as behind the scenes it
must perform updates on data elements, which is not possible in Erlang
(at least not efficiently).

Also, if one sees the value of lazy streams, then surely lazy trees,
queues and other such are required.  Other strict languages have
provided laziness through a few simple primitives, such as Suspend and
Force.  Means to use these to construct lazy streams, and other such
structures, are well covered in the literature.

-- Jeffrey Straszheim              |  A sufficiently advanced
-- Systems Engineer, Programmer    |  regular expression is
-- http://www.shadow.net/~stimuli  |  indistinguishable from
-- stimuli AT shadow DOT net       |  magic