Thoughts on laziness (long)

Thu Oct 19 15:33:55 CEST 2000

 > Robert Virding wrote:

 > > I [...] have [not] yet come across any applications in Erlang 
 > > where it [laziness] would have been useful.

Jeremy Strazhiem wrote:

 > However, I think Okasaki's amortized data structures might alone 
 > provide a good reason for some call-by-need laziness features.

Writing data structures in functional languages leads to embarassment:
almost no matter what you do, it's usually slower and often also
has worse big-O behaviour than the equivalent in a language which
allows destructive update.

One solution is to come up with elegant but complicated data
structures such as Okasaki's. Many of the more attractive ones require
lazy semantics. If you go down that road, you can brag on
comp.lang.functional that your functional language now also supports
monadical-recursive-slowdown-higher-order-transmogrification. You also
scare off many potential users.

Another solution is to just provide hashtables a la ETS. Nowhere near
as elegant, but (a) it doesn't require turning the language's
semantics upside down and (b) it's reasonable to assume it's faster*.

Matthias

------------------------------

* Every attempt I've seen to build a clever, general data structure in
  Erlang has resulted in something that's slower than either lists or
  ETS. Two examples:

ra_list

   ra_list is one of Okasaki's data structures. It's a data
   structure with both list-like and array-like behaviour;
   you get O(log N) access to any element AND O(1) head()/tail()

gb_trees

   gb_trees is on the user contributions page at erlang.org.
   As far as I can tell, it's supposed to have O(log N)
   access to elements. The documentation makes no claims
   beyond "highly efficient".

Doing a quick benchmark doesn't give very encouraging results.

  Sum: take a list of numbers 1..N and sum them (not Gauss' way ;-)

  Update: change the 3rd, middle and 3rd-last elements in a list

         -----list------     ------ra-list--   -------ets-----  gb_tree
   N     sum     update      sum      update   sum    update    update
   ----------------------------------------------------------------------
   100   107     197         962      84      1660    50        87
   1000  921     3666        9903     127       16k   60        89
   10k   9324    46k         93k      168      168k   74        96
   100k  493k    660k        1357k    188     2109k   84        97

   The table shows execution time, i.e. larger numbers are worse.
   I can't see why anyone would want to use a gb_tree---ETS is
   faster at the very operation gb_tree is supposed to be really
   good at. The picture for ra_list is a little more complex, but
   it doesn't make me want to jump up and down saying "yay! ra-lists".

Reasons why the above tests might be complete crap:

   - I might have implemented ra-list really badly

   - The above benchmark might be contrived specially to make
     ETS look good. Then again, with such small tuples, I think
     it's a worst-case for ETS.

   - I might just be showing that the current Erlang implementation
     sucks. Maybe ETOS or HIPE can show much better results for
     purely functional data structures.

   - I wasn't super-careful with the timing, e.g. it's all the
     same Erlang machine, so garbage collection from earlier test
     runs might have perturbed the results.

Source code is available on request if someone wants to investigate
this some more.

--- eot ---