Lazy lists - an implementation.

Wed Oct 18 16:59:29 CEST 2000

On Wed, 18 Oct 2000, Sean Hinde wrote:

Sean> > Unfortunately it would not directly help Ulf Wiger either as the large
Sean> > lists are generated by calls which generate "proper" lists. To use it
Sean> > you would have to write new interface functions to generate the lazy 
Sean> > lists and force people to use the lazy libraries.  This would still 
Sean> > mean you would have to go through code and modify it, as you would 
Sean> > using th existing "lazy" interface to ETS tables.
Sean> 
Sean> mnesia transactions seem to use pretty much this procedure (generate list,
Sean> convert to ets table, traverse ets table to generate new list - or something
Sean> similar). Obviously the author had some reason to choose this solution.

Similar?

Updates in a Mnesia transaction are stored in a temporary ets
table. At commit time, the transaction coordinator iterates over the
transaction storage once. During this iteration several new lists are
generated (one list per storage type and node). These lists (embedded
in records) are then sent to the other involved nodes, possibly logged
to file and finally written to ets/dets depending on the storage
type. Ok, there are lots of copying involved here but I don't think
that this part of Mnesia could be made so much more efficient without
new features in erts.

Sean> Maybe there's a better one which doesn't involve all that copying but also
Sean> doesn't have the downside which Ulf and the mnesia author are trying to
Sean> avoid (different memory allocation? different GC? different handling of fast
Sean> growing heaps? other?)

I don't know if Mnesia would benefit of lazyness, since the entire
application by nature is a large side effect. But there are other,
less sexy, things that it would benefit of.

Mnesia would benefit of hash tables locally allocated on the process
heap. The effect of this would be twofold. Firstly, it would avoid the
copying of data between the process heap and the ets storage, for each
access of the table. Secondly it would make the disposal of the
temporary hash table much cheaper.

Allocating tables on the process heap would also enable experiments
with more adaptive data structures, where plain lists could
transparently be transformed into tables, trees, tuples etc. depending
on (deduced or explicitly stated) access patterns and data sizes. 

A memory allocation stategy with separate heaps (garbage collected or
not) for each table would give yet other set of characteristics than
the current implementation with a global table heap.

Some clever multi module flow analysis, would enable a wide range of
optimisations, such as avoidance of external calls, destructive
updates of records, etc. etc. Currently, there is an unfair
performance cost for well structured applications, as the compiler
only operates at one module at the time and many optimizations only
are performed within one function. Modules that not are intended to
be used from other applications should be for free.

/Håkan

---
Håkan Mattsson
Ericsson
Computer Science Laboratory
http://www.ericsson.se/cslab/~hakan