Lazy lists - an implementation.

Thomas Lindgren
Wed Oct 18 17:54:53 CEST 2000


> Some clever multi module flow analysis, would enable a wide range of
> optimisations, such as avoidance of external calls, destructive
> updates of records, etc. etc. Currently, there is an unfair
> performance cost for well structured applications, as the compiler
> only operates on one module at a time and many optimizations are only
> performed within one function. Modules that are not intended to
> be used from other applications should come for free.

Coming soon ... currently at the research stage, with a clunkily
running prototype. And yes, I am a firm believer in the power of
cross-module optimization.

One solution is to abandon hot code loading, or to modify it so
that some modules must be loaded as a group.

(See also: ftp://ftp.csd.uu.se/pub/papers/reports/0154.ps.gz for
some background.)

OK, since this is a favorite topic of mine, hit 'n' or stay
around for a longish discussion.


Here we go.

I know of two implementations of "module merging" that can also group
modules into a big 'super module': one by Richard Carlsson and one by
me. Module merging is intended to be safe with the current hot code
loading semantics, but you can also (basically) just 'cat' all the
modules to be merged into a single big file and compile that.

The devil is in the details, however. First of all, you have to handle
apply/spawn/spawn_link and similar stuff. Second, exports have to be
accommodated somehow (in the best of worlds, a file should be able to
export several interfaces, but right now it can't). But the main
problem at this time, if we work at the source level, is that of
records:

- We don't want to expand records into tuples, since that carries less
  information for the compiler to work with. So we have to keep the
  record operations around.

- Two record definitions in different modules can have the same name.
  (An example of this is mnesia, where the name 'state' is used for
  several different record definitions.)

- This means the files can't be straightforwardly merged since the names
  may clash and the compiler can't handle that.

- On the other hand, we can't simply give each record definition a unique
  name, since the same record can be used in several places (file.hrl, say,
  included here and there) and all those uses must have the same name.

The current solution is this: records are renamed to have a name
that is unique _per_unique_definition_, not per occurrence. This is done by
constructing an MD5 hash of the fields and their initializers, and
appending that to the name. All equivalent record definitions (same name,
same fields) will then get the same name.

Thus, if your record looks like

-record(state, {supervisor, pid_tab}).

You then get a new, unique name based on the MD5 of the definition:

 'state 2d7265636f72642873746174652c207b73757065727669736f722c7069645f7461627d'

(The 'state ' prefix is retained just to keep things debuggable -- the
MD5 is sufficient.)

Now, if the same record definition occurs in several places then all those
definitions will have the same renaming. 
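
To make the renaming concrete, here is a minimal sketch of how such a name
could be derived. It is not the code from either merger: the module name
record_rename, the functions new_name/2 and to_hex/1, and the choice to hash
the ~w-printed {Name, Fields} term are all assumptions made for illustration.
Only erlang:md5/1 and io_lib:format/2 are standard calls.

```erlang
-module(record_rename).
-export([new_name/2, to_hex/1]).

%% Hypothetical helper (not the author's implementation): derive a record
%% name that depends only on the original name and the field list.
%% Fields is any term describing the fields and their initializers,
%% e.g. [supervisor, pid_tab] or [{count, 0}]. This representation is
%% an assumption of the sketch.
new_name(Name, Fields) when is_atom(Name) ->
    %% Print the definition in a fixed textual form and hash that text.
    Canonical = io_lib:format("~w", [{Name, Fields}]),
    Digest = erlang:md5(Canonical),
    %% Keep the original name as a prefix so the tag stays readable.
    list_to_atom(atom_to_list(Name) ++ " " ++ to_hex(Digest)).

%% Render a binary as lowercase hex, two digits per byte.
to_hex(Bin) ->
    lists:flatten([io_lib:format("~2.16.0b", [B]) || <<B>> <= Bin]).
```

With this sketch, record_rename:new_name(state, [supervisor, pid_tab])
returns the same atom wherever it is called, while a 'state' record with
different fields gets a different atom. An MD5 digest is 16 bytes, so the
suffix produced here is always 32 hex characters.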

There are drawbacks: first, printing the record looks weird (the tag is
'<name> <md5hash>'); second, the _entire_ system must use this MD5
renaming scheme. Otherwise, some modules will still use the old names,
and their records will not match the renamed ones.

Still, that's not too onerous. And when records are Done Right, this problem
will go away.

Wishlist:

- records to have unique names over the entire system, etc.
  * source-to-source transform has to go

- export many interfaces
  * would also be nice to have -apply([f/1]), -spawn([g/2]), etc
    to make it easier to see what is exported just for those purposes;
    this would also get rid of the comment "internal export".

- better support for detecting module names in code (for instance,
  gen_tcp stores an atom in a table, which is later used as a module
  name ...  there's little hope of detecting that and changing the
  name into optimized_gen_tcp, say)

- no _undocumented_ magic BIFs in os.erl and similar (+ some way to
  detect those programmatically), since renaming os.erl into optimized_os.erl
  will fool the runtime system.
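
The module-name item in the wishlist can be illustrated with a small sketch.
This is not gen_tcp's actual code: the module dyn_dispatch, the table name
handlers and the key reverser are made up for the example. It only shows
the general pattern of an atom that is stored as plain data and later used
as a module name.

```erlang
-module(dyn_dispatch).
-export([demo/0]).

%% Illustrative only: store a module name in an ETS table, read it back
%% and call through it. To a source-level tool, the atom 'lists' below
%% is just a value in a tuple, so nothing marks it as a module name.
demo() ->
    Tab = ets:new(handlers, [set]),
    true = ets:insert(Tab, {reverser, lists}),
    [{reverser, Mod}] = ets:lookup(Tab, reverser),
    %% Dynamic call; apply(Mod, reverse, [[1, 2, 3]]) is equivalent.
    Result = Mod:reverse([1, 2, 3]),
    true = ets:delete(Tab),
    Result.
```

If 'lists' were merged or renamed into some optimized module, the stored
atom would still say 'lists' and the call would go to the old module. The
same applies to apply/spawn/spawn_link with a module name computed at
runtime.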

			Thomas
--
Thomas Lindgren					
Alteon Websystems Sweden			http://www.bluetail.com

"The need of a constantly expanding market for its products chases the
bourgeoisie over the entire surface of the globe. It must nestle
everywhere, settle everywhere, establish connections everywhere."
-- Communist Manifesto


