Fun syntax & language feature poll

Wed Aug 28 11:58:46 CEST 2002

> Speaking of which, why not remove this feature of Erlang (that a remote call
> calls the latest version of loaded code) and instead introduce a special update
> call like:
>  enter_latest foo(...) ?
>
>
> Except in the shell, who uses this feature?
> If you want to write an application that really supports upgrades you have to
> think this through and design with it in mind from the beginning anyway.
>
> Removing this feature would do wonders to the possibilities of optimization...

I'm a month late in replying, but this is a hobby horse of mine, and
YES, permitting optimization across module boundaries helps quite a bit.
Here's why, for those who are interested but not experts.

Basically, a compiler does better the more code it has to work with, and it
seems to me that in the case of Erlang, you need to cross module boundaries
to get enough context. At least my own experience (e.g., EUC'01) shows this
very clearly.

Here is an example of what can be done. Type analysis is a compile-time
analysis that permits simplification of primitive operations. When
applicable, it yields excellent speedups for dynamically typed languages
like Erlang (and Prolog, Lisp, ...).

For example, take element(N,X). If nothing is known, the VM must check:

- that X is a tuple of size K
- that N is an integer in the range 1 =< N =< K
- finally, select the appropriate word from the heap and return it.

The same kinds of tests are done in most BIFs. Often, the code is wrapped in
a C function as well, yielding extra overhead for parameter passing. It
makes you gnash your teeth to see it.
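For the curious, here is roughly what that generic path looks like, as a
minimal C sketch. The Term struct, the T_* tags and element_checked() are
hypothetical names for illustration only; the real emulator packs tags into
machine words and signals failure as a badarg exception, not a return code.

```c
#include <stddef.h>

/* Hypothetical, simplified term representation for illustration only;
   the real BEAM uses tagged machine words, not a struct like this. */
typedef enum { T_SMALL, T_TUPLE, T_ATOM } Tag;

typedef struct Term {
    Tag tag;
    long small;                 /* valid when tag == T_SMALL */
    size_t arity;               /* valid when tag == T_TUPLE */
    const struct Term *elems;   /* valid when tag == T_TUPLE */
} Term;

/* Generic element(N, X), called out of line like a BIF.
   Returns 1 and stores the element in *out on success;
   returns 0 (think: badarg) if any check fails. */
int element_checked(Term n, Term x, Term *out) {
    if (x.tag != T_TUPLE)                      /* X must be a tuple ... */
        return 0;
    size_t k = x.arity;                        /* ... of size K */
    if (n.tag != T_SMALL)                      /* N must be an integer ... */
        return 0;
    if (n.small < 1 || (size_t)n.small > k)    /* ... with 1 =< N =< K */
        return 0;
    *out = x.elems[n.small - 1];               /* finally, select the word */
    return 1;
}
```

Every call pays for all of these tests plus the out-of-line call, even when
the caller "knows" the arguments are fine.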

Now assume the analyzer finds that N is a small integer and X is a tuple of
known arity, with N within that arity. A native-code optimizer can then
remove the tests, reducing the element/2 call to a single load. This is good,
but there's more, and it is an important but somewhat subtle point: the
compiler can ALSO be more aggressive in inlining primitives (rather than
doing C function calls) if it doesn't have to lay out lots of code for the
failure cases. Even if the failure code is never executed, it costs compile
time, reduces the effectiveness of optimizations and messes up the I-cache.
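After such an analysis, the same operation can shrink to something like the
following C sketch. The Word type, the header-then-elements tuple layout and
element_specialized() are assumptions for illustration (a real BEAM tuple
header also carries tag bits). The assert only documents what the analyzer
proved and is compiled out under NDEBUG, leaving a single load with no
failure branch to lay out, which is what makes inlining cheap.

```c
#include <assert.h>

/* Hypothetical heap word; a tuple is stored as a header word holding its
   arity, followed by the elements (simplified: no tag bits). */
typedef long Word;

/* element(N, X) once the analyzer has proven that X is a tuple and that
   N is a small integer with 1 =< N =< arity. No runtime type or range
   tests remain; the assert re-checks the proof in debug builds only. */
static inline Word element_specialized(const Word *tuple, long n) {
    assert(n >= 1 && n <= tuple[0]);
    return tuple[n];    /* header at [0], so element N is at index N */
}
```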

I duly did an experiment with type analysis of Erlang several years ago
(1997?) and found that PER-MODULE analysis of real code, meaning OTP, is
largely useless for this purpose[*]. I furthermore believe this is the case
for most real code, since modules are used to structure programs into
reusable chunks as well as provide the unit for code loading. Thus,
cross-module analysis and optimization are "on the roadmap" for high
performance, at least in my book.

What to do? A compiler can enable cross-module optimization, say with the
transparent module aggregation method I described at EUC'01, AND/OR we can
provide some mechanism for programmers to group modules on the language
level. Perhaps Richard's hierarchical name spaces, or some suitably modified
version thereof, could be drafted for this purpose?

Depending on how such an extension is done, we may also have to fix records,
since record names are often reused and the scope of a record declaration is
somewhat ad hoc.

Best,
Thomas

[*] What does this mean? The analyzer kept a single version of each
function, and normally found that nearly all functions could be called with
'anything' for every function argument, i.e., no information. This is
equivalent to saying that you could just analyze each function locally,
which normally yields few opportunities for type-based optimization.
(Another alternative to cross-module analysis is for the analyzer to keep
multiple versions of functions; eminently possible, but there is no
consensus on how to do this well, AFAIK, nor do we know how well it works.)
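A toy model may make the footnote concrete. The sketch below is a
hypothetical illustration in C (the lattice, join() and
monovariant_arg_type() are made-up names, not the analyzer from the
experiment): with a single version per function, the type assumed for an
argument is the join over all call sites, so a single caller about which
nothing is known, such as an unseen caller of an exported function, drags
the result up to TY_ANY.

```c
#include <stddef.h>

/* Hypothetical toy type lattice: TY_NONE < {TY_SMALL, TY_TUPLE} < TY_ANY. */
typedef enum { TY_NONE, TY_SMALL, TY_TUPLE, TY_ANY } Type;

/* Least upper bound of two types in the lattice above. */
Type join(Type a, Type b) {
    if (a == TY_NONE) return b;
    if (b == TY_NONE) return a;
    if (a == b) return a;
    return TY_ANY;      /* SMALL vs TUPLE, or anything vs ANY */
}

/* Monovariant analysis: one version of the function, so the type assumed
   for an argument is the join over ALL of its call sites. */
Type monovariant_arg_type(const Type *call_sites, size_t n) {
    Type t = TY_NONE;
    for (size_t i = 0; i < n; i++)
        t = join(t, call_sites[i]);
    return t;
}
```

A polyvariant analyzer would instead keep one version per distinct argument
type, so the TY_SMALL callers could still get specialized code.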