Child modules draft feedback wanted

Tue Apr 4 19:00:59 CEST 2006

Richard A. O'Keefe wrote:
>
> Practically, there is a problem.  Modules have to do at least 
> two different things.  They are the units of hot loading.  
> They are the units of encapsulation.  I'm sure I remember 
> someone talking about using different mechanisms for 
> different jobs.  I'm proposing "full" modules as units of hot 
> loading, and "child" modules as units of encapsulation.

Modules are also, to some extent, units of version control
(more on that later). 

Regarding hot loading: while it's often extremely practical
to hot-load individual modules during experimentation and 
debugging, things tend to work a bit differently when upgrading
large systems. Basically, _OTP applications_ are the unit of
in-service upgrade when using the OTP framework.
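As a rough illustration of upgrades being described per application, here is a minimal `.appup` file. The application name `myapp`, the module `myapp_server` and the version numbers are all hypothetical. Even a single-module change is expressed as a step from one application version to the next:

```erlang
%% ebin/myapp.appup (hypothetical application and module names).
%% Upgrading myapp from "1.0" to "1.1" reloads one module;
%% downgrading back to "1.0" reverses the same step.
{"1.1",
 [{"1.0", [{load_module, myapp_server}]}],
 [{"1.0", [{load_module, myapp_server}]}]}.
```

In a full release upgrade, `systools:make_relup/3` combines the `.appup` files of the changed applications into a single `relup` script for the whole release.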

For example, when discussing cross-module optimization with
Thomas Lindgren, we (the IMS Gateways unit within Ericsson) 
have taken the preliminary stance that it would be OK to 
allow cross-module inlining within an application, at least
in a stable system.

(This could have an effect on per-module hot-loading somewhat
similar to the "sticky" concept, where a module marked as
"sticky", i.e. hard-wired into an application, cannot be
reloaded.)

OTP has pretty good support for upgrading one application at 
a time, and I think this might be a good trade-off in order
to allow cross-module inlining.

Semantically there should be no significant difference 
between code that's inlined across modules and code 
that isn't. It might be possible to accept _some_
well-defined differences, e.g. as regards exceptions.
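One concrete source of such differences is a rule in Erlang's code loading. A fully qualified call `Mod:f(...)` always goes to the newest loaded version of `Mod`. A local call, by contrast, stays in the version of the code that made it. A call inlined from another module would behave like the local call and would keep running the old body after the callee is reloaded.

The hypothetical module `upgrade_demo` below is a sketch of the usual idiom that relies on this rule. It is a server loop that re-enters itself through `?MODULE:loop/1` so that it can pick up new code:

```erlang
-module(upgrade_demo).
-export([start/0, loop/1]).

%% Start a long-lived counter process.
start() ->
    spawn(fun() -> loop(0) end).

%% loop/1 must be exported so that the fully qualified call
%% ?MODULE:loop/1 can reach it. Each time around, that call
%% jumps to the newest loaded version of upgrade_demo, which is
%% what lets a hot-loaded module take over a running process.
loop(N) ->
    receive
        bump ->
            ?MODULE:loop(N + 1);
        {get, From} ->
            From ! {count, N},
            ?MODULE:loop(N);
        stop ->
            ok
    end.
```

If the recursive calls were plain `loop(...)`, the process would remain in the old version of the module after a reload. That is the same observable effect an inlined cross-module call would have.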

In a concurrency-oriented language, any optimization may
affect timing, which may cause problems if the code is 
timing-sensitive.

> To see that there is a problem, I made some size measurements 
> on Erlang R9C.  (The R11 release I downloaded turned out to 
> be corrupted, so I couldn't measure that, and while I _am_ 
> downloading R10 to do my measurements on, it's taking long 
> enough that I decided not to wait.)
> 
> > summary(s)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>     2.0    44.5   127.0   291.4   314.0 22508.0 
> 
> That is, there is at least one module with 2 SLOC,
> 1/4 of the modules have 44 or fewer SLOC,
> 1/2 of the modules have 127 or fewer SLOC,
> 3/4 of the modules have 314 or fewer SLOC, but there is at 
> least one module with 22,508 SLOC.
> 
> The distribution of SLOC sizes shows two peaks, one around 8 
> SLOC (yes, that small) and one around 192 SLOC.  But there 
> are quite a few big ones.

I think this corresponds pretty well to our experience.

There are many modules in our systems as well as in OTP that
really are not reusable entities as such, except in a very
narrow context. Examples that come to mind are:

- hipe_xxx_loader (xxx: amd64 | ppc64 | ppc | sparc | unified | x86)
- disk_log_1, disk_log_server
- most modules in mnesia (except mnesia, mnesia_frag, and a few others)

The way to achieve structure today is to break a large module
into several first-class modules, and then to document which
ones are to be viewed as "official" modules.

> Of course we can manage without that; we have.  But there is 
> a big difference between looking at a function and KNOWING 
> that it is only used in a small part of a large module and 
> having to CHECK the whole module to find out.

We've had a convention of giving one module per application an
"i" suffix, to mark where the official interface functions of
the application are to be found. This has several disadvantages,
but has the big advantage (especially in large systems) that you
can rather easily find the functions that have been written
with the intent of being used by users external to the 
application. It also has the property (here's the "later")
that you can check the revision of the file and tell whether
it has changed since the last version. If designers follow the
rule of putting as little implementation as possible in the
interface modules, this check may actually tell you something.

I think child modules make a lot of sense in terms of trying
to bring more structure into complex systems.

> 	-module(mod1).
> 	-record(record1, {field1, field2}).
> 	...
> 	
> 	-module(mod2).
> 	-import_records(mod1, [record1]).
> 	...
> 	
> 	Would be strictly equivalent (and can be easily transformed 
> 	into):
> 	
> 	-module(mod1).
> 	-record(record1, {field1, field2}).
> 	% Generated automatically by the compiler:
> 	-export([record_info/2]).
> 	record_info(fields, record1) -> [field1, field2];
> 	record_info(size, record1) -> 3.
> 	...
> 	
> 	-module(mod2).
> 	% This -record is generated from the result of calls to
> 	% mod1:record_info/2 by the compiler:
> 	-record(record1, {field1, field2}).
> 
> WHOOPS!  You are doing cross-module inlining!  You just made 
> an incompatible change to the semantics of the language! 

Even if the cross-module inlining step were skipped,
exporting the pseudo-function record_info/2 would 
mean that some helper code could be written a lot 
more cleanly. I see very few disadvantages in doing
this, as the preprocessor has reserved both the name
and semantics of the record_info/2 function.
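Today `record_info/2` is a compile-time pseudo-function, so another module cannot call `mod1:record_info(fields, record1)` at run time. One workaround is to wrap the pseudo-function in an ordinary exported function. The sketch below does this with a hypothetical exported function `record_fields/1` and a hypothetical helper `to_proplist/1`. This is roughly the kind of boilerplate an exported `record_info/2` would make unnecessary:

```erlang
-module(mod1).
-export([record_fields/1, to_proplist/1]).

-record(record1, {field1, field2}).

%% record_info/2 only works with a literal record name at compile
%% time, so wrap it in a normal function that can be exported.
record_fields(record1) -> record_info(fields, record1).

%% Hypothetical helper built on the wrapper: turn a record1 tuple
%% into a property list without hard-coding the field names.
to_proplist(Rec) when is_record(Rec, record1) ->
    [_Tag | Values] = tuple_to_list(Rec),
    lists:zip(record_fields(record1), Values).
```

Other modules can then call `mod1:record_fields(record1)` at run time without needing the record definition at compile time.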

> I think many people trying to get real work done with Erlang 
> would say "if you want me to put up with a change in the 
> meaning of module names just so that you can get rid of the 
> preprocessor, no thanks, I'll stick with the preprocessor".

I agree.

BR,
Ulf W
