[erlang-questions] Parameterized module idioms

Wed Apr 21 17:31:51 CEST 2010

Hi again,

Richard O'Keefe wrote:

> If they really *are* nonrelated, they should be separate parameters anyway.
> If they are related, they can be in one data structure (which might be
> a closure).  "Pollution" is not a property of closures, but of a coding
> style.

The pollution I mean is having to pass several, possibly unrelated,
parameters along the invocation chain, from the "root" function to the
"leaf" functions.
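The sketch below illustrates that threading. The module `threading_demo` and the intermediate function `middle/3` are hypothetical. `DumpLimit` is used only by the leaf `need_dump/3`, but every function on the path has to accept it and pass it along.

```erlang
-module(threading_demo).
-export([root/3]).

%% Illustrative only: root/3 and middle/3 never use DumpLimit
%% themselves, they just forward it to the leaf.
root(DumpLimit, Tab, LogOps) ->
    middle(DumpLimit, Tab, LogOps).

middle(DumpLimit, Tab, LogOps) ->
    need_dump(DumpLimit, Tab, LogOps).

%% The only function that actually needs DumpLimit.
need_dump(DumpLimit, Tab, LogOps) ->
    LogOps > DumpLimit * ets:info(Tab, size).
```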

>> If some code would need to be generated if I didn't have parameterized 
>> modules, then parameterized modules already give me something.
> 
> What?  Automatic code generation isn't anything you need to be aware of,
> let alone involved with.

Ok. I can buy that. It is some extra infrastructure but not my problem.

>> Instead of
>>
>>  need_dump(Tab, LogOps) -> LogOps > ?DUMPLIMIT * ets:info(Tab, size).
>>
>> I can have
>>
>>  need_dump(Tab, LogOps) -> LogOps > DumpLimit * ets:info(Tab, size).
>>
>> with no change in the interface of the function.
> 
> 
> But this is the Functional Programming Lesson:
>   there *is* a change in the interface of the function!

The interface is what matters to client code, not internals. The 
interface has not changed:

Before: I supply an atom and an integer and get a boolean.

After: I supply an atom and an integer and get a boolean.

The client code that invokes the function sees no change in the
interface and does not need to be changed. More precisely: invocations
in the same module (intra-module calls) do not need to be changed.
Invocations from other modules change only in that the module name in
the call is now a variable, i.e. m:f(P) becomes M:f(P).

> In the first case, the function uses nothing but its arguments.
> In the second case, the function has an extra parameter (DumpLimit).
> The function is _really_
> 
>     need_dump(%Hidden%, Tab, LogOps) ->
>     LogOps > %Hidden%#%hidden%.DumpLimit * ets:info(Tab, size).
> 
> How it's _compiled_ is a separate issue; I'm arguing about what it _means_.

But it almost seems like you are talking about how it is compiled ;)
What the function _uses_ is irrelevant to its interface. Otherwise we
would have to say that the values captured by a closure are part of its
interface, because the closure body uses them.

That an extra parameter is now used is just an implementation choice,
not something fundamental to the concept. One could have another
implementation (not that I am saying it would be a good idea) where the
module source is compiled at runtime with the parameters substituted and
then loaded, resulting in a module like

m_instance_1213124

which has the same functions, with the exact same interface, and no 
hidden extra parameter. And when a client does

M:f(P)

what _really_ happens could be

m_instance_1213124:f(P)

which would not be much different than doing now:

M = lists,
M:sort([3,1,2]).

with no hidden extra parameters being passed. Such an implementation
would have very different consequences: disadvantages like runtime
instantiation costs and possible errors, and advantages like possible
optimizations using runtime knowledge. I am not saying it would be
realistic or in the spirit of what we expect. I am just saying that the
interface remains the same; the function now belongs to a module that
is only computed at runtime, and for intra-module calls even that is
irrelevant.
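The runtime-instantiation idea above can be sketched with the standard `erl_scan`, `erl_parse`, `compile` and `code` modules. This is a minimal illustration, not a proposal. The module `instantiate_demo` is hypothetical, and only `need_dump/2` is generated. The limit is written into the generated source as a literal, so the loaded module has the ordinary two-argument interface and no hidden parameter.

```erlang
-module(instantiate_demo).
-export([instantiate/2]).

%% Generate, compile and load a module called Name in which the
%% "parameter" Limit has been substituted as a constant.
instantiate(Name, Limit) ->
    Src = [io_lib:format("-module(~p).", [Name]),
           "-export([need_dump/2]).",
           io_lib:format("need_dump(Tab, LogOps) -> "
                         "LogOps > ~p * ets:info(Tab, size).", [Limit])],
    Forms = [parse_form(lists:flatten(S)) || S <- Src],
    {ok, Name, Bin} = compile:forms(Forms, []),
    {module, Name} = code:load_binary(Name, atom_to_list(Name) ++ ".erl", Bin),
    Name.

%% Turn one source string ending in "." into an abstract form.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

Usage, with the generated module name taken from the example above: `M = instantiate_demo:instantiate(m_instance_1213124, 2), M:need_dump(Tab, LogOps)`.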

>> For example, a module pets_tm would have somewhere:
>>
>>  Res = pets_lib:delete_table(Tab),
>>
>> How do I make the path (as well as many other configuration
>> parameters) a value chosen at runtime, with little effort in rewriting
>> code which didn't contemplate such a possibility beforehand?
> 
> The problem is that you can't.

Yes, I can ;)
In fact, I did.

> Yes, you *can* replace pets_lib: by Pets_Lib:, but now
>  - either you have to pass Pets_Lib around all over the place,
>    which doesn't count as "little effort", or
>  - you have to pass Pets_Lib as a parameter to the module containing

Exactly: if it is a parameter of the client module, then it is little
effort. Rewriting my library really WAS little effort.

>    this call, which transitively affects its callers as well, ...

This is an interesting point, and I don't know whether it has been
discussed much here. I have noticed that, to be able to make nice
little-effort changes, client modules end up having parameters. This is
a sort of "viral" phenomenon, which we want to contain.

This viral aspect made me think that when building an abstraction, the 
module(s) that are exposed to the outside world should NOT be 
parameterized, while the modules used internally can be parameterized if 
it helps productivity in writing code.

This is what I ended up with, looking at each module definition:

pets.erl:-module(pets).
pets_gc.erl:-module(pets_gc, [Lib, MaxTids, CollectRatio, PurgeRatio]).
pets_lib.erl:-module(pets_lib, [PetsDir]).
pets_loader.erl:-module(pets_loader, [Lib, MaxReaders, MaxInserters]).
pets_locker.erl:-module(pets_locker).
pets_tm.erl:-module(pets_tm, [Lib, DumpLimit]).
pets_writer.erl:-module(pets_writer, [Lib, SyncDelay]).
test.erl:-module(test).

The only module that clients use, "pets", is not parameterized. The 
modules used internally are either parameterized:

pets_gc, pets_lib, pets_loader, pets_tm, pets_writer

or not parameterized:

pets_locker
test

>  - and you had better first take care to rename any existing occurrences
>    of "Pets_Lib" to something else

Of course, but that is easy: invent a nice name, then check that it
doesn't occur in any module.

> 
> Perhaps we can name this a "module parameter cascade".
> 
>> Making pets_lib a parameterized module. What is the impact of that on 
>> client code? A simple change to:
>>
>>  Res = Lib:delete_table(Tab),
>>
>> It looks pretty much the same, but now we have this Lib variable.
> 
> Looks, as they say, can be deceiving.
> 
>> If we were using closures, the closure would have to be passed somehow
>> (who knows how many levels of invocations) until it was available to
>> the function which performs this invocation.
> 
> (a) People seriously underestimate what closures can do.
> (b) This is not an argument for modules with parameters,
>     it is an argument for nested functions.

Not sure what you mean. Using closures will have a greater impact on 
client code, and also on the implementation code that I am trying to reuse.
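To make the comparison concrete, here is a minimal closure-based sketch. The `-module(Name, [Params])` syntax was removed from Erlang/OTP in R16B, so the sketch uses plain funs. The module `closure_lib`, its `table_file` entry and the helper `call/3` are all hypothetical. The "instance" is a map of funs that have captured `PetsDir`, standing in for `pets_lib`'s module parameter. Clients must still receive `Lib` from somewhere, and each call is more verbose than `Lib:table_file(Tab)`.

```erlang
-module(closure_lib).
-export([new/1, call/3]).

%% Build an "instance": a map of funs that captured PetsDir.
new(PetsDir) ->
    #{table_file =>
          fun(Tab) -> filename:join(PetsDir, atom_to_list(Tab) ++ ".tab") end}.

%% Look up a named fun in the instance and apply it to Args.
call(Lib, Name, Args) ->
    erlang:apply(maps:get(Name, Lib), Args).
```

Usage: `Lib = closure_lib:new("/var/pets"), closure_lib:call(Lib, table_file, [t1])` yields `"/var/pets/t1.tab"`.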

> 
>> But if the pets_lib instantiation is a parameter of pets_tm, then I 
>> can use statements like the above all over pets_tm by doing a simple:
>>
>> :%s/pets_lib/Lib/g
> 
> You had better pray desperately to whatever god(s) you recognise
> that there are no other occurrences of Lib, and while you're at

Easy: grep Lib *.erl and look at the results.

> it, beg forgiveness for breaking the name link between the module
> pets_lib and the module instance variable Lib (which would be
> better as Pets_Lib, so that the only difference between the module
> and the instance variable is capitalisation).

This is a curious point. I am aware that I broke the connection; it was
intentional. The module was called pets_lib only because the application
is called pets, but I see it as the general library of this application.
I would rather not worry about what the application is going to be
called, or have to change "pets_lib" to "amnesia_lib" everywhere if I
decide to rename it. Prefixing module names with the application name is
the normal convention in Erlang, because we have to worry about
pollution of the global module namespace. Another advantage of using a
variable for a module, like "Lib" above, is having to worry less about
that kind of pollution, and about what the application is going to be
called. ;)

>> Then we only need to glue modules together at service starting time; e.g.
> 
> That is, you are making a change to a module which requires
>  - *remote* compensatory changes
>  - to possibly *many* service startups
>  - which previously never mentioned the module in question at all.

I don't understand what you mean. All this gluing is done in the
"main" for the internal modules of the application. As my example
showed, it was easy to do, and the result is easy to reason about. After
instantiation, everything behaves as if the values passed to the glued
modules had been -define'd constants. There are no strange side effects;
we keep referential transparency and a functional style using "POF"s
(plain old functions). (Does this term exist? ;))

> Let's take the data base example.
> 
>> need_dump(Tab, LogOps) -> LogOps > ?DUMPLIMIT * ets:info(Tab, size).
> 
> We want to make DUMP_LIMIT something that can be configured at
> run time.  But that's easy!
> 
>     need_dump(Tab, LogOps) ->
>         LogOps > my_config:dump_limit(Tab) * ets:info(Tab, size).
> 
> where
>     -module(my_config).
>     -export([dump_limit/1]).
> 
>     dump_limit(_Tab) -> ?DUMP_LIMIT.
> 
> To change the configured value, load a new version of my_config.
> To select a value at system startup, select which version of the
> configuration module to load.

This requires recompiling and selecting a module. But if we want to
compute some value at startup to serve as a "parameter", that value will
have to be stored somewhere in a globally accessible data structure,
like an ets table, to be consulted by my_config:dump_limit/1. That can
sometimes be slow.
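A minimal sketch of that ets-backed variant follows. The module name `my_config_ets` and the table name are hypothetical. The value is computed at startup, passed to `init/1`, and then every `dump_limit/1` call performs an ets lookup. Note that the table belongs to the process that calls `init/1` and disappears when that process exits.

```erlang
-module(my_config_ets).
-export([init/1, dump_limit/1]).

%% Store the startup-computed limit in a named, publicly readable table.
%% The table is owned by (and dies with) the calling process.
init(Limit) ->
    my_config = ets:new(my_config, [named_table, public,
                                    {read_concurrency, true}]),
    true = ets:insert(my_config, {dump_limit, Limit}),
    ok.

%% Each call pays for a lookup and a copy of the stored term.
dump_limit(_Tab) ->
    [{dump_limit, Limit}] = ets:lookup(my_config, dump_limit),
    Limit.
```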

Now I remember that one of the "permissible" uses of the process
dictionary is to store "parameters" that are written once and never
changed later. This kind of use ties code to the process structure, and
get/1 can be slow. Parameterized modules can make some of these uses
avoidable.
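The process-dictionary variant can be sketched as follows. The module `pdict_config` is hypothetical. `put/2` returns the previous value, so matching it against `undefined` makes a second `init/1` in the same process crash, which enforces write-once. The value is only visible inside the process that called `init/1`, and that is the tie to the process structure described above.

```erlang
-module(pdict_config).
-export([init/1, dump_limit/0]).

%% Write-once: crashes with badmatch if dump_limit was already set
%% in this process.
init(Limit) ->
    undefined = put(dump_limit, Limit),
    ok.

%% Returns undefined in any process that never called init/1.
dump_limit() ->
    get(dump_limit).
```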

> The use-case for modules with parameters (if there is one) is
> where there are grounds for believing that there may need to
> be multiple distinct instances of the same module at the same
> time AND where the module parameter cascade is tolerable.

A quite common example would be several instances of a web server
listening on different ports. But in this case the module(s) exposed to
the client would be parameterized, contrary to my "guideline" above. The
client might then store instances in some data structure, to avoid being
parameterized itself and so contain the "viral" impact of parameterized
modules.
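Here is a rough sketch of a client that keeps instances in a data structure so that it does not become parameterized itself. Everything in it is hypothetical, including the module `server_registry` and the document-root idea. Because parameterized modules no longer exist in current Erlang/OTP, each "instance" is a closure that captured its own port and document root, and the registry is a plain map keyed by port.

```erlang
-module(server_registry).
-export([new/0, add/3, lookup/2]).

%% An empty registry of instances.
new() -> #{}.

%% Create an instance bound to Port and DocRoot and store it under Port.
add(Port, DocRoot, Reg) ->
    Instance = fun(Path) -> {Port, filename:join(DocRoot, Path)} end,
    Reg#{Port => Instance}.

%% Fetch the instance for Port (crashes if none is registered).
lookup(Port, Reg) -> maps:get(Port, Reg).
```

Usage: after `R = server_registry:add(8081, "/srv/b", server_registry:add(8080, "/srv/a", server_registry:new()))`, calling `(server_registry:lookup(8081, R))("index.html")` yields `{8081, "/srv/b/index.html"}`.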

Best regards,
Paulo