[erlang-questions] Parameterized module idioms

Thu Apr 22 19:21:49 CEST 2010

> If we want to compute at startup time some value to serve as a "parameter",
> that value will have to be stored somewhere,
> AND THAT SOMEWHERE CAN BE A MODULE.
> 
> There is no law that says a module can't be written by a program.

You are right, but it is more unpleasant and cumbersome than using 
parameterized modules.

> -module(reload).
> -export([put/1,get/0]).
> 
> put(Datum) ->
>     {ok,Device} = file:open("reloadable.erl", [write]),
>     io:format(Device,
>         "-module(reloadable).~n-export([datum/0]).~ndatum() ->~n    ~p.~n",
>         [Datum]),
>     ok = file:close(Device),
>     compile:file("reloadable", []),
>     code:purge(reloadable),
>     code:load_file(reloadable).

Interestingly, this resembles the hypothetical implementation of 
parameterized modules I sketched in the previous mail, in which a module 
would be compiled with the parameter substituted. But here we use a 
well-known name for the module, and do not have to propagate the 
generated module name in a variable.

> And yes, I do realise that what we have here is a global mutable
> variable with an amazingly slow assignment statement,
> but that's precisely what a changeable configuration parameter IS.
> 
> The overhead of setting the parameter up (or changing it) is
> moderately high (although using compile:forms(..., [binary,...])
> would be more efficient as well as safer).  But the overhead of
> *using* the parameter is the same as the overhead of any
> cross-module function call.  And if that weren't tolerable, we
> wouldn't be using Erlang.

In this case I agree that the overhead of using it is negligible. But 
while configuration is commonly done at start time, where we can afford 
it to be slow (as in my database), that is not always the case. Having 
to resort to a mechanism (like module compilation) that implies slow 
(re)configuration may sometimes be a problem.
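
For comparison, the in-memory variant suggested above (compile:forms 
with the binary option) might look roughly like the sketch below. The 
module name reload_forms and the function set/1 are made up for 
illustration, and it assumes Datum is a plain term that 
erl_parse:abstract/1 can represent (no funs, pids or ports):

```erlang
-module(reload_forms).
-export([set/1]).

%% Hypothetical sketch: regenerate and reload the module 'reloadable'
%% without writing a source file.  The forms correspond to:
%%   -module(reloadable).
%%   -export([datum/0]).
%%   datum() -> Datum.
set(Datum) ->
    Forms = [{attribute, 1, module, reloadable},
             {attribute, 2, export, [{datum, 0}]},
             {function, 3, datum, 0,
              [{clause, 3, [], [], [erl_parse:abstract(Datum)]}]}],
    %% Compile the abstract forms straight to a beam binary.
    {ok, reloadable, Bin} = compile:forms(Forms, [binary]),
    %% Drop any old version, then make the new binary the current code.
    code:purge(reloadable),
    {module, reloadable} = code:load_binary(reloadable, "reloadable.erl", Bin),
    ok.
```

After reload_forms:set(Term), a call to reloadable:datum() should return 
Term. This avoids the file system, but every change still pays for a 
compile and a code load, so the point about slow (re)configuration stands.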

>> Now I remember that one of the "permissible" uses for the process 
>> dictionary is to store "parameters" written once but never changed 
>> later. This kind of use ties code with the process structure and the 
>> use of get/1 can be slow.
> 
> Slow?  Time for some numbers.
> 
> 6> getput:k(100000000).
> [{variant,constant},{result,100000000},{time,530}]
> 7> getput:b(100000000).
> [{variant,direct},{result,100000000},{time,1730}]
> 8> getput:t(100000000).
> [{variant,dictionary},{result,100000000},{time,2290}]
> 
> Here's the code that produced those:
> 
> -module(getput).
> -export([t/1, b/1, k/1]).
> 
> t(N) ->
>     put(key, 1),
>     {T0,_} = statistics(runtime),
>     R = loop(N, 0),
>     {T1,_} = statistics(runtime),
>     [{variant,dictionary},{result,R}, {time,T1-T0}].
> 
> loop(0, R) -> R;
> loop(N, R) -> loop(N-1, R+get(key)).
> 
> b(N) ->
>     {T0,_} = statistics(runtime),
>     R = loup(N, 0),
>     {T1,_} = statistics(runtime),
>     [{variant,direct},{result,R}, {time,T1-T0}].
> 
> loup(0, R) -> R;
> loup(N, R) -> loup(N-1, R+(N div N)).
> 
> k(N) ->
>     {T0,_} = statistics(runtime),
>     R = lowp(N, 0),
>     {T1,_} = statistics(runtime),
>     [{variant,constant},{result,R}, {time,T1-T0}].
> 
> lowp(0, R) -> R;
> lowp(N, R) -> lowp(N-1, R+1).
> 
> So
>  - the loop that just  adds 1 takes  5.3 ns per iteration
>  - the loop that adds N div N takes 17.3 ns per iteration
>  - the loop that uses get()   takes 22.9 ns per iteration
> We conclude that
>  - N div N  takes 12 ns
>  - get(key) takes 17.6 ns
> and therefore
>  EITHER my benchmark and interpretation are hopelessly fouled up
>  OR using get/1 is NOT particularly slow.
> 
> To be honest, I incline to the former; how can looking something up
> in a hash table be so good compared with an integer division?

I also found these numbers fishy. I thought about it and wondered whether 
a considerable part of the time is being spent on integer operations. 
I have a question: do big integers start at a larger value on 64 bits 
than on 32? The efficiency guide still just says for small integers: 
Integer (-16#7FFFFFF < i <16#7FFFFFF). So either things are still the 
same or the guide has not been updated. If things are the same as for 
32 bits, the time may be spent manipulating big integers. To test this 
hypothesis I added another variant:

s(N) ->
     {T0,_} = statistics(runtime),
     R = losp(N, 0),
     {T1,_} = statistics(runtime),
     [{variant,small_numbers},{result,R}, {time,T1-T0}].

losp(0, R) -> R;
losp(N, R) -> losp(N-1, (R+1) band 255).

This variant also does some manipulation of a parameter, but the result 
is always a small integer. Running:

air:tmp psa$ erlc +native getput.erl
air:tmp psa$ erl
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:2:2] [rq:2] 
[async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> getput:k(100000000).
[{variant,constant},{result,100000000},{time,610}]
2> getput:s(100000000).
[{variant,small_numbers},{result,0},{time,340}]
3> getput:b(100000000).
[{variant,direct},{result,100000000},{time,3620}]
5> getput:t(100000000).
[{variant,dictionary},{result,100000000},{time,7110}]

Here, the "s" variant is even faster than the "k" one. This version 
also accesses a parameter and does some computation, but does at most 
an increment on a big integer. But this does not explain the large 
time in the "b" variant. I thought the problem could be the "div" 
operation, so I added the variant:

d(N) ->
     {T0,_} = statistics(runtime),
     R = lodp(N, 0),
     {T1,_} = statistics(runtime),
     [{variant,no_div},{result,R}, {time,T1-T0}].

lodp(0, R) -> R;
lodp(N, R) -> lodp(N-1, R + ((N+1) - N)).

which results in:

1> getput:d(100000000).
[{variant,no_div},{result,100000000},{time,730}]

Conclusion: the time for the non-get versions comes from the use of 
"div" and the number of times a possibly big integer is manipulated.
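
The small-versus-big question can also be probed directly instead of 
inferred from timings. The sketch below assumes that the internal, 
undocumented erts_debug:flat_size/1 reports 0 heap words for immediate 
(small) integers and a non-zero size for bignums; the module name 
intsize and its functions are made up for illustration:

```erlang
-module(intsize).
-export([is_small/1, first_big_bit/0]).

%% True if the integer is stored as an immediate value, i.e. it
%% occupies no heap words.  Relies on the internal erts_debug module.
is_small(I) when is_integer(I) ->
    erts_debug:flat_size(I) =:= 0.

%% Returns the smallest B such that 1 bsl B is no longer a small
%% integer on the running emulator.
first_big_bit() ->
    first_big_bit(1, 0).

first_big_bit(X, B) ->
    case is_small(X) of
        true  -> first_big_bit(X bsl 1, B + 1);
        false -> B
    end.
```

Comparing intsize:first_big_bit() on a 32-bit and a 64-bit emulator 
would show whether the limit quoted from the efficiency guide applies to 
both.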

To measure more accurately how long "get" takes, I wrote:

-module(m).
-export([v/1, g/1]).

loop(_F, 0) -> ok;
loop(F, N) -> F(), loop(F, N-1).

run(F, N) ->
   T1 = now(),
   loop(F, N),
   T2 = now(),
   timer:now_diff(T2, T1) * 1000 div N.

v(N) -> run(fun() -> 1 end, N).

g(N) -> put(key, 1), run(fun() -> get(key) end, N).

Here the "v" function loops over a fun which just returns 1, and the "g" 
version differs only in that a get is performed. The result is the 
elapsed time per iteration, in nanoseconds.

erlc +native m.erl
air:code psa$ erl
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:2:2] [rq:2] 
[async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> m:v(100000000).
8
2> m:g(100000000).
73

So the get takes around 65 nanoseconds (73 - 8). To test the use of 
parameters in parameterized modules I wrote a module:

-module(mp, [P]).

exactly the same as "m", with an extra function:

p(N) -> run(fun() -> P end, N).

Running again:

air:code psa$ erlc +native mp.erl
air:code psa$ erl
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:2:2] [rq:2] 
[async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> M = mp:new(10).
{mp,10}
2> M:v(100000000).
8
3> M:p(100000000).
8
4> M:g(100000000).
71

Conclusion: the cost of using a parameter is negligible, and the overhead 
of making the module parameterized is negligible compared with the plain 
module (the "v" and "g" times are essentially unchanged).

I remain of the opinion I had before, that get is too slow (for this 
purpose, compared with using parameters). If I need to look up a dozen 
parameters to serve a request, there goes almost 1 microsecond of wasted 
CPU.
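
That figure is simply 12 x 65 ns = 780 ns. A sketch for measuring it 
rather than extrapolating is given below, in the style of the "m" module; 
the module name dozen and the keys p1..p12 are made up for illustration. 
The gets are summed so that their results are used, which adds eleven 
small additions to the measured time:

```erlang
-module(dozen).
-export([run/1, loop/2]).

%% Call F N times (exported so that timer:tc/3 can apply it).
loop(_F, 0) -> ok;
loop(F, N) -> F(), loop(F, N-1).

%% Elapsed time per iteration, in nanoseconds.
per_iter(F, N) ->
    {Micros, ok} = timer:tc(?MODULE, loop, [F, N]),
    Micros * 1000 div N.

%% Compare a fun returning a constant with a fun doing twelve lookups
%% in the process dictionary (hypothetical keys p1..p12).
run(N) when is_integer(N), N > 0 ->
    Keys = [p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12],
    lists:foreach(fun(K) -> put(K, 1) end, Keys),
    Base = per_iter(fun() -> 12 end, N),
    Gets = per_iter(fun() ->
                        get(p1) + get(p2) + get(p3) + get(p4) +
                        get(p5) + get(p6) + get(p7) + get(p8) +
                        get(p9) + get(p10) + get(p11) + get(p12)
                    end, N),
    [{baseline, Base}, {dozen_gets, Gets}, {net, Gets - Base}].
```

Calling dozen:run(100000000) returns the baseline, the time with a dozen 
gets, and their difference, all in nanoseconds per iteration.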

And I do use the process dictionary. I basically ignore all the fuss 
about how bad it is to use it. It is the fastest hash table we have in 
Erlang, and it is appropriate for storing large terms with possible 
substructure sharing, which would grind ets to a halt or possibly make 
memory usage explode.

But it is not a substitute for "instantly" accessible parameters.

> 
> While it may be true that get/1 _can_ be slow (I'd need to see the
> numbers), you should never just _assume_ that get/1 is slow for
> the use you intend to make of it.

I didn't. ;)

>> A quite common example would be several instances of a web server 
>> together listening in different ports.
> 
> It's not clear why the port should be part of a web server's context
> rather than part of its state, or why these instances need to be all
> together in a single Erlang node (because if they aren't, module
> parameters offer us no convenience), or why if they are all in a single
> node "they" shouldn't be "it", a single system listening on several
> ports and doing load balancing of some sort.

Ok. Thinking about it, it was a bad example. It may be common, but in 
this case one would expect port numbers to change and plan ahead 
accordingly; and as you say, they would belong naturally to its state.

Regards,
Paulo