Memoization in Erlang?

Wed Feb 1 09:39:52 CET 2006

I'm not going to comment on the general approach,
just answer the specific questions.

Pete Kazmier wrote:
> 1) Writing higher-order functions seems to be complicated
>    in Erlang because there is no support for varargs.
>    Thus, in my example, I have this kludge to return an
>    N-arg closure.  Is there a better way of handling this
>    keeping in mind that the wrapper function
>    should have the same interface as the memoized one?

In this particular case, perhaps apply(Module, Function, Args)
would be better than using funs?

> 2) In my kludge, I use erlang:fun_info/1 which states that
>    it is intended for debugging use.  Does this preclude
>    its use in my code? Should I instead just define
>    memoize1/1, memoize2/1, memoize3/1, etc ...

Instead of fun_info/1, you could use the guard function
is_function(F, Arity), i.e.

     if is_function(F, 0) ->
             fun() -> callit(Pid, []) end;
        is_function(F, 1) -> 
             fun(A1) -> callit(Pid, [A1]) end;
        is_function(F, 2) -> 
             ...
     end.

> 3) In lieu of my kludge, I started to wonder if I could
>    dynamically construct a string of code, then somehow
>    compile that on the fly, and return the resulting
>    function to the caller.  Is this even possible?

It is.

Here's an example of how to generate "forms", compiling
them, and then loading them into a running system:

    case generate_bin(Tabs, 1) of
        {ok, ?MOD, Bin} ->
            code:purge(?MOD),
            {module, ?MOD} =
               code:load_binary(?MOD, OutFile, Bin);
        {error, Reason} ->
            erlang:error(
               {Reason, erlang:get_stacktrace()})
    end

generate_bin(Tabs, L1) ->
    [{attribute,_,file,_}|_] = Forms = codegen(Tabs, L1),
    compile:forms(Forms, [no_warnings]).

codegen(Tabs, L1) ->
    L2 = L1+2, L3 = L2+1,
    {IndexFun, L4} = codegen_indexes(Tabs, L3+2),
    {RefsFun, L5} = codegen_references(Tabs, L4+2),
    {TIFun, L6} = codegen_table_info(Tabs, L5+2),
    {ModFun, L7} = codegen_module(Tabs, L6+2),
    {AclFun, L8} = codegen_acl(Tabs, L7+2),
    {AccessFun, L9} = codegen_access(Tabs, L8+2),
    {TabFuns, L10} = codegen_tabs(Tabs, L9+2),
    {TabPropFuns, L11} = codegen_tab_props(Tabs, L10+2),
    [{attribute,L1, file, {"rdbms_verify_jit.erl",1}},
     {attribute,L2, module, rdbms_verify_jit},
     {attribute,L3, export, [{validate_rec, 2},
                             {indexes, 1},
                             {references, 1},
                             {acl, 1},
                             {verify_access, 3},
                             {module, 1},
                             {table_info, 2},
                             {table_property, 2}]} |
     IndexFun
     ++ RefsFun
     ++ TIFun
     ++ ModFun
     ++ AclFun 
     ++ AccessFun
     ++ TabFuns
     ++ TabPropFuns
     ++ [{eof, L11+1}]].

codegen_indexes(Tabs, L1) ->
    {Body, L2} = lists:mapfoldl(
                   fun(Tab, L) ->
                           Indexes = indexes(Tab),
                           {{clause, L, [{atom, L, Tab}],
                             [],
                             [erl_parse:abstract(Indexes, L+1)]},
                            L+2}
                   end, L1+1, Tabs),
    {[{function, L1, indexes, 1, Body}], L2}.

(I actually don't think one has to bother about getting
the line numbers right, but I think it helps if/when
your generated code doesn't compile.)

Pretty-printing the forms can be done like this:

generate_src_file(File, Tabs, L1) ->
    [{attribute,_,file,_}|_] = Forms = codegen(Tabs, L1),
    Out = [[erl_pp:form(F), "\n"] || F <- tl(Forms)],

You can also generate code using the erl_syntax module.

The best way to understand what forms to generate may be
to first write a sample program by hand, and compiling
it with 'erlc +debug_info ...'. Then, you can fire up 
an erlang shell and extract the parsed forms using 
beam_lib:chunks("mymod.beam", [abstract_code]).

The output will most likely be truncated. Use e.g.
io:format("~p~n", [beam_lib:chunks(...)]).

>  If so, would it have access to the closed over
> variable?

Well, when you generate code at runtime, you should be
able to solve that problem to your liking.

You can also use erl_eval. See e.g. file:script/[1,2]:

eval_stream(Fd, Handling, Last, E, Bs) ->
    eval_stream(io:parse_erl_exprs(Fd, ''), Fd, Handling, Last, E, Bs).

eval_stream({ok,Form,EndLine}, Fd, Handling, Last, E, Bs0) ->
    case catch erl_eval:exprs(Form, Bs0) of
        {value,V,Bs} ->
            eval_stream(Fd, Handling, {V}, E, Bs);
        {'EXIT',Reason} ->
            eval_stream(Fd, Handling, Last,
[{EndLine,?MODULE,Reason}|E], Bs0)
    end;
eval_stream({error,What,_EndLine}, Fd, H, L, E, Bs) ->
    eval_stream(Fd, H, L, [What | E], Bs);
eval_stream({eof,_EndLine}, _Fd, H, Last, E, _Bs) ->
    case {H, Last, E} of
        {return, {Val}, []} ->
            {ok, Val};
        {return, undefined, E} ->
            {error, hd(lists:reverse(E, [{error, undefined_script}]))};
        {ignore, _, []} ->
            ok;
        {_, _, [_|_] = E} ->
            {error, hd(lists:reverse(E))}
    end.

(The Bs variable is a list of variable bindings passed
to erl_eval.)

> 4) Are there better ways to implement memoization
>    in Erlang?  Would using the process dictionary
>    be a bad idea?  My first thought was to store
>    the cached values in it, and then the whole
>    thing could be implemented in a single function
>    without the use of processes.

Using the process dictionary in a process that you
control altogether is not necessarily a bad idea,
but in your example code, I don't think you'd have
trouble using e.g. dict.erl and caching the data
in a dictionary 'loop variable' instead.

Some modules, like 'random' and 'mnesia', use the
process dictionary of the current process, so it's
not unheard of. I don't think it's a practice that 
should be encouraged, though.

Regards,
Ulf Wiger