[erlang-questions] Feedback for my first non-trivial Erlang program

Tue Dec 15 15:28:06 CET 2015

On 12/15, zxq9 wrote:
>The more fundamental problem, though, is that this is a compounding 
>function that steps singly as it increments, but it is written in a way 
>that it counts down, not up, computing the *entire* value chain below 
>it all over again each step, and within each sub-step once the input 
>value is over 60.
>
>Its an arithmetic fun-bomb!
>
>If you remove most of the intermediate functions (I think most but 
>maybe profit/3 can be removed... ?) and compute the stepping values as 
>arguments to an expanded defintion of total_balance this should run 
>close to O(n).
>
>Refactoring for that may also expose some bugs...

This is the true problem with this code. It's calling itself in so many 
ways the whole thing explodes combinationally and yields exponential 
increases in time.

One quick way to work around that is through memoization with a macro 
such as:

    %% memoization uses a process dictionary. While hackish and not portable
    %% through processes, a PD of this kind has the advantage of being
    %% garbage collected and being returned in a stack trace. There is also no
    %% copying (see: comments over the macros) or locking needed.
    -define(memoize(E), lazy_memoize(fun()-> E end)).
    lazy_memoize(F) when is_function(F) ->
        case erlang:get(F) of
            undefined ->
                erlang:put(F,F()),
                erlang:get(F);
            X -> X
        end.

Once that's done, wrap up the costly functions in ?memoize(Expression) 
and the value will be stored in a process dictionary.

I've done it a bit haphazardly, memoizing stuff at a glance that looked 
recursive: https://gist.github.com/ferd/ab5fed3b8ffe4b226755

While the raw implementation stalled at ~80 iterations and became 
impossible to run, memoization makes it far more workable:

    1> c(pension2).
    {ok,pension2}
    2> timer:tc(pension2, totalBalance, [8000]).
    {354750,-5.145757524814171e19}
    3> timer:tc(pension2, totalBalance, [8000]).
    {11,-5.145757524814171e19}

Every follow-up call is then cheaper. Of course, the additional cost 
comes in memory:

    4> erlang:process_info(self(), dictionary).
    {dictionary,[{#Fun<pension2.1.58422776>,0},
                {#Fun<pension2.10.58422776>,-1325845925.5710535},
                {#Fun<pension2.7.58422776>,0.0},
                {#Fun<pension2.6.58422776>,16858.175941786},
                {#Fun<pension2.3.58422776>,-4116.625402543886},
                {...}|...]}
    5> length(element(2,erlang:process_info(self(), dictionary))).
    103703
    6> erlang_term:byte_size(erlang:process_info(self(), dictionary)).
    66125376

(the last call here uses https://github.com/okeuday/erlang_term). That's 
66mb of storage for these terms in the process dictionary at the very 
least (for 8000 iterations) and the host OS reports ~100mb usage for the 
whole VM.

Rewriting things to repeat fewer of these operations (building up from 0 
to N, rather than N down to 0 with a lot of repetitions) would probably 
save memory a whole lot. If refactoring is out of the question and 
memory is cheap to you, then memoization is a quick way to get something 
out of it.

Be mindful however that when memoizing in the process dictionary, every 
process that runs the operation will end up carrying that data. Moving 
it to ETS if the calls have to be distributed is an option, after a 
while it becomes very cheap. Flushing the cache from time to time (like 
a manual GC) is also a thing one may want to do.

Ultimately refactoring, if possible, will prove the best long-term 
solution.

Regards,
Fred.