[erlang-questions] Feedback for my first non-trivial Erlang program

zxq9 zxq9@REDACTED
Wed Dec 16 05:14:55 CET 2015


On 2015年12月15日 火曜日 09:28:06 you wrote:
> On 12/15, zxq9 wrote:
> >The more fundamental problem, though, is that this is a compounding 
> >function that steps singly as it increments, but it is written in a way 
> >that it counts down, not up, computing the *entire* value chain below 
> >it all over again each step, and within each sub-step once the input 
> >value is over 60.
> >
> >Its an arithmetic fun-bomb!
> >
> >If you remove most of the intermediate functions (I think most but 
> >maybe profit/3 can be removed... ?) and compute the stepping values as 
> >arguments to an expanded defintion of total_balance this should run 
> >close to O(n).
> >
> >Refactoring for that may also expose some bugs...
> 
> This is the true problem with this code. It's calling itself in so many 
> ways the whole thing explodes combinationally and yields exponential 
> increases in time.
> 
> One quick way to work around that is through memoization with a macro 
> such as:
> 
>     %% memoization uses a process dictionary. While hackish and not portable
>     %% through processes, a PD of this kind has the advantage of being
>     %% garbage collected and being returned in a stack trace. There is also no
>     %% copying (see: comments over the macros) or locking needed.
>     -define(memoize(E), lazy_memoize(fun()-> E end)).
>     lazy_memoize(F) when is_function(F) ->
>         case erlang:get(F) of
>             undefined ->
>                 erlang:put(F,F()),
>                 erlang:get(F);
>             X -> X
>         end.
> 
> Once that's done, wrap up the costly functions in ?memoize(Expression) 
> and the value will be stored in a process dictionary.
> 
> I've done it a bit haphazardly, memoizing stuff at a glance that looked 
> recursive: https://gist.github.com/ferd/ab5fed3b8ffe4b226755
> 
> While the raw implementation stalled at ~80 iterations and became 
> impossible to run, memoization makes it far more workable:
> 
>     1> c(pension2).
>     {ok,pension2}
>     2> timer:tc(pension2, totalBalance, [8000]).
>     {354750,-5.145757524814171e19}
>     3> timer:tc(pension2, totalBalance, [8000]).
>     {11,-5.145757524814171e19}
> 
> Every follow-up call is then cheaper. Of course, the additional cost 
> comes in memory:
> 
>     4> erlang:process_info(self(), dictionary).
>     {dictionary,[{#Fun<pension2.1.58422776>,0},
>                 {#Fun<pension2.10.58422776>,-1325845925.5710535},
>                 {#Fun<pension2.7.58422776>,0.0},
>                 {#Fun<pension2.6.58422776>,16858.175941786},
>                 {#Fun<pension2.3.58422776>,-4116.625402543886},
>                 {...}|...]}
>     5> length(element(2,erlang:process_info(self(), dictionary))).
>     103703
>     6> erlang_term:byte_size(erlang:process_info(self(), dictionary)).
>     66125376
> 
> (the last call here uses https://github.com/okeuday/erlang_term). That's 
> 66mb of storage for these terms in the process dictionary at the very 
> least (for 8000 iterations) and the host OS reports ~100mb usage for the 
> whole VM.
> 
> Rewriting things to repeat fewer of these operations (building up from 0 
> to N, rather than N down to 0 with a lot of repetitions) would probably 
> save memory a whole lot. If refactoring is out of the question and 
> memory is cheap to you, then memoization is a quick way to get something 
> out of it.
> 
> Be mindful however that when memoizing in the process dictionary, every 
> process that runs the operation will end up carrying that data. Moving 
> it to ETS if the calls have to be distributed is an option, after a 
> while it becomes very cheap. Flushing the cache from time to time (like 
> a manual GC) is also a thing one may want to do.
> 
> Ultimately refactoring, if possible, will prove the best long-term 
> solution.


To beat a dead horse...

I messed around with this a little to demonstrate to the OP the difference between stepping and constantly re-computing.

Testing the original functions is not very easy because they aren't tail recursive (so I can't just jump into the middle of a single iteration and check values), but taking a look at values as they occur, the remove/3 function appears to never be called, and tax/2 returns 0.0 in every actual case except when T is 60 (where the case is always fixed: `tax(2214.5751488127876, 60) -> 306.95219504336956`):

tax(X,T) ->
    Portfolio = portfolio(T),
    Tax = profit(X,T,Portfolio)*taxRate(),
    io:format("tax(~tp, ~tp) -> ~tp~n", [X, T, Tax]),
    Tax.

%...

profit(X,T,[{Amount,Time}|Tail]) ->
    Profit = case currentValue(Amount,T-Time) > X of
        true -> X - originalValue(X,T-Time);
        false -> X - (Amount + profit((X - currentValue(Amount,T-Time)),T,Tail))
    end,
    ok = io:format("profit(~tp, ~tp, [{~tp, ~tp} | _]) -> ~tp~n", [X, T, Amount, Time, Profit]),
    Profit.

Gives:

1> pension2:totalBalance(63).
     [-- Snipping a *lot* of iterations out of this... it goes wild --]
tax(2214.5751488127876, 60) -> 306.95219504336956
profit(831.592400470033, 60, [{993.4028899999998, 2} | _]) -> 171.8792078975349
profit(2086.6665047965516, 60, [{991.6999999999998, 1} | _]) -> 923.087296899017
profit(2214.5751488127876, 60, [{127.90864401623617, 60} | _]) -> 1163.5792078975344
tax(2214.5751488127876, 60) -> 306.95219504336956
profit(2218.3399265657695, 61, [{2649.3502215622093, 61} | _]) -> 0.0
tax(2218.3399265657695, 61) -> 0.0
profit(831.592400470033, 60, [{993.4028899999998, 2} | _]) -> 171.8792078975349
profit(2086.6665047965516, 60, [{991.6999999999998, 1} | _]) -> 923.087296899017
profit(2214.5751488127876, 60, [{127.90864401623617, 60} | _]) -> 1163.5792078975344
tax(2214.5751488127876, 60) -> 306.95219504336956
profit(2218.3399265657695, 61, [{2649.3502215622093, 61} | _]) -> 0.0
tax(2218.3399265657695, 61) -> 0.0
profit(831.592400470033, 60, [{993.4028899999998, 2} | _]) -> 171.8792078975349
profit(2086.6665047965516, 60, [{991.6999999999998, 1} | _]) -> 923.087296899017
profit(2214.5751488127876, 60, [{127.90864401623617, 60} | _]) -> 1163.5792078975344
tax(2214.5751488127876, 60) -> 306.95219504336956
profit(2222.1111044409313, 62, [{4868.812299814968, 62} | _]) -> 0.0
tax(2222.1111044409313, 62) -> 0.0
87032.35394558453

I'm *pretty* sure this is not what was intended. It makes the totalBalance/1 function create a curve that very quickly represents a negative pension -- which I also don't think was the intent.

So that is almost certainly a logical flaw in the program. If it is not, then these fixed values should be precalculated as constants, because they never change (only the final element of the "portfolio" list does).

But looking at the efficiency angle once again. This is a compound *stepping* function that can iterate UP to T = X where X is the argument to total_balance/1, taking advantage of all the computation that has gone before, instead of trying to compute downward toward T = 0 and having to create the entire identical forest of computation trees that underlie each descending value.

Here is a version that has a VERY SIMILAR LOGICAL FLAW to the original program, but presents a slightly different diverging value over values of X > 60 (because I have no idea what the actual intent of the original program was -- in this case because the values count up instead of down the head of the portfolio list is the highest value, so instead of constantly replacing the lowest value {990, 0} the highest value continues to climb). This version is written to step up, and is basically just one big function:


-module(pension3).

-export([total_balance/1]).

%%% CONSTANTS
-define(working_time_units, 60).

transaction_fee()    -> 10.
ca_balance_start()   -> 10000.
sa_balance_start()   -> 10000.
net_income()         -> 3000.
expenses_start()     -> 2000.
inflation_rate()     -> 1.0017.
yield_rate()         -> 1.004.
tax_rate()           -> 0.2638.

%% total of what you own:
total_balance(T) when T < 0 ->
    0;
total_balance(0) ->
    ca_balance_start() + sa_balance_start();
total_balance(T) ->
    Current = 0,
    CA = ca_balance_start(),
    SA = sa_balance_start(),
    Salary = net_income(),
    Expenses = expenses_start(),
    Portfolio = [],
    total_balance(T, Current, CA, SA, Salary, Expenses, Portfolio).

total_balance(T, T, CA, SA, _, _, _) ->
    CA + SA;
total_balance(T, Current, CA, SA, Salary, Expenses, Portfolio)
        when Current < ?working_time_units ->
    Inflation = inflation_rate(),
    TransactionFee = transaction_fee(),
    Investments = Salary - Expenses - TransactionFee,
    Fees = Investments + TransactionFee,
    NewCurrent = Current + 1,
    NewCA = CA + Salary - Expenses - Fees,
    NewSA = SA * yield_rate() + Investments,
    NewSalary = Salary * Inflation,
    NewExpenses = Expenses * Inflation,
    NewPortfolio = [{Investments, Current} | Portfolio],
    total_balance(T, NewCurrent, NewCA, NewSA, NewSalary, NewExpenses, NewPortfolio);
total_balance(T, Current, CA, SA, Salary, Expenses, [{Amount, Time} | Rest]) ->
    Inflation = inflation_rate(),
    TransactionFee = transaction_fee(),
    Portfolio = [{Amount - original_value(Expenses, Current - Time), Current} | Rest],
    Tax = profit(Expenses, Current, Portfolio) * tax_rate(),
    Investments = -Expenses - Tax - TransactionFee,
    Fees = Investments + Tax + TransactionFee,
    NewCurrent = Current + 1,
    NewCA = CA + Salary - Expenses - Fees,
    NewSA = SA * yield_rate() + Investments,
    NewSalary = 0,
    NewExpenses = Expenses * Inflation,
    total_balance(T, NewCurrent, NewCA, NewSA, NewSalary, NewExpenses, Portfolio).

profit(X, T, S) -> profit(X, T, S, 0).

profit(X, _, [], A) ->
    X - A;
profit(X, T, [{Amount, Time} | Tail], A) ->
    TLessTime = T - Time,
    case current_value(Amount, TLessTime) > X of
        true  -> X - original_value(X, TLessTime);
        false -> profit(X, T, Tail, A + Amount)
    end.

%% helper functions for calculation the current and original value of a position
current_value(X,TD)  -> X * math:pow(yield_rate(), TD).
original_value(X,TD) -> X / math:pow(yield_rate(), TD).


This can *certainly* be compressed in terms of lines of text, but when a function has 7 arguments I like the names of arguments and their origins to be painfully obvious. Like reading that function is slightly painful, and it should be in this case, because I want all that variable assignment to stick out and make the next call absolutely clear. The variable assignments in there was really documentation for myself as I wrote the different clauses.

I can easily imagine people finger-pointing, saying "Oh, look at all those variable assignments! That's sooooo wasteful and will hurt performance" and *completely* overlook the nature of the function that is actually written compared to the original (which at first glance looks so nice with all those tiny functions, well, too many case statements for my taste, but whatever).

Obviously the code above can be reduced in ways other than inline argument value construction, and there is no possible way that all the helper functions that existed originally are of no value -- but they should have been called from within a single, main iteration like this one that relied on the previously computed values instead of generating their own from scratch every time they were invoked. But since I have no idea what they were really supposed to do and they didn't work anyway, I'm left with this version.

So let's compare these with timer:tc/3

1> timer:tc(pension2, totalBalance, [1]).
{13,21030.0}
2> timer:tc(pension3, total_balance, [1]).
{4,21030.0}
3> timer:tc(pension2, totalBalance, [25]).
{221,47557.60140838164}
4> timer:tc(pension3, total_balance, [25]).
{18,47557.60140838164}
5> timer:tc(pension2, totalBalance, [59]). 
{1018,91630.97931148569}
6> timer:tc(pension3, total_balance, [59]).
{47,91630.97931148569}
7> timer:tc(pension2, totalBalance, [70]). 
{785665,73294.80080230196}
8> timer:tc(pension3, total_balance, [70]).
{328,206288.05299963654}
9> timer:tc(pension2, totalBalance, [74]). 
{12821710,65186.052900269984}
10> timer:tc(pension3, total_balance, [74]).
{430,234679.70050062318}
11> timer:tc(pension3, total_balance, [100]).
{1107,184895.28643276813}
12> timer:tc(pension3, total_balance, [100000]). 
{138851,-1.7311541595393215e180}

pension3:totalBalance/1 stalls out at 75 for me. I'm sure it can complete, but I didn't wait for it. It obviously explodes at that size. pension3:total_balance/1 blows up at 1000000, apparently because the float value is exhausted:

13> timer:tc(pension3, total_balance, [1000000]). 
** exception error: an error occurred when evaluating an arithmetic expression
     in function  pension3:total_balance/7 (pension3.erl, line 54)
     in call from timer:tc/3 (timer.erl, line 197)

Computing 100,000 months of pension time by incremending values is 92 times faster than computing 70 months by decrementing. That ratio only gets worse as the values increase, as pension2 literally explodes in complexity when X > 60.

The overall fact remains, though, that each input always maps to a fixed output ("bijective"? Not good with math terms.), and it appears this is dealing with months of real time. I doubt any pension calculation is going to remain accurate more than 5 years (or more than 5 months, given current events...), and making pension projectsion is almost *certainly* unnecessary beyond, say, 1200 months. With that in mind it makes sense to run this computation *once* and store the result in a chart. Then it doesn't matter if it takes 5 days to compute the result, the answers are in a K-V table and instantly available for lookup.

Also -- I wrote above that the helper functions "didn't work". They MAY HAVE BEEN ACCURATE. Occasionally in finance and some other arithmetic-heavy domains you find yourself needing a ton of little functions that calculate a particular value, but you don't initially have time to analyze the nature of those functions. Sometimes even a cursory analysis will lead you to recognize that the nature of those functions lends themselves to shortcuts (actually, functional shortcuts pop up a lot in game servers as well). If the functions above are actually accurate most of them can be collapsed to simple cases or single replacements, instead of complete sequential searches to test for things you already know don't apply.

-Craig



More information about the erlang-questions mailing list