[erlang-questions] Feedback for my first non-trivial Erlang program

Thu Dec 17 14:27:08 CET 2015

Hi, Paul.

On 2015年12月17日 木曜日 10:39:22 Paul Wolf wrote:
> - I understood from the beginning what (in theory) the problem with my
> program was. I intentionally (and I hope that was clear from the
> beginning) didn't try to solve it, because I feared that I solve those
> thing the wrong and not idiomatic way (because of the lack of my
> Erlang experience). I take that you're suggestions are the way to go.

Performance isn't everything -- its only a problem once you're *sure* it
is. With a bijective function like this with a limited range of values
that will ever be needed in production anyway, it really wouldn't matter
how inefficient it was because you would only execute it one time and
record the results in a table.

Sometimes readability can conflict with performance. But computers
will continue to get faster (in terms of clock speed, bus speed, per-clock
efficiency, cycles/money cost (the *real* one you care about), etc.).

In this case time that passes in execution is temporarily your enemy, but
the time that passes in your life is your friend.

On the contrary, performance tricks ALWAYS conflict with readability. And
you WILL forget what you were thinking as time passes, so when the day comes
that you want a readable version because computers have become faster you
will get to enjoy rewriting all those nifty, confusing little hacks out of
your code later.

In this case time that passes in execution is still your enemy and you spend
time in your life once fighting it, but time that passes in your life is now
also your enemy because your memory of the difference between the readable
code and the "performant" code will fade. (Unless Management is paying you
fat in any case and directs you to geek out on performance -- in that case
have fun geeking out on screaming performance. Sometimes this is the right
attitude, just not nearly as often as Management typically leads itself to
believe.)

The worst thing you can do is not write something that works at all because
you are trying from the outset to write something perfect. With this in
mind your code was a fine first version -- and it indeed did uncover what
is either a bug in one important function *or* an obvious way to completely
avoid a computation entirely (which is way better).

So whatever.

> - The main issues I anticipated were (as correctly identified by you as well):
> (a) problematic tail recursion and

Use accumulators. This isn't just "an optimization" -- this is also a good
way to *understand* what is happening within your functions, because you
can check the accumulator value at any point, whereas a non-tail recursive
function will not be able to report a value to you until the whole
computation is rolled up again -- and if the function is crashing somewhere
this could be anywhere (and that could mean sifting through a whopper of a
stack trace, considering how HUUUUGE the stack might be). Crashy values
often occur at edge cases, and edge cases are often close to or the same as
your base case. This can make debugging a super annoying puzzle.

In Erlang (and most other functional languages) tail-recursion is idiomatic,
simple recursion is not.

> (b) log(k^n) runtime (b) for n>working units. Also the memory I think
> would be bad, because k^n lists are build, if i am not mistaken.

The memory is exploding because its leaving a bunch of unfinished
computations lying around, pending the next result which itself is going
to wait around for more dependent results, etc.

Since many of the original functions did things like rebuild Pensions
where Pensions :: [{Value, Time}] by calling pensions(T) instead of
appending new *relevant* pension values of T over time, there were a
bajillion versions of Pensions temporarily in memory, and that gets even
crazier whenever the expenses/1 part of tax/2 is called when T > 60.

> (c) You mentioned something about refactoring. I am not 100% sure that
> this is viable in terms of readability and performance. The fact that
> remove/3 isn't used rather makes me think, that I have a bug somewhere
> in the logic.

This is *probably* a bug. But don't just take remove/3 as some abstract
function when you test it. Test it first for correctness as an abstract
function (based on its spec, it should always give correct answers regardless
the ordering or values given as input) -- but *then* you should test it
as a function over your *actual* values of Pensions, because those will
only grow at a specific rate based on your constants, so you can know
exactly what values will ever be passed to remove/3 with *those* specific
constants.

If remove/3 was correct per spec, and it turns out Pensions really will
never match its alternate clause, then you can comment out remove/3 and
its calling location and replace it with direct logic that shortcuts that
operation entirely. (Do NOT delete it, though. Your constants may change
someday and it would be silly to re-write it for no reason if it was
already correct.)

It is surprising how often computations that are central to an algorithm
across its entire domain can be skipped entirely within the range of
actual values that can possibly be passed as arguments to it. This comes
up in graphics, all sorts of game calculations (movement, speed, distance,
damage, auction prices, etc.), pathfinding, and sometimes finance.

So do it properly, but keep an open mind.

Skipping computations entirely is the Arch King of Speed Hacks.

> (d) The thing with counting down instead of up, I havent properly
> understood yet, but I will have a proper look into that.

You are calculating a compound value over time. Sort of like Fibonacci.

(I hate fib examples because they are almost always inapplicable to
anything -- but in this case, holy crap, it actually is a good toy
example!)

Any fib(N) value depends on all the fib(N-1) values until fib(1). That
means that to get *any* fib(N) you have to compute all the underlying
ones already. Counting DOWN instead of UP means you are going to calculate
*every* value that supports a given fib(N) a total of N-1 times FOR EACH
fib(N - 1), and this means its N! times. If you are going to calculate
them once, why not be done with it the first time?

I can't explain it any better than an example:

-module(fibses).

-export([fib_desc/1, fib_asc/1, fib_asc/4]).

%% Simple recursive, descending definition.
fib_desc(N) ->
    ok = io:format("Calling: fib_desc(~tp)~n", [N]),
    fib_d(N).

fib_d(0) -> 0; 
fib_d(1) -> 1; 
fib_d(N) -> fib_desc(N - 1) + fib_desc(N - 2).

%%% Tail-recursive, ascending definition.
fib_asc(N) ->
    ok = io:format("Calling: fib_asc(~tp)~n", [N]),
    fib_a(N).

fib_a(0) -> 0;
fib_a(1) -> 1;
fib_a(N) -> fib_asc(N, 2, fib_asc(0), fib_asc(1)).

fib_asc(N, Z, A, B) ->
    ok = io:format("Calling: fib_asc(~tp, ~tp, ~tp, ~tp)~n", [N, Z, A, B]),
    fib_a(N, Z, A, B).

fib_a(N, N, A, B) -> A + B;
fib_a(N, Z, A, B) -> fib_asc(N, Z + 1, B, A + B).

The way we expect the function to work in our minds is like fib_asc/1:

1> fibses:fib_asc(6). 
Calling: fib_asc(6)
Calling: fib_asc(0)
Calling: fib_asc(1)
Calling: fib_asc(6, 2, 0, 1)
Calling: fib_asc(6, 3, 1, 1)
Calling: fib_asc(6, 4, 1, 2)
Calling: fib_asc(6, 5, 2, 3)
Calling: fib_asc(6, 6, 3, 5)
8

But the naive definition EXPLOOOOODES!

2> fibses:fib_desc(6).
Calling: fib_desc(6)
Calling: fib_desc(5)
Calling: fib_desc(4)
Calling: fib_desc(3)
Calling: fib_desc(2)
Calling: fib_desc(1)
Calling: fib_desc(0)
Calling: fib_desc(1)
Calling: fib_desc(2)
Calling: fib_desc(1)
Calling: fib_desc(0)
Calling: fib_desc(3)
Calling: fib_desc(2)
Calling: fib_desc(1)
Calling: fib_desc(0)
Calling: fib_desc(1)
Calling: fib_desc(4)
Calling: fib_desc(3)
Calling: fib_desc(2)
Calling: fib_desc(1)
Calling: fib_desc(0)
Calling: fib_desc(1)
Calling: fib_desc(2)
Calling: fib_desc(1)
Calling: fib_desc(0)
8

OUCH!

When people say they want to parallelize fib(), this is what they wind up
doing in a bunch of processes because they don't quite understand why this
function is a horrible candidate for parallelization. Sure, you can run each
of these in a separate process, but everything else depends on the same set
of values over and over anyway, so nothing runs faster and you occupy every
drop of processing muscle you have DOING USELESS WORK YOU ALREADY DID AND
THEN ZOMG WHAT ARE YOU DOING DO YOU *ENJOY* GLOBAL WARMING WTF?!?!? AHHH!

This is also why I hate most node.js code I've ever read.

I'm not going to paste it here (people are unbelievably patient with me on
this list as it is!) but try calling fib_desc/1 with any value over 8.

(Actually, this version that wraps each call in an output function really
gives me a better idea why Fibonacci was fascinated with this phenomenon.)

Now, as for memory consumption, consider that *each* one of those needless
calls to fib_d/1 in that module has to wait in memory, waiting for its
tree of dependent results. The rate at which the *function* explodes makes
the storage size of each value carried over the stack meaningless in terms
of memory optimization -- the number of calls explodes so fast that even
stack frames full of [reference (int)] can eat gigabytes before you can
blink.

> - Regarding (b): You suggested to cache results. This would be exactly
> my approach, but as I stated before, I was worried, this is not the
> right thing to do in a functional language: I was hoping/expecting
> Erlang to might cache the results itself. My reasoning here is (more
> abstract in terms of functional languages general and not Erlang in
> specific) that functions are side effect free and two function calls
> with the same parameter always returns the same result. Therefore I
> sensed a possibility for optimisation Erlang doesn't seem to do. Do
> you know how other functional languages behave in this regard?

It depends. This is called "memoization" and is a common technique, but
not many language runtimes memoize results for you without you explicitly
asking it to, or implementing it yourself (and any language I've used to
write actual production software requires you to implement the memoization
yourself, its too hard to guess what makes sense to keep around -- this
would require *actually* magical GC).

You have to be careful in functional languages written for imperative
language runtimes and in multi-paradigm languages that can be used for FP
that didn't anticipate functional programming. Generally problematic
recursion in Scala and recursion depth limits in Python, for example, can
catch trip you up, and that's a simple problem. Lots of languages that are
just sorta-functional will manifest surprisingly weird processes from
innocent looking function definitions if you're not careful. Erlang
refreshingly straightforward and simple by comparison.

The general attitude here toward bottlenecks is NOT to make them more
performant or sit around praying for compiler or runtime optimizations,
but to instead AVOID HAVING BOTTLENECKS to begin with.

Also recognize that there are simply some places Erlang is not the proper
tool. Writing some number cruncher in a language tailored for that use
case and talking to it over a socket can be the super speed hack you are
looking for sometimes.

> - Regarding the macro/functions constant flavours: This is a
> discussion you can have in many languages I guess and I don't feel to
> interested in it. What sparked my interest however are the benchmarks
> by Technion. Function performance might be "good enough", but I was
> suprised, that the Erlang compiler has a real difference in runtime. I
> was thinking, that my very simple constant functions would be the most
> trivial thing to optimize away for the compiler. I have the overall
> impression by now that the Erlang compiler optimisations are not very
> strong. Can you support/oppose that impression? Also regarding the
> same topic: I understand, that functions in guards are not allowed,
> because they might have side-effects. But I also understand, that the
> compiler (in theory) could check wether a function has sideeffects or
> not?

This exact optimization exists as an option to the compiler. (This is
what Richard was referring to in his response.) But inlining like this is
NOT the default. The default is to build a non-magical Erlang assembly
of the program you wrote. Play with compiler options -P, -E, and -S to
see the process yourself -- you can learn a lot about the way things
work this way. The VM would be rather less accessible (to humans) if it
defaulted to doing lots of neat little tricks.

I think this is a fair tradeoff. The version I wrote doesn't really benefit
from using macros over functions for your constants anyway -- because *most*
of the values wind up in function arguments after the first iteration.

-Craig