[erlang-questions] Erlang and the learning curve

Wed Jan 5 18:40:46 CET 2011

On Tue, Jan 4, 2011 at 12:31, Morten Krogh <mk@REDACTED> wrote:

I am late to the ball, but:

> Definitely not much easier. It depends on the problem, and whether you
> include
> performance requirements. As soon as you want to control memory and
> performance,
> functional languages can actually be much much harder. And I mean much
> harder.

I disagree totally. First of all, performance and control of memory
are orthogonal entities. You don't need memory control to get
performance. In an imperative setting, it is clear that you can
explicitly manipulate memory to avoid allocation, go for cache
optimizations and so on. In functional languages, the trick is to
prefer algorithmic optimizations. Take operations research for
instance: The algorithm is everything. Even if you get the problem to
run 300 times faster, it is only a little bit more than 8 variables
(math:pow(2,8)) you can add to the problem. An algorithmic
optimization, like a cutting-plane algorithm or a clique cut generator
easily lets you add 5000-10000 variables.

The advantage of a higher level language in general is that you can
write code faster. So you can choose the optimal algorithm everywhere
for a fraction extra development time. For large projects where there
is no singular hot-spot intensive kernel in the code that takes all
the cycles, C is dwarfed. A video decoder is a prime example of an
application where Assembly language or C is beneficial. There are
certain operations which benefit from running as fast as possible -
and hence they are written in assembly.

Functional languages naturally have their blessings and curses. But I
think the blessings far outweigh the curses. Functional languages tend
to be simpler than imperative counterparts - especially in the
semantics department. And by default, there are virtually no errors
pertaining to state manipulation, good locality for the reader and
predictable execution rules. Of course the price is to be had in raw
speed - something the functional programmer will try to counterbalance
by algorithmic optimization.

Performance optimizations are not free. They cost resources, most
notably time. The point is: How do you prioritize your resources to
maximize their potential. My claim is that you can get a much better
benefit from using a functional language because:

  * There are far fewer errors in the resulting programs to weed out.
  * They are way easier to maintain - everything is essentially
straightforward data flow questions.
  * It is way easier to abstract code (Higher order functions and
anonymous functions is a blast here).
  * Symbolic manipulation is much easier in functional programming.

and hence, more time can be spent on getting clever algorithms into
your designs.

And if the etorrent application I write is any measurement point, we
have around the same performance as the C BitTorrent clients now. This
is with no HiPE by the way :) Why? Because much of the performance is
handled by the VM and because the key algorithms are chosen to be
fast. And I am not even counting the algorithmic optimizations a
co-author is doing on etorrent right now. We expect it to be even
faster.

> Why are one fourth, or whatever, of the posts on this list about nif or c
> nodes or ports?

It has already been covered: Often it is way easier to reuse a library
written in another language than it is to rewrite it yourself. If you
can make the use seamless by the use of a port or a NIF, then why
shouldn't we? If somebody spent 3-5 years meticulously optimizing a
library in another language, we can expect to have to use some time on
getting there. Even if we utilize that we are the second person to try
it so we don't have to carry out all the mistakes, and even if we
hypothesize we can write it faster in pure Erlang, it is simply not
worth the resource cost. Nay, it is a better solution to steal and
reuse.

> Even forgetting completely about performance, there are lots of problems
> that are more easily
> solved and written using imperative techniques.

Hardly. The last formal definition I made embedded an imperative
language inside a functional one because it is that easy. It is easy
to transform an imperative program into SSA-form and SSA is functional
programming, see:

http://www.cs.princeton.edu/~appel/papers/ssafun.ps

which of course begs the question "Is imperative and functional
programming really that different?" At least it shows what we already
knew: Any imperative program is easily transformed into a functional
one. But the problem you are hinting at is that the resulting program
might not be idiomatic to functional principles. This is the reason it
is "more easily" solved in imperative programs: The exposition is
imperative in the paper from which you take the algorithm. But if you
think about it, you will realize that many of the techniques have a
"dual" counterpart in FP so you can get an elegant solution anyway.

> You have pure functions in all languages, even if the type system doesn't
> force it on you.

I fail to see how the type system matter. It is more a question of semantics.

> I don't think it is such a big problem in C. You almost always understand
> whether a function has side effects or not.
> Library functions rarely do, except for possibly changing pointer arguments,
> which is documented in the api.

Interestingly, library subroutines may be re-entrant or not. So even
if the API interface is nicely pure, the internals of the library call
may pose to be problematic if you have thread or co-routine access to
the library. You can of course document this behavior as well - but in
a functional setting this is way easier to ensure.

> But is Erlang so different here. I could write a sin function, that changed
> a file, say, during execution.
> The issue is the same in C and Erlang. Library writers should not do it and
> mostly don't.
> Do you feel this is a real problem in C?

Erlang is not a purely functional language (Its message communication
is deeply imperative). Neither is Ocaml (it has an imperative subset).
Writing code in functional style is a matter of responsibility on the
programmer. Yet I claim that the primitives for doing this in Erlang
or Ocaml are by far better than those of C for achieving this style.

> What are the functions that are reused in Erlang but not in C?

C doesn't have closures. So anything with closure has to be explicitly
closure-converted by the programmer. In effect, you don't see code
that pass function pointers that much. Contrast with Erlang, where you
can do it easily. Since hoisting pieces of code and abstracting
function control flow desperately needs closures, I think I've made my
point.

>> I really think that FPLs are an order of magnitude easier to
>> understand than conventional PLs -
>>
> What? An order of magnitude?

This is probably true. The key difference is the execution model. In C it is

  <Stmt, W> -> W'

where Stmt is a statement to be executed, W and W' are *worlds* which
is the current state of the machine. In FP the execution model is
closer to:

  Exp -> Exp'

Where Exp, Exp' are *expressions* and execution happen by reducing
expressions. The key difference is that state in FP is explicitly
mentioned in the expression, whereas in C the state is partially
hidden in W. To understand the program, you must understand what
values W can take as well. Erlang is of course more complicated
because there is a reduction possibility Exp -> Exp for any process
that is alive at the moment - and the execution is non-deterministic
to boot. Yet, in my experience, it is still easier to grasp what the
program is doing.

> And most lines of code in C are easy to understand:
>
> sum = 0;
> for (i = 0; i < 1000000; i++) {
>    sum += i;
> }
>
> It is longer than a one liner in a FPL, but it is effort less to write.

It is not effortless to read though.

> I think it is more correct to say that all languages have domains where they
> are strong and others where they are weak, but if
> memory and performance really matters, you actually need languages that are
> closer to the hardware, and they are all imperative at the moment.

It is more specific than that. It takes a specific kind of performance
requirements before it is worth getting closer to the hardware. For a
lot of programs, C is a much better choice than assembly language.
Because the compiler is able to do optimizations automatically which
are good enough to dominate the effort it would take a human being to
do the same. The very same argument is true with Erlang: It is simpler
to express the program and get algorithmic optimizations into it -
especially if the program is concurrent in Erlangs case.

The elephant in the room currently that might completely change the
status quo is of course the multi-core question. It doesn't pertain
that much to Erlang as it does to FP in general: If you have 1024
cores or more, the raw execution speed of each core matter less in the
long run compared to how many of those you can get to do useful work.
And it may also be that the shared-memory model has to go - completely
altering the state of what is the "best thing" to do.

-- 
J.