[erlang-questions] Erlang Cost Model

Tue Sep 22 04:05:29 CEST 2015

On 18/09/2015, at 9:59 pm, Adam Krupicka <akrupicka@REDACTED> wrote:

>> Note that O(1) does not mean fast - it just means "constant" time
>> without knowing *what* the constant is this information is useless.
> 
> It's not completely useless. More than anything it tells you that this
> function scales perfectly, independent on the amount of data you shovel
> into the function. Of course, the definition of O is such that O(9999) = O(1).

Joe has already hinted at this, but let me be explicit.
Most of the data-structures-and-algorithms books give you
cost formulas for data structures that are perfectly true
for a PDP-6 or a Univac 1108 but not terribly realistic on
today's machines.

Let me give you an example from about 30 years ago.
I was working at Quintus.
The atom table for Quintus Prolog could hold 2^20 atoms.
I wondered how much of a limit that was.
So I wrote a little program like this:
:- between(1, 1000000, N),
   (N mod 1000 =:= 0 -> write(N), nl ; true),
   name(N, Digits),
   name(_, [0'x|Digits]),
   fail.
It ticked along at something like a thousand atoms per second,
and then suddenly, the speed dropped to 6 atoms per second.
Hey, it's a hash table!  They are supposed to be O(1)!

Well yes, it sort of was O(1), but the constant factor depended
on whether the entire table fitted into your share of the 4 MiB
memory or whether you were paging (over the network...).

In *theoretical* terms, creating a new atom really was an O(1)
constants, but in *practical* terms performance fell off such
a steep cliff that it wasn't even a fast vs slow question but
a fine vs don't even think about doing that question.

Here's another example from a couple of years ago.
I wanted to be able to compute the Singular Value Decompositions
of some matrices.  The department had bought me a thick book of
numerical algorithms in Java and an accompanying CD-ROM.
I decided to benchmark the code before committing to writing any
serious amount of Java.  The code was claimed to be "high quality".

The performance was pretty bad.

(1) Look at the code.  Notice that it uses structures like
    for (col = 0; col < nCols; col++) {
        for (row = 0; row < nRows; row++) {
            use array[row][col];
        }
    }
    Whoops!  That's a great way to access an array that's stored
    in column-major (Fortran) order, but a lousy way to access an
    array stored in row-major (Pascal) order.  Guess what Java uses?

    Switching things around to access arrays in the right order
    speeded the code up by a factor of TWO.

(2) Rewrite the code in C.
    Another factor of TWO.

(3) Rewrite the C code to use the level-1 BLAS, using the
    hardware vendor's library.
    Another factor of TWO.

Now an enormous amount of labour had gone into transcribing an
*old* library developed for old machines into a new language
for new machines.  But the *cost model* of the old language
-- namely that all array indexing is O(1) -- wasn't even close
to reality on new machines.

Bearing in mind things like cache conflicts, lock contention,
inter-core communication costs, I don't think our dear old
data-structures-and-algorithms books are going to be much of
a guide for Erlang programs.