[erlang-questions] Priority queues and Erlang

Wed Jun 17 02:17:35 CEST 2009

http://xkcd.com/386/

2009/6/17 Richard O'Keefe <ok@REDACTED>

> It's time for some numbers.
> I apologise to the list for the delay;
> I had this stuff ready yesterday, but wasn't able to get to
> work because of snow.
>
> Definitions:
>
> 1. An object is mutable if and only if it is possible for its
>   state to change without its identity changing (without regard
>   for how *fast* such changes might be; SQL tables are mutable).
>
> 2. A programming language supports mutability if and only if it
>   provides or allows the programmer to create mutable objects
>   (without regard for how *fast* such changes might be; SQL
>   supports mutability).
>
> 3. A programming language supports mutability *efficiently* if and
>   only if it provides at least some "small" mutable data structures
>   whose state can be changed in about the same time as a function call.
>
> 4. Mutability does not guarantee an absence of copying.  Assigning one
>   Pascal record to another or one C struct to another is defined
>   as copying (which is why Simula 67 distinguished between
>   := content-copying assignment and :- pointer assignment), but
>   is none-the-less change of state without change of identity.
>
> 5. Compile-time garbage collection occurs in a programming language
>   implementation when the compiler determines that at least some
>   data structures are no longer useful and arranges for their
>   memory to become available for reuse without any runtime tests.
>
> 6. For the purposes of this message, "leftist heaps" are defined by
>   the Wikipedia entry http://en.wikipedia.org/wiki/Leftist_tree
>
> 7. "Imperative" code referred to below alters the nodes of a tree
>   in place using assignment statements.  "Functional" code
>   referred to below never alters a node once created, but allocates
>   new ones.  No claim is made that these labels have any merit or
>   any wider application other than making that distinction in this
>   message.
>
> According to these definitions, Erlang does support mutability
> through the process dictionary, the process registry, the module
> system, ETS, DETS, and connections to external data bases.  It
> does not support mutability *efficiently*, except perhaps in the
> case of the process dictionary.  Only benchmarks could determine
> whether that should count as efficient.  Also according to these
> definitions, Erlang does NOT have compile-time garbage collection.
>
> There are functional languages, notably Clean (with uniqueness types)
> and Mercury (with dead-input/unique-output modes) that do provide
> compile-time garbage collection.  *Semantically* these languages do
> not support mutability, but the implementation behaves *as if* they
> did; the 'merge' function will not in fact allocate any new storage.
>
> Here's what I did.
>
> I found the Leftist_tree page of the Wikipedia.
> It provides "imperative" code for merging leftist heaps in Java.
> I took that and wrote a complete Java program that
>
>   - creates a heap with n entries (command line parameter)
>   - for several millions of iterations
>       - remove the smallest entry
>       - create a new entry with the same key + a random number.
>
> This is pretty much the "hold" model for heap performance.
> Once this was working, I adapted it to produce a "functional" version.
> I then translated both versions to C and to Smalltalk.
>
> The programs were run for values of n ranging from 2 to 100,
> and a linear model
>
>    (user+system time in nanoseconds) ~ constant + speed * log(n)
>
> fitted to each using the lm() function in R.  (The model is called
> "linear" because it is linear in the _coefficients_; the fact that
> it isn't linear in n doesn't matter.)  I tried other models as
> well, but these fitted best.  The constant factor was ignored
> because (except in the case of Smalltalk) that includes the cost
> of program startup, dynamic library loading, and in the case of
> Java, JIT compilation.
>
> Rather surprisingly, in the case of the Java programs, the
> *actual* measured times for the "functional" version were in
> every case LESS than the actual measured times for the "imperative"
> version.  Yes, the imperative version was asymptotically more
> efficient, but the cross-over was estimated at n = 4,500.  I have
> no idea why this is so.
>
>
>                C       Java    My ST   Squeak
> "Imperative"    40       70     150     1123
> "Functional"   240      126     290     1895
>
>
> C means gcc 4.0; Java means JDK 1.5.0; My ST means my
> ANSI Smalltalk via C whole-program compiler; Squeak means
> Squeak Smalltalk 3.10.
>
> Interpretation:
>
>    Java is about 2 times slower than C.
>
>    My Smalltalk -> C is about 2 times slower than Java.
>    This is a consequnce of
>        - the absence of a type system
>        - strict encapsulation, so "x->f" and "x->f = e" are
>          function calls, not simple loads and stores.
>
>    Squeak is about 7 times slower than my Smalltalk compiler.
>    This is a consequence of
>        - a fully dynamic system (mine is static whole program)
>        - byte code emulation
>
>    Except for C, the "functional" versions are about twice as
>    slow as the "imperative" versions.
>    This is a consequence of
>        - extra memory allocation and garbage collection
>        - cache dilution
>
>    The C/Functional case stands out as particularly bad.
>    This is a consequence o
>        - using an unadapted version of the Boehm GC;
>          absolutely no attempt whatever was made to tune it.
>
>
> Let's consider the My ST/Java pair in a little more detail.
> In Java,
>        x.s = x.r.s + 1;
> should, with the aid of a JIT, turn into something like
>        LD t, r(x)
>        LD t, s(t)
>        ADD t, t, 1
>        ST t, s(x)
> In Smalltalk, encapsulation means that this has to be
>        set_s(x, get_s(get_r(x)) + 1)
> with each of get_r(), get_s(), and set_s() involving a
> dynamic dispatch, thanks to strict encapsulation, and
> even a+b turns into
>        if (((a|b)&3) == 0) {
>            long t = (long)a >> 2 + (long)b >> 2;
>            r      = t << 2;
>            if ((r >> 2) != t) r = make_bignum(t);
>        } else {
>            r = b_p(a, b);
>        }
>
> This could be much faster on a SPARC, at least for 30-bit fixnums.
> Something not entirely unlike it is necessary because Smalltalk
> integers don't have any arbitrary size limits, just like Erlang.
>
> It took careful coding, especially careful placement of code in
> the right objects, to get Smalltalk code running only twice as
> slow as Java.  For the Smalltalk/Java pair then
>
> A.  The combination of
>        - compile-time type system
>        - weak encapsulation
>        - crippled integers
>    is more important FOR THIS PROBLEM than efficient mutability.
>
> Let's consider the My ST/Squeak pair.
>
> My (not quite finished) Smalltalk system only supports ANSI
> Smalltalk plus enough to make up about 200 system classes.
> It compiles whole programs to ANSI C.  It has no debugging
> support, which is a right pain in the backside.  Compilation
> to C is blindingly fast; compilation _of_ the resulting C is
> painfully slow.  ANSI Smalltalk provides no way to change a
> class at run time, so my system doesn't either.  Currently, no
> inlining is done, though it's planned.  Thanks to compilation
> via C, "Smalltalk" methods may contain C code, although it
> doesn't matter for this benchmark.
>
> Squeak has nearly 4000 classes, and that's the stripped down
> version with all the fancy stuff moved out.  It is highly incremental.
> Methods and classes can be added, renamed, or removed while a program
> is running.  In fact _some_ program is _always_ running.  Its object
> representation is large than mine, due in part to garbage collector
> differences.  There have been attempts at a JIT for Squeak but they've
> always fallen out of date or in one case not yet been finished.  I
> didn't try the benchmark in a commercial Smalltalk, but other
> benchmarks I've done using VisualWorks NonCommercial have shown it
> performing pretty close to my Smalltalk.  So we conclude that
>
> B.  Native code -vs- emulated byte code
>    is FAR more important FOR THIS PROBLEM than efficient mutability.
>
> The C/Functional case is seriously anomalous.  I attribute this to
> the use of the Boehm garbage collector straight out of the box
> without the slightest attempt at tuning or adaptation.  I just
> used GC_malloc() instead of malloc().  I conclude that
>
> C.  FOR THIS PROBLEM, the relative cost of immutability depends
>    a very great deal on the garbage collector.
>
> I note that Erlang's immutable data structures and its per-process
> heaps have the _potential_ to allow a garbage collector to be
> more efficient than "general purpose" ones like the Boehm one.
>
>
> What does all of this tell us about priority queues in Erlang?
> Well, the machine I did the benchmarks on didn't have Erlang
> installed, but from what we know about it, it seems safe to conclude
> that
>        - the absence of an enforced static type system
>        - seamless integration of fixnums and bignums
> will make at least as much of a difference as the lack of
> efficient mutability, FOR THIS PROBLEM.  The presence of HiPE
> means that that shouldn't be _too_ much of an issue.
>
>
> Of course support for efficient mutability doesn't just affect the
> speed of an algorithm; it affects the very choice of an algorithm
> in the first place.
>
> all have either
>
>
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>