[erlang-questions] Priority queues and Erlang
Michael Richter
ttmrichter@REDACTED
Wed Jun 17 02:17:35 CEST 2009
http://xkcd.com/386/
2009/6/17 Richard O'Keefe <ok@REDACTED>
> It's time for some numbers.
> I apologise to the list for the delay;
> I had this stuff ready yesterday, but wasn't able to get to
> work because of snow.
>
> Definitions:
>
> 1. An object is mutable if and only if it is possible for its
> state to change without its identity changing (without regard
> for how *fast* such changes might be; SQL tables are mutable).
>
> 2. A programming language supports mutability if and only if it
> provides or allows the programmer to create mutable objects
> (without regard for how *fast* such changes might be; SQL
> supports mutability).
>
> 3. A programming language supports mutability *efficiently* if and
> only if it provides at least some "small" mutable data structures
> whose state can be changed in about the same time as a function call.
>
> 4. Mutability does not guarantee an absence of copying. Assigning one
> Pascal record to another or one C struct to another is defined
> as copying (which is why Simula 67 distinguished between
> := content-copying assignment and :- pointer assignment), but
> is none-the-less change of state without change of identity.
>
> 5. Compile-time garbage collection occurs in a programming language
> implementation when the compiler determines that at least some
> data structures are no longer useful and arranges for their
> memory to become available for reuse without any runtime tests.
>
> 6. For the purposes of this message, "leftist heaps" are defined by
> the Wikipedia entry http://en.wikipedia.org/wiki/Leftist_tree
>
> 7. "Imperative" code referred to below alters the nodes of a tree
> in place using assignment statements. "Functional" code
> referred to below never alters a node once created, but allocates
> new ones. No claim is made that these labels have any merit or
> any wider application other than making that distinction in this
> message.
>
> According to these definitions, Erlang does support mutability
> through the process dictionary, the process registry, the module
> system, ETS, DETS, and connections to external data bases. It
> does not support mutability *efficiently*, except perhaps in the
> case of the process dictionary. Only benchmarks could determine
> whether that should count as efficient. Also according to these
> definitions, Erlang does NOT have compile-time garbage collection.
>
> There are functional languages, notably Clean (with uniqueness types)
> and Mercury (with dead-input/unique-output modes) that do provide
> compile-time garbage collection. *Semantically* these languages do
> not support mutability, but the implementation behaves *as if* they
> did; the 'merge' function will not in fact allocate any new storage.
>
> Here's what I did.
>
> I found the Leftist_tree page of the Wikipedia.
> It provides "imperative" code for merging leftist heaps in Java.
> I took that and wrote a complete Java program that
>
> - creates a heap with n entries (command line parameter)
> - for several millions of iterations
> - remove the smallest entry
> - create a new entry with the same key + a random number.
>
> This is pretty much the "hold" model for heap performance.
> Once this was working, I adapted it to produce a "functional" version.
> I then translated both versions to C and to Smalltalk.
>
> The programs were run for values of n ranging from 2 to 100,
> and a linear model
>
> (user+system time in nanoseconds) ~ constant + speed * log(n)
>
> fitted to each using the lm() function in R. (The model is called
> "linear" because it is linear in the _coefficients_; the fact that
> it isn't linear in n doesn't matter.) I tried other models as
> well, but these fitted best. The constant factor was ignored
> because (except in the case of Smalltalk) that includes the cost
> of program startup, dynamic library loading, and in the case of
> Java, JIT compilation.
>
> Rather surprisingly, in the case of the Java programs, the
> *actual* measured times for the "functional" version were in
> every case LESS than the actual measured times for the "imperative"
> version. Yes, the imperative version was asymptotically more
> efficient, but the cross-over was estimated at n = 4,500. I have
> no idea why this is so.
>
>
> C Java My ST Squeak
> "Imperative" 40 70 150 1123
> "Functional" 240 126 290 1895
>
>
> C means gcc 4.0; Java means JDK 1.5.0; My ST means my
> ANSI Smalltalk via C whole-program compiler; Squeak means
> Squeak Smalltalk 3.10.
>
> Interpretation:
>
> Java is about 2 times slower than C.
>
> My Smalltalk -> C is about 2 times slower than Java.
> This is a consequnce of
> - the absence of a type system
> - strict encapsulation, so "x->f" and "x->f = e" are
> function calls, not simple loads and stores.
>
> Squeak is about 7 times slower than my Smalltalk compiler.
> This is a consequence of
> - a fully dynamic system (mine is static whole program)
> - byte code emulation
>
> Except for C, the "functional" versions are about twice as
> slow as the "imperative" versions.
> This is a consequence of
> - extra memory allocation and garbage collection
> - cache dilution
>
> The C/Functional case stands out as particularly bad.
> This is a consequence o
> - using an unadapted version of the Boehm GC;
> absolutely no attempt whatever was made to tune it.
>
>
> Let's consider the My ST/Java pair in a little more detail.
> In Java,
> x.s = x.r.s + 1;
> should, with the aid of a JIT, turn into something like
> LD t, r(x)
> LD t, s(t)
> ADD t, t, 1
> ST t, s(x)
> In Smalltalk, encapsulation means that this has to be
> set_s(x, get_s(get_r(x)) + 1)
> with each of get_r(), get_s(), and set_s() involving a
> dynamic dispatch, thanks to strict encapsulation, and
> even a+b turns into
> if (((a|b)&3) == 0) {
> long t = (long)a >> 2 + (long)b >> 2;
> r = t << 2;
> if ((r >> 2) != t) r = make_bignum(t);
> } else {
> r = b_p(a, b);
> }
>
> This could be much faster on a SPARC, at least for 30-bit fixnums.
> Something not entirely unlike it is necessary because Smalltalk
> integers don't have any arbitrary size limits, just like Erlang.
>
> It took careful coding, especially careful placement of code in
> the right objects, to get Smalltalk code running only twice as
> slow as Java. For the Smalltalk/Java pair then
>
> A. The combination of
> - compile-time type system
> - weak encapsulation
> - crippled integers
> is more important FOR THIS PROBLEM than efficient mutability.
>
> Let's consider the My ST/Squeak pair.
>
> My (not quite finished) Smalltalk system only supports ANSI
> Smalltalk plus enough to make up about 200 system classes.
> It compiles whole programs to ANSI C. It has no debugging
> support, which is a right pain in the backside. Compilation
> to C is blindingly fast; compilation _of_ the resulting C is
> painfully slow. ANSI Smalltalk provides no way to change a
> class at run time, so my system doesn't either. Currently, no
> inlining is done, though it's planned. Thanks to compilation
> via C, "Smalltalk" methods may contain C code, although it
> doesn't matter for this benchmark.
>
> Squeak has nearly 4000 classes, and that's the stripped down
> version with all the fancy stuff moved out. It is highly incremental.
> Methods and classes can be added, renamed, or removed while a program
> is running. In fact _some_ program is _always_ running. Its object
> representation is large than mine, due in part to garbage collector
> differences. There have been attempts at a JIT for Squeak but they've
> always fallen out of date or in one case not yet been finished. I
> didn't try the benchmark in a commercial Smalltalk, but other
> benchmarks I've done using VisualWorks NonCommercial have shown it
> performing pretty close to my Smalltalk. So we conclude that
>
> B. Native code -vs- emulated byte code
> is FAR more important FOR THIS PROBLEM than efficient mutability.
>
> The C/Functional case is seriously anomalous. I attribute this to
> the use of the Boehm garbage collector straight out of the box
> without the slightest attempt at tuning or adaptation. I just
> used GC_malloc() instead of malloc(). I conclude that
>
> C. FOR THIS PROBLEM, the relative cost of immutability depends
> a very great deal on the garbage collector.
>
> I note that Erlang's immutable data structures and its per-process
> heaps have the _potential_ to allow a garbage collector to be
> more efficient than "general purpose" ones like the Boehm one.
>
>
> What does all of this tell us about priority queues in Erlang?
> Well, the machine I did the benchmarks on didn't have Erlang
> installed, but from what we know about it, it seems safe to conclude
> that
> - the absence of an enforced static type system
> - seamless integration of fixnums and bignums
> will make at least as much of a difference as the lack of
> efficient mutability, FOR THIS PROBLEM. The presence of HiPE
> means that that shouldn't be _too_ much of an issue.
>
>
> Of course support for efficient mutability doesn't just affect the
> speed of an algorithm; it affects the very choice of an algorithm
> in the first place.
>
> all have either
>
>
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>
More information about the erlang-questions
mailing list