[erlang-questions] Priority queues and Erlang

Mon Jun 15 06:55:53 CEST 2009

On 15 Jun 2009, at 4:08 pm, John Haugeland wrote:

>> If by "efficient" you mean "same as C or Fortran, including constant
>> factors equal to within a few percent, then Erlang is not likely to
>> ever accomplish that with or without mutability.
>
> I think maybe you should profile before making the assumption that
> we're talking about a handful of percent.  The real world cost of
> immutability in this kind of context is actually very high.

Nonononono.
"A handful of percent" is what I'm saying you WON'T get.
The whole point of what I wrote is that I DON'T know what YOU
are talking about when you talking about "efficient".

>> Mutability per se is no guarantee of speed.
>
> Agreed.  However, in certain contexts - and this happens to be one of
> them - immutability is a guarantee that the work load will be several
> growth orders slower than is required.

If by "growth orders" you mean the exponent inside big-Oh,
this is very clearly wrong.  If you mean the CONSTANT FACTORS,
and I think you do, then one order of magnitude is credible
(especially for the more elaborate data structures), but "several
... orders" is not.

Amongst other things, the asymptotically efficient priority queue
data structures are about a decimal order of magnitude than
array-heaps in *imperative* languages.  It's not mutability per se
that is the issue, but the elaborate data structures required to
get those asymptotic bounds.

For example, AVL trees actually do pretty well.
An AVL tree node in C has left, right, key, value, delta.
An AVL tree node in Erlang will be about 20% bigger.
I'll buy a CONSTANT factor of 10 slow-down due to allocating
new nodes whenever you change the tree, although I think that's
on the high side.  But I cannot and will not believe "several
... orders" in the constant factors without benchmark evidence
to support it, and "several GROWTH orders" is completely wrong.
>
> Unfortunately, in practice, the implementation that achieves those
> growth rates has positively enormous growth coefficients.  I actually
> discussed this explicitly already in the mail to which you're
> replying, sir, hence the comment about needing a dataset of several
> tens of gigabytes in order for those growth rates to actually matter.

Yes, and I'm calling your bluff.

>
> Many programming language communities have already done this; there's
> no need for me to repeat existing work.

No, but there is a need for you to CITE it.
>> Mutability is available NOW, and in two different ways.
>
> No, it isn't.
>
>
>
>
>
>> (1) You can use ETS/DETS tables, which are mutable.
>
> No, they aren't.  Mutability doesn't mean "data may be replaced."
> Mutability is a question of altering data in place in a standing
> structure.  ETS and DETS don't do that; you're still paying for the
> copy because, under the hood, they're immutable.

Er, you have _looked_ at how ETS works, haven't you?
>
> Incidentally, one who uses ETS/DETS to discuss the efficient
> implementation of datastructures really probably needs to think
> through the matter.

I have.  I was pointing out that mutability is not enough in and
of itself.
>
>
>
>
>
>
>> (2) You can represent mutable variables by processes in a well
>> known way.
>
> I'm sorry, Mr. O'Keefe, but that's also not mutability.

Yes it is.  MUTABILITY is about whether state can be changed in
place or not.  With a mutable variable simulated by a process,
mutation IS done in place.  Except for performance, it has all
the semantic characteristics of mutation.

>  Mutability
> isn't a concern of usage.  It's a concern of method of implementation.

Now you are playing games with language.

Look, we agree, or I hope we do, that
  - Erlang is not as efficient as C
  - it is a proven fact that any imperative algorithm using O(N)
    memory can be emulated by a pure functional language using
    O(N) memory and an extra factor of log(N) in time, BUT NO MORE
  - asymptotically efficient algorithms do not always pay off
    for practical problem sizes
  - empirical evidence trumps a-priori beliefs

>

> Involving a message passing system, context switches and the
> significant number of copies involved in Erlang send and receive makes
> me very concerned about how far the question of efficiency has been
> thought through.

I didn't say it was FAST or EFFICIENT, just that it is an
accurate *semantic* model of MUTABILITY.  Heck, Erlang has
mutability all over the place:  the process registry, the
module system, even the external file system.  And of course
the Erlang system is built using mutability underneath.

> Blurring the meaning of the word "mutability" is a disservice to
> everyone involved.

True, so why are you doing it?

>  Mutability is simple: you can change data in-place
> without a copy.

Then int x = 0; ... x = 1; is not "mutability", because it
semantically makes a copy.

>  There is no mechanism for doing that in Erlang, short
> of something asinine like an external port.

I mentioned the process dictionary.  That is _precisely_
a per-process mutable hash table.  Why do you say there is
"no mechanism" when there is one?

>
>
> It's really pretty important to get things like that correct when
> discussing the impact of mutability on the implementation of
> datastructures.  Substitutes like processes-as-variables are
> critically different in actual performance, and suggesting otherwise
> leads to a deep fundamental misunderstanding of appropriate
> implementation strategy.

It is also important to read and comprehend correctly.
I have n ever at any time suggested that variables-as-processes
is FAST, only that it is *semantically* mutability.  This is not
my idea.  It's the standard way of modelling mutability in
process algebras.  And in a shared-heap system -- as some versions
of Erlang have been -- it doesn't involve any copy.

>
> Also, the hash table isn't O(1) at all.  It's amortized O(1) insert,
> and insert is the dominant concern here.  Further than that, I wonder
> if maybe you've realized the amount of work that's involved in phash2,
> and hash tables in general;

Having built many hash table implementations in my time, yes.

> indeed using a hash table to implement
> fake mutability

So Perl hash tables aren't REAL mutability, only "fake mutability"?

This is a very very strange redefinition of "mutability".

My previous message was not, repeat not, an attempt to claim that
any particular form of mutability in Erlang was efficient, but to
find out WHAT YOU MEANT.

So can we clarify things a little?

Consider four ways that an assignment statement X := Y
might be implemented.

(0) It compiles into a single store instruction.
     Nothing else happens.  There is no garbage collector.
(1) It compiles into a single store instruction.
     There is a garbage collector.
(2) The garbage collection facility uses reference counting,
     so this compiles into
	Y->ref++;
	if (0 == --X->ref) rec_free(X);
	X = Y;
(3) There is a generational collector with a write barrier.
     An assignment compiles into something that might be 10 times
     slower than a single assignment statement, but it's still O(1).

Clearly (0) is "mutability" to both of us,
and all of them are "mutability" to me.
Are any of them other than (0) "mutability" to you?

>
> The idea of invoking a hash table lookup and alteration to prevent
> copies is frankly pretty concerning.

"Preventing copies" is not something I have discussed recently
in that message or any other.  The idea may be concerning, but it
is your idea, not mine.

> I think perhaps it may be of
> value to you, sir, to look at the C which results from compiling the
> erlang to Gambit Scheme then to C, so that you will begin to
> understand the enormous overhead of the strategies you're discussing.

The problem is that you apparently believe that anything I
*mention* is something I *recommend*.

> Nobody ever said it was.

Basically, you said that Erlang is useless without it.
I'm saying that there is no reason to expect Erlang to
be useful to you WITH it.

>  However, immutability is frequently a
> guarantee of inefficiency.  It may be appropriate to learn how these
> things are implemented in mutable languages; there's no reason for a
> tree to be involved at all, and these things should be significantly
> more efficient

Can you name a priority queue implementation which is NOT a tree?
(The heap-in-an-array implementation *is* a tree, the links are
cunningly implicit, but the algorithm follows them none the less.)

>> It is news to me that bubble sort is EVER a reasonable choice.
>
> Bubble sort being a good choice on tiny data is well known in the
> graphics community, and is a common strategy for occlusion.

I have done extensive benchmarking of many kinds of sorts,
INCLUDING on small arrays up to size 50.  Bubble sort never ever
came out on top.

> Typically bubble sort stops being a good idea around 15-20ish items.
> Most advanced sorts won't even have gotten themselves set up by the
> time bubble sort on a five item container is done.

>> Had you written 'insertion sort', I'd have believed you.
>
> I don't expect anyone to believe me without benchmarking.  However,
> it'd be nice also if nobody vocally disbelieved me in public without
> benchmarking.  Real performance is frequently counterintuitive.

I don't believe it because I've benchmarked.

> Because this is part of a fundamental computer science education, and
> well known to engineers.  I also don't spend the time proving
> quicksort right.

By the way, it's "Dr" O'Keefe, not "Mr" O'Keefe, and I know what's
in a computer science education, because that's what I teach.  And
I know about sorts because I came up with several "improved"
sorting methods and did extensive benchmarking, determining as a
result that they weren't worth writing up.

> Yes, that was kind of my point.  To be useful in erlang, a priority
> queue implementation has to beat the trees in Erlang.  None of the
> strategies you've offered come anywhere near that.

I haven't OFFERED any strategies.

I've mentioned the Brodal and Okasaki algorithm and the King one.

Now, what *specifically* is wrong with the code Ulf Wiger posted?