[erlang-questions] Priority queues and Erlang
Mon Jun 15 06:08:47 CEST 2009
> If by "efficient" you mean "same as C or Fortran, including constant
> factors equal to within a few percent, then Erlang is not likely to
> ever accomplish that with or without mutability.
I think maybe you should profile before making the assumption that
we're talking about a handful of percent. The real world cost of
immutability in this kind of context is actually very high.
> Mutability per se is no guarantee of speed.
Agreed. However, in certain contexts - and this happens to be one of
them - immutability is a guarantee that the work load will be several
growth orders slower than is required.
> If you mean good asymptotic bounds, no, the Okasaki and Brodal
> paper shows that it can be done with O(1) everything except
> delete-min, which is O(lg n).
Unfortunately, in practice, the implementation that achieves those
growth rates has positively enormous growth coefficients. I actually
discussed this explicitly already in the mail to which you're
replying, sir, hence the comment about needing a dataset of several
tens of gigabytes in order for those growth rates to actually matter.
> If you mean something in between, then I think an example is
> needed to make your case. Take a functional language which supports
> mutability (such as SML or OCAML, or for that matter Haskell using
> the ST monad, or Clean using uniqueness annotations, or perhaps
> Mercury), and implement both functional priority queues and some
> sort of imperative priority queue. That will be informative.
Many programming language communities have already done this; there's
no need for me to repeat existing work. This was discussed at length
several years back on the Mozart-Oz users list, wherein someone
finally broke down and wrote out the code and benchmarks to resolve
people's tendency to go based on expectation.
Frankly, I'm much too lazy to do something like that.
>> The general performance of a tree cannot be matched, let alone
>> beaten, by an immutable priority queue.
> This is more than a little confusing: immutable implementations
> of priority queues ARE trees.
Not all of them. Okasaki's implementation isn't honestly that great,
and there are many ways of skinning a cat. The desirable
implementation is a skew heap, which is a tree in the way that a
mansion is a house - when you say house you don't typically assume 45
bedrooms and a chocolate fountain. The performance of a simple tree
is pretty fundamentally different than whatever long descended
still-technically-a-tree one might discuss; one may as well suggest
that the various other trees' behavioral characteristics be
>> There is no purpose to
>> implementing a priority queue in Erlang until mutability is available.
> Mutability is available NOW, and in two different ways.
No, it isn't.
> (1) You can use ETS/DETS tables, which are mutable.
No, they aren't. Mutability doesn't mean "data may be replaced."
Mutability is a question of altering data in place in a standing
structure. ETS and DETS don't do that; you're still paying for the
copy because, under the hood, they're immutable.
Incidentally, one who uses ETS/DETS to discuss the efficient
implementation of datastructures really probably needs to think
through the matter.
> (2) You can represent mutable variables by processes in a well
> known way.
I'm sorry, Mr. O'Keefe, but that's also not mutability. Mutability
isn't a concern of usage. It's a concern of method of implementation.
Involving a message passing system, context switches and the
significant number of copies involved in Erlang send and receive makes
me very concerned about how far the question of efficiency has been
Blurring the meaning of the word "mutability" is a disservice to
everyone involved. Mutability is simple: you can change data in-place
without a copy. There is no mechanism for doing that in Erlang, short
of something asinine like an external port.
It's really pretty important to get things like that correct when
discussing the impact of mutability on the implementation of
datastructures. Substitutes like processes-as-variables are
critically different in actual performance, and suggesting otherwise
leads to a deep fundamental misunderstanding of appropriate
> Both of these provide O(1) fetch and store, so, mutability.
Mutability has nothing to do with the order of complexity of engaging
Also, the hash table isn't O(1) at all. It's amortized O(1) insert,
and insert is the dominant concern here. Further than that, I wonder
if maybe you've realized the amount of work that's involved in phash2,
and hash tables in general; indeed using a hash table to implement
fake mutability to implement a tree to implement a queue is kind of
bizarre. There are many strategies available to implement these
things which would not require such a juxtaposition.
> For that matter, there is the process dictionary, which uses
> a per-process hash table.
The idea of invoking a hash table lookup and alteration to prevent
copies is frankly pretty concerning. I think perhaps it may be of
value to you, sir, to look at the C which results from compiling the
erlang to Gambit Scheme then to C, so that you will begin to
understand the enormous overhead of the strategies you're discussing.
These strategies are significantly worse in practical terms than the
strategy already suggested. The reason one turns to a skew heap is
because of the relatively low count of updates required to continue;
to implement that on top of fake mutability is sort of a red flag that
the underlying implementation choices are not well understood.
Respectfully, please implement and profile before continuing to offer
advice on the matter; this is not a topic which may be speculated
> Mutability is no guarantee of speed!
Nobody ever said it was. However, immutability is frequently a
guarantee of inefficiency. It may be appropriate to learn how these
things are implemented in mutable languages; there's no reason for a
tree to be involved at all, and these things should be significantly
> It is news to me that bubble sort is EVER a reasonable choice.
Bubble sort being a good choice on tiny data is well known in the
graphics community, and is a common strategy for occlusion. I seem to
remember learning that in Graphics Programming Black Book by Abrash,
which is freely available online, but I'm not 100% on that. This is
also something that is typically taught in sophomore level software
engineering courses as an advisory on the danger of getting lost in
believing that growth rates are the only factor of concern when
Typically bubble sort stops being a good idea around 15-20ish items.
Most advanced sorts won't even have gotten themselves set up by the
time bubble sort on a five item container is done.
> Had you written 'insertion sort', I'd have believed you.
I don't expect anyone to believe me without benchmarking. However,
it'd be nice also if nobody vocally disbelieved me in public without
benchmarking. Real performance is frequently counterintuitive.
>> I would love to be proven wrong with benchmarks, though; a priority
>> queue would be epic useful to me in Erlang.
> Why don't you try proving yourself *right* with benchmarks?
Because this is part of a fundamental computer science education, and
well known to engineers. I also don't spend the time proving
> To be useful in Erlang, a priority queue implementation does not
> have to be as fast as one in C. It just has to beat the alternatives
> (like gb_trees) currently available in Erlang.
Yes, that was kind of my point. To be useful in erlang, a priority
queue implementation has to beat the trees in Erlang. None of the
strategies you've offered come anywhere near that.
Please consider re-reading my original post. You'll discover that
this was generally covered ground.
More information about the erlang-questions