[erlang-questions] Priority queues and Erlang

Mon Jun 15 08:20:42 CEST 2009

> The whole point of what I wrote is that I DON'T know what YOU
> are talking about when you talking about "efficient".

Indeed.

> If you mean the CONSTANT FACTORS,
> and I think you do, then one order of magnitude is credible

I do not.

> Amongst other things, the asymptotically efficient priority queue
> data structures are about a decimal order of magnitude than
> array-heaps in *imperative* languages.  It's not mutability per se
> that is the issue, but the elaborate data structures required to
> get those asymptotic bounds.

And those elaborate datastructures are critically dependant on not
copying data, which in turn is reliant on in-place editing of data.
Which is, wait for it... mutability.

>> Unfortunately, in practice, the implementation that achieves those
>> growth rates has positively enormous growth coefficients.  I actually
>> discussed this explicitly already in the mail to which you're
>> replying, sir, hence the comment about needing a dataset of several
>> tens of gigabytes in order for those growth rates to actually matter.
>
> Yes, and I'm calling your bluff.

Respectfully, sir, you aren't.

>> Many programming language communities have already done this; there's
>> no need for me to repeat existing work.
>
> No, but there is a need for you to CITE it.

I have.  It's amusing that you removed the two cases thereof.

>> Incidentally, one who uses ETS/DETS to discuss the efficient
>> implementation of datastructures really probably needs to think
>> through the matter.
>
> I have.  I was pointing out that mutability is not enough in and
> of itself.

Nobody ever said that mutability was enough on its own; it would be
nice if you would focus on what was actually said.

What was actually said was "immutability is one lynchpin requirement."
 There are indeed many others.

>>> (2) You can represent mutable variables by processes in a well
>>> known way.
>>
>> I'm sorry, Mr. O'Keefe, but that's also not mutability.
>
> Yes it is.  MUTABILITY is about whether state can be changed in
> place or not.  With a mutable variable simulated by a process,
> mutation IS done in place.

No, sir.  "In place" means at a physical location in ram or other
storage.  What you are discussing is providing a uniform point of
reference.  This is no more in place than would be a call to a
database.

The germane issue is about editing the value as stored in memory
without a copy.  You may recognize this under the alternative
terminology "destructive updates."  Sending a message to a process
which makes a non-destructive discarding update then refers the value
back over a messaging system provides an algorithmically correct
substitute for mutability, but is nonethless not actual mutability.

> Except for performance, it has all
> the semantic characteristics of mutation.

This is, as I already mentioned, not a semantic issue, sir.  It is
also semantically equivalent to fetch the values by carrier pigeon.
Mutability is an implementation strategy; concerning yourself with
whether the resulting value of an alternative implementation strategy
is equal misses entirely the consideration of the workload required to
implement said strategy.

>>  Mutability
>> isn't a concern of usage.  It's a concern of method of implementation.
>
> Now you are playing games with language.

It is deeply disturbing that you feel that correct usage of
engineering terminology is "playing games with language."

> Look, we agree, or I hope we do, that
>  - Erlang is not as efficient as C

Generally not.  We agree.  Germane is that Erlang offers tools of
efficiency that C does not attempt to provide at all, which leads
(particularly in timeslicing) to Erlang having significantly more
efficient real world software than its reasonably expectable C
counterparts, which is why for example YAWS spanks almost any C
webserver, even stripped down.  Also, tail recursion can have a fairly
significant impact on the efficiency of a strategy; whereas tail
recursion can be implemented in C, it is not natively a part of C, so
there's open question whether or not that's a valid consideration.

But agreed, it is possible in almost most cases to implement things in
C more efficiently than in Erlang.

>  - it is a proven fact that any imperative algorithm using O(N)
>   memory can be emulated by a pure functional language using
>   O(N) memory and an extra factor of log(N) in time, BUT NO MORE

I am unaware of such a proof, but it seems reasonable, so I'm willing
to take your word for it.

>  - asymptotically efficient algorithms do not always pay off
>   for practical problem sizes

Sure.

>  - empirical evidence trumps a-priori beliefs

Indeed.  I'm not sure what any of these beliefs gain for the discussion, though.

>> Involving a message passing system, context switches and the
>> significant number of copies involved in Erlang send and receive makes
>> me very concerned about how far the question of efficiency has been
>> thought through.
>
> I didn't say it was FAST or EFFICIENT, just that it is an

Dude, you suggested it in the middle of a discussion between other
people about why this couldn't be done efficiently.  It's implied.

> accurate *semantic* model of MUTABILITY.  Heck, Erlang has

Yet again, semantic equivalence has no place in a discussion of efficiency, sir.

> mutability all over the place:  the process registry, the
> module system, even the external file system.

None of these are mutability, and I retain my belief that you are not
showing an understanding of what mutability actually is.

Think "destructive update" from here on in, please.

> And of course
> the Erlang system is built using mutability underneath.

This isn't germane, as it isn't available to the Erlang programmer.
This is a straw man (as has been much of this discussion).  You might
as well bring up that Erlang's implemented with pointers, for all the
value it has regarding the efficient pure erlang implementation of a
priority queue.

Please try to stick to topic.  You've been discussing things that have
nothing to do with the conversation you entered.

>> Blurring the meaning of the word "mutability" is a disservice to
>> everyone involved.
>
> True, so why are you doing it?

I will not engage in "no you", sir.

>>  Mutability is simple: you can change data in-place
>> without a copy.
>
> Then int x = 0; ... x = 1; is not "mutability", because it
> semantically makes a copy.

Mr. O'Keefe, your inability to grapple with the difference between
semantics and implementation strategies is disappointing.  That code
cannot be deemed mutable or immutable.  It could be implemented as
mutable (destructive update), or as label rebinding, or as series
access (like in APL), or in several other fashions.

This is the stuff of an introductory freshman course on datastructures, sir.

>>  There is no mechanism for doing that in Erlang, short
>> of something asinine like an external port.
>
> I mentioned the process dictionary.  That is _precisely_
> a per-process mutable hash table.  Why do you say there is
> "no mechanism" when there is one?

Because you have confused semantic equivalence for implementation
equivalence.  Yes, I'm well aware that you've proposed five different
things now that you think are mutability, because you seem to believe
that mutability is something other than destructive update.

You could also propose that the calendar module was mutability.  I
would continue to say that there was no mutability after the fact,
because of course, that _also_ isn't mutability.

Simply asserting something doesn't make it true, sir.

>> It's really pretty important to get things like that correct when
>> discussing the impact of mutability on the implementation of
>> datastructures.  Substitutes like processes-as-variables are
>> critically different in actual performance, and suggesting otherwise
>> leads to a deep fundamental misunderstanding of appropriate
>> implementation strategy.
>
> It is also important to read and comprehend correctly.

Agreed.  That would be a large part of my confusion why you've reacted
to the response to your barging into this conversation about
efficiency by saying you weren't talking about efficiency, or why you
continue to insist that things which are semantically equivalent to
mutability are in fact mutability.

It's about as valid as suggesting that an API call to a Vonage library
to place the results of a text to speech synthesizer into an outbound
call to Google Voice which returned the data to text and emailed it to
a system which then parsed out the value and provided it to the
application over a socket:

1) Is a good idea
2) Has any place in a discussion of efficient datastructures
3) Is mutability, since it's semantically equivalent

We aren't talking about "can you change the value".  There are a
million ways to do that.  Nobody's arguing that.  That just isn't what
mutability is, even if you pull your hair, stomp your feet and say
"well why are you blurring it then?"  There's a reason that the
keyword mutability doesn't mean that in any programming language, and
is not cited as meaning that in any engineering text.

Software engineering turns out not to be The Richard O'Keefe Show.
You, too, sir, make mistakes, and could occasionally learn something.

This is one of those times.

> I have n ever at any time suggested that variables-as-processes
> is FAST, only that it is *semantically* mutability.

I am aware of the thing you have suggested.  At each point I have
explicitly rejected it, and explained why.  That you continue to
believe I don't understand you is fairly annoying.

I get it.  You think that I said "can't be done without variable
updates."  That isn't what I said.

> This is not my idea.  It's the standard way of modelling mutability in
> process algebras.

No, sir.  It's the standard way of modelling updates in process
algebras.  I would greatly appreciate your sourcing any process
algebra which makes any effort to specify the implementation strategy
and also calls this mutability, which wasn't written by you or one of
your students.

> And in a shared-heap system -- as some versions
> of Erlang have been -- it doesn't involve any copy.

If it isn't defined as mutable, it isn't mutable.  Something undefined
is not an appropriate candidate for inclusion in a discussion of
efficiency, even if you really really want to change the discussion to
be about semantic equivalence.

>> Also, the hash table isn't O(1) at all.  It's amortized O(1) insert,
>> and insert is the dominant concern here.  Further than that, I wonder
>> if maybe you've realized the amount of work that's involved in phash2,
>> and hash tables in general;
>
> Having built many hash table implementations in my time, yes.

Frankly, sir, if you're aware of those costs and still brought this up
in a discussion that was explicitly about efficiency before you spoke
up, then I just don't know what to tell you.  You might as well bring
up cavalry strategy in a discussion of tank warfare.

>> indeed using a hash table to implement
>> fake mutability
>
> So Perl hash tables aren't REAL mutability, only "fake mutability"?

I don't know.  I don't speak perl.

> This is a very very strange redefinition of "mutability".

Mutability is well defined and well understood.  The way I'm using the
term is consistant and required by every language I'm aware of which
uses the term.  I would invite you to cite even one authoritative
example of your interpretation of this term which everyone else
understands to mean something specifically different.

It would be of interest your reaction to that The Art of Computer
Programming, CLRS, the C++ standard and the Java standard all
explicitly agree with the definition I've provided, and rely on it to
offer performance guarantees in other places.  In particular I'm
curious whether you know what keyword mutable does in C++.  There's a
reason that there's an entire keyword in C++ for something which
changes data internally in a constant instance of a class whose value,
from the outside world, is perceived to be unchanging.

It's not because mutable means "changes the value."

> (0) It compiles into a single store instruction.
>    Nothing else happens.  There is no garbage collector.

Could be either.  If the store instruction explicitly targets the same
physical location in RAM that held the old value (or a part thereof),
this is mutability.

> (1) It compiles into a single store instruction.
>    There is a garbage collector.

Could be either.  Probably a sequence update.

> (2) The garbage collection facility uses reference counting,
>    so this compiles into
>        Y->ref++;
>        if (0 == --X->ref) rec_free(X);
>        X = Y;

This has nothing to do with mutability or immutability.  This is about
lifecycle.

> (3) There is a generational collector with a write barrier.
>    An assignment compiles into something that might be 10 times
>    slower than a single assignment statement, but it's still O(1).

This is a sequence update (which means neither mutable nor immutable).

> Clearly (0) is "mutability" to both of us,

Probably.  It's clearly intended to be, but needs to be more strictly
defined to be known to be.

> and all of them are "mutability" to me.
> Are any of them other than (0) "mutability" to you?

I really don't understand why you find the following sentence so difficult.

"Mutability is about the ability to alter the value stored in-place in
a physical location in RAM".

It has nothing to do with the semantic value provided to programmers.
Indeed, anyone who knows what keyword mutable is for in C++ is pretty
stunned right now.  Mutability is frequently used for efficiency
reasons in things which are presented to the outside world as constant
valued.

This is a freshman or sophomore level topic at any reasonable computer
science program.

>> The idea of invoking a hash table lookup and alteration to prevent
>> copies is frankly pretty concerning.
>
> "Preventing copies" is not something I have discussed recently
> in that message or any other.  The idea may be concerning, but it
> is your idea, not mine.

Yes, it is.  It is the idea you attempted to argue with, and the
explicit purpose of the message which was sent to someone else before
you were involved at all.  It is the critical basis of the original
statement.

Now you're suggesting that it isn't germane.  This is why I suggested
to you previously to re-read the mail you're attempting to argue with.

>> I think perhaps it may be of
>> value to you, sir, to look at the C which results from compiling the
>> erlang to Gambit Scheme then to C, so that you will begin to
>> understand the enormous overhead of the strategies you're discussing.
>
> The problem is that you apparently believe that anything I
> *mention* is something I *recommend*.

Yes.  When you butt into someone else's discussion about efficiency,
and say "this is one way to do it," a normal person will understand
you to be recommending something.

However, I actually used the word "discussing".

>> Nobody ever said it was.
>
> Basically, you said that Erlang is useless without [mutability: ed].

I said no such thing.  I merely said that certain things could not be
done efficiently without it.  What would I be doing on an Erlang
mailing list, releasing tens of thousands of lines of Erlang code, if
I believed Erlang was useless?

Please stop inventing things I've said, sir.  It's significantly disappointing.

> I'm saying that there is no reason to expect Erlang to
> be useful to you WITH it.

That is entirely tangential to the conversation you invaded.  You
would be as well off to suggest that I've chosen a poor brand of
automobile, for all the germanity it has to the discussion at hand.

>>  However, immutability is frequently a
>> guarantee of inefficiency.  It may be appropriate to learn how these
>> things are implemented in mutable languages; there's no reason for a
>> tree to be involved at all, and these things should be significantly
>> more efficient
>
> Can you name a priority queue implementation which is NOT a tree?

Yes.  Indeed, any sophomore CS student can.  Perhaps you should catch
up to them.

If you'd like to pay me to be your private tutor, I will happily
hand-walk you through these things.  However, if you'd prefer to be
angry at and disrespectful to me for something I've not said, no, sir,
I will not provide you with the knowledge that well educated engineers
have, even if you believe that "proves I'm wrong".

Had you taken a somewhat more responsible tone, the outcome might be
different.  Indeed, I attempted to discuss that with you in private,
when I asked you to please withdraw from this conversation.

> (The heap-in-an-array implementation *is* a tree, the links are
> cunningly implicit, but the algorithm follows them none the less.)

Yes, we've been over that.  This is news to nobody.  Indeed, in the
original mail, I suggest a tree as the most sensible Erlang
implementation strategy.

Take a breath, sir.  You're arguing with phantoms in a particularly
disrespectful tone.

> I have done extensive benchmarking of many kinds of sorts,
> INCLUDING on small arrays up to size 50.  Bubble sort never ever
> came out on top.

Well, then either you had a poor implementation or a poor benchmark.
I'm not sure what to tell you.

I mean, Abrash is crazy smart.  He wouldn't have put it in as the
culling algorithm for faces in Quake if he hadn't benchmarked the
silly bejesus out of it.

>> I don't expect anyone to believe me without benchmarking.  However,
>> it'd be nice also if nobody vocally disbelieved me in public without
>> benchmarking.  Real performance is frequently counterintuitive.
>
> I don't believe it because I've benchmarked.

Okay.  I do believe it because I've benchmarked, and because I've
cycle counted it.  When I was a professional gameboy color developer,
I learned more about scraping single cycles than most programmers ever
do.

> By the way, it's "Dr" O'Keefe, not "Mr" O'Keefe, and I know what's
> in a computer science education, because that's what I teach.

How sad for your students.  Respectfully, Dr. O'Keefe, you might sit
in on a class at a more respectable university.

Or did you think you were the only doctor in the room?

>> Yes, that was kind of my point.  To be useful in erlang, a priority
>> queue implementation has to beat the trees in Erlang.  None of the
>> strategies you've offered come anywhere near that.
>
> I haven't OFFERED any strategies.

Ah, right.  You were just mentioning semantic equivalence when
shooting down my statements.

I'll be sure to note the difference there.  What is it, again?

> I've mentioned the Brodal and Okasaki algorithm and the King one.

Actually, sir, I mentioned those.  Not you.  I would invite you yet
again to read the original mail.

> Now, what *specifically* is wrong with the code Ulf Wiger posted?

Nothing.  And nobody ever said anything was.  It's indeed quite
fascinating how more than half the things you're arguing with were
never actually said by anyone other than you, and how you're
withdrawing from the critical junctures of the conversation not
involving you as non-germane (particularly efficiency) because they
undermine your commentary.

At this point I believe I will withdraw from conversing with you, as
you don't appear to have any understanding of the statements I made,
and are arguing with things that I did not say.

Have an enjoyable day, please.  And do read the email you were sent in private.