[erlang-questions] VM & BEAM Specs : idea for improving, the lists support

Tue Aug 7 04:43:32 CEST 2007

On 7 Aug 2007, at 5:41 am, Denys Rtveliashvili wrote:
>

> Well.. Your mileage may vary. The difference of performance depends  
> on lots of things. So, you can tell the difference only if there  
> are two finely tuned implementations.

in the Xerox Quintus Prolog case, an implementation that *wasn't* finely
tuned beat the pants off an implementation that *was* finely tuned by  
the
same people who designed and built the hardware that it ran on.

Only the loser needs to be "finely tuned"; it's even more impressive  
when
the winner isn't.

>>> "Use unrolled linked lists to improve the performance"
> OK, you've got a performance penalty here.

NO!  That is not what I wrote at all!  My point was that unrolled lists
***DO*** improve performance.  It's just that they don't improve it  
*much*.
Unrolling lists by a factor of four (typically 6 words for 4 elements =
1.5 words per element, instead of 2 or 3) could be expected to improve
space by a factor of 2 in the systems I tried (Clean and Haskell) and
we expect the space factor to be an upper bound on the time factor, as
indeed it was.

> Also it has sense to compare native implementations of dumb linked  
> list and an unrolled linked list.

But that is EXACTLY what I did.  The GHC compiler doesn't know  
anything about
built-in lists that it doesn't know (or more precisely, cannot be  
told) about
any other data structure.  Ditto Clean.

>>> ---  Idea #2:
>>>
>>> "Use resize-able vectors to store lists"
>>>
>>> I guess this change would give even more benefits comparing to  
>>> the  Idea #1
>>>
>>
>> My guess is that this would have higher COSTS for existing code.
>>
> Which costs do you mean? Could you give an example where an vector- 
> based list would be slower or consume more space than a linked list?

Of course I can.  Can't you?  There are many MANY implementations of the
"sequence" abstract data type in the literature.  What distinguishes  
them
is the space and time costs of the basic operations.  As you must know,
inserting an element as the kth element of an n element sequence takes
O(k) time and space with singly linked lists but O(n) time and space  
with
a vector.  Existing code has been written with knowledge of the costs of
the basic operations.  If you change those costs, what *was*  
efficient code
may become very *in*efficient.

>>> - The _amortized_ computational complexity is O(1) for
>>> adding/removing/taking a remainder no matter if it is done from the
>>> beginning or from the end of the list.
>>>
>> We can get that with a pair of singly linked lists right now.
>>
> Of course, you can emulate almost everything. It's just not too  
> much convenient.

That particular one is very convenient.

There is an interesting paper
"An Applicative Control-Flow Graph Based on Huet's Zipper"
by Norman Ramsey and Joâo Dias, in the 2005 ACM SIGPLAN Workshop on ML.
They moved from an imperative-style implementation of control flow  
graphs
to one which worked in a way strongly reminiscent of the back-to-back  
lists.
(It's known as "Huet's zipper" in the literature, although David H.  
D. Warren
was using it in 1979.)  This approach turned out to be smaller,  
easier to
work with, and best of all, faster than the imperative approach.

>>> - It takes O(1) time to find out the size of the list.
>>>
>> It is amazing just how seldom this is interesting.
>>
> Sure it is. But if you can get it for free, then why not?

But you CAN'T get the list size for free.  You just can't.
You are right that _at the point of use_ some sequence implementations
let you just fetch the size.  But you paid, heavily, for it to get  
there.
I've implemented sequences about a dozen different ways, and keeping
the size has two kinds of costs:
  - direct: it takes TIME to update the size whenever you update a list.
            it takes SPACE to hold the size.  So if I do
		X0 = [],
		X1 = [e|X0],	% space needed for 1, time to compute it
		X2 = [d|X1],	% space needed for 2, time to compute it
		X3 = [c|X2],	% space needed for 3, time to compute it
		X4 = [b|X3],	% space needed for 4, time to compute it
		X5 = [a|X4]	% space needed for 5, time to compute it
	We need a minimum of 3 words per cell instead of 2 (and if you
	don't share tails, you will need even more), and we not only have
	to fill in the new count, we have to fetch the old one and
	increment it.  So we can confidently expect that the cost is at
	least 1.5 times the existing cost.
  - indirect: it limits the kinds of implementation you are prepared to
	countenance.

By the way, if you don't allow lists to be built as shown above, then
what you are implementing is not Erlang lists.  This has an even more
interesting consequence:  not all Erlang "lists" *have* a length.  We
can make a data structure like Rect = [[A|B]|[C|D]] where A, B, C, D
are all numbers.  Rect doesn't have a length.  So what do you store for
its size?

> I might be missing something.

Paging.  There is a program I've written that wants to use about 400 GiB
of memory.  (It's in C, not Erlang, and I think I've figured out a  
way to
get that down to 200 MiB, but haven't had time to try it yet.)

> However, I don't think it matters if the cells and elements are  
> close to each other.

That's tantamount to saying "caching doesn't matter", because "close"  
means
"sometimes in the same L2 cache line".

>>> A few words regarding the implementation:
>>> If you take a look at Java LinkedList and ArrayList, the  
>>> ArrayList is
>>> generally 3 times faster.
>>>
>>
>> Yes, but that's not really comparable.
>> A Java LinkedList is a record containing a stored size (this is  
>> one of
>> the things you are arguing FOR) and a pointer to a DOUBLY linked  
>> list.
>> An Entry<E> has to be at least 4 words and could be 6 depending on  
>> the
>> Java implementation.  Oh yes, it's a doubly linked list with a  
>> header,
>> so a completely empty Java LinkedList starts out holding at least as
>> much space as an Erlang list with 4 elements. Adding a single new   
>> element
>> at the front
>> (a) is destructive, unlike Erlang
>> (b) has to initialise at least four words of memory,
>>      has to change two existing pointers, and
>>      has to increment two counters.
>> An Erlang cons just has to initialise two words of memory.
>>
>> Java's LinkedList is, in short, a textbook example of how *NOT* to do
>> linked lists, and if ArrayLists are only 3 times faster, that is a
>> really damning criticism of the implementation quality of ArrayList.
>>
> I understand your dislike to Java, but I think you are too biased  
> against it.

You h ave completely mistaken my point.  I am not attacking ***JAVA***,
I am attacking ***LinkedList***.

Here's another example:  in our Information Retrieval paper students  
have
to write a small IR Engine.  Every year we say "do in in C" and every  
year
some students insist on doing it in Java.  And whenever they do, the
performance is really dreadful.  Like 150 times slower than the model
answer.  Why?  Because the basic random I/O class does no buffering.
Staying in Java, it is possible to speed these student programs up by
about a factor of 15.  (They are still slower than AWK, but you can't
have everything.)  To say that that one *class* is horrible is not to
attack the whole language.

> Java lists _are_ mutable.

And Erlang's aren't.  Which is why your comparison of LinkedList with  
Erlang's
lists is invalid.  Erlang's lists just don't pay that price.

> They _do_ allow you to iterate in both directions.

And Erlang's don't.  Which is why your comparison of LinkedList with  
Erlang's
lists is invalid.  Erlang's lists just don't pay that price.

> If you can, you can always try to invent a better way to implement  
> a doubly linked mutable list.

Actually, it isn't in the least difficult to implement doubly linked  
lists
better than Java's LinkedList class does, as in "require less memory"  
and
"take less time".  But that's not the point.

YOU made the argument that because ArrayList is faster than  
LinkedList in
Java, it should be easy to beat Erlang's lists.  But LinkedList is slow
because
(a) it's in Java
(b) it is not that well written
(c) it is a doubly linked list with a header and stored size.

For what it's worth, I just ran three benchmarks on a 500 MHz machine.
Erlang: R11B-4.  Java: Sun JDK 1.5.0_08, with Sun's HotSpot JIT, running
on an UltraSPARC II (a machine designed and built by the same company  
that
made the JDK (:-)).  The benchmark was a trivial one:
     make a list of 2000 elements,
     add up all its elements, 10 000 times.

Java LinkedList:	5.26 seconds of CPU
Java ArrayList:		7.74 seconds of CPU
Erlang:                 3.56 seconds

Erlang?  Better than twice as fast as ArrayList?  Yes.

Of course different benchmarks will show different things.
But it has to be of *some* interest.

> But I guess its not easy.
> As for a simple immutable linked list, the Erlang version is great.  
> There is no doubt about it.
>
> By the way, in case you are attentive, you probably noticed that I  
> did not mean to say that the Java version of LinkedList is a great  
> thing. What I said is that Java's ArrayList can be an example of a  
> vector-based list, nothing more.

Not quite.  What you said, paraphrased, was "Java's ArrayList is  
three times
faster than Java's LinkedList, so we can expect vector-based lists to  
be an
improvement for Erlang."

> And of course, a much better thing can be implemented for a purely  
> functional language.

Why "of course"?  Better in what ways?

There certainly are many other implementations of the sequence  
abstract data
type which are good at things (like snoc or append) that Erlang lists  
are not
great at.  It would be nice to have some of them, and indeed, some  
are available
as contributed libraries.  It would be even nicer to have one or two  
"native".
But while *adding* such a thing would be good, we have as yet no  
reason to
believe that *replacing* the built in list implementation with  
something else
would be an improvement.

> Yes, I heard such stories. Are you sure they have implemented that  
> _properly_? I mean, not just like endless runtime checks for type  
> and all such horrid stuff. I mean an implementation were all  
> possible conversions and type checks are eliminated. For some  
> reason, I am sure they did not do that. :-)

Possibly because Prolog, like Erlang, is not a statically typed  
language?

I have to go and pick my elder daughter up from school.
>
> You might want to look at modern implementations of Common LISP,

Look at?  I *have* some, and use them.  However, I never take advantage
of the ability to turn off runtime checks...

> I believe there are some ways which can be probed to improve Erlang  
> performance.

Everyone knows that much.

> I do not say that they surely will give you a boost. They are just  
> some things that can be tried.

The point is that this *specific* one is unlikely to do anything to  
speed
up *EXISTING* Erlang code which was written for the cost profile Erlang
lists currently have.

> And two more points:
> - I do not see any easy way to try it except for deviating from the  
> standard and creating a different implementation of VM and some  
> basic libraries.

Trying small scale experiments would be a darned good start.

> - Perhaps everybody is happy about the speed, so let's just forget  
> about it.

Everyone would say "thank you" for something that made existing code  
faster.
My point is that intuitions about what makes for a fast list, based on
other languages, can be an extremely unreliable guide.