[erlang-questions] VM & BEAM Specs : idea for improving the lists support

Mon Aug 6 08:41:18 CEST 2007

On 5 Aug 2007, at 11:09 pm, Denys Rtveliashvili wrote:
> As far as I understand, the underlying implementation of lists  
> which is
> used in Erlang is based on linked lists.

Just like lists in Lisp, Prolog, ML, CAML, Clean, and Haskell,  
amongst others.
>

> The problems I see with the current implementation are:
> - The computational complexity required to do certain list  
> operations is
> O(n), not O(1)

Just like the computational complexity required to do certain operations
with tuples is O(n), not O(1).

> - The linked list is not CPU cache - friendly. It can be dispersed in
> the memory, reducing the cache - hits and thus killing the  
> performance.

Half true.  One of the oldest ideas in list processing (1960s?) is a
garbage collector that automatically straightens lists.  And the
Interlisp-D "cons" function used to go to some trouble to put new pairs
on the same page as the tail (if possible) or the head (if not).

> - Each of the list items is a quite big object. It takes 64bits if  
> I am
> not mistaken. So, if you'd like to store a list of, say, 16bit
> characters / ordinary Unicode strings it takes 4 times more memory.

Which is why you don't do that for large strings, but use binaries,
lists of binaries, or lists of tokens, or whatever.

> I believe, the current implementation can be (theoretically) improved.

Of the many systems I've used in the past, the ones with "smart" lists
were invariably worse than the ones with "dumb" ones.  My favourite
example is Interlisp-D, with CDR-coding and a smart CONS.  Xerox Quintus
Prolog, with "dumb" lists, ran list processing code 5 times faster on
the same hardware.  One would expect *idiomatic* Erlang code to be
similarly slowed down by a "smart" implementation of lists.

> "Use unrolled linked lists to improve the performance"
>

I have unrolled strict list code for Haskell and Clean.
See http://www.cs.otago.ac.nz/staffpriv/ok/software.htm/

A neural net program written in Haskell used 3650 seconds with
plain lists and 3057 seconds (about 1.2 times faster).  The best
I've ever seen, on a benchmark doing practically nothing else,
was 2 times faster.  (By making it GHC-specific I could probably
do somewhat better, but probably not much.)  That was for
numeric code where GHC's ability to treat !Double as unboxed
paid off.  Erlang has no such ability, so I suspect that we are
looking at the 1.2 factor at best.
>
> ---  Idea #2:
>
> "Use resize-able vectors to store lists"
>
> I guess this change would give even more benefits comparing to the  
> Idea
> #1

My guess is that this would have higher COSTS for existing code.
>
> - The _amortized_ computational complexity is O(1) for
> adding/removing/taking a remainder no matter if it is done from the
> beginning or from the end of the list.
We can get that with a pair of singly linked lists right now.

> - It takes O(1) time to find out the size of the list.
It is amazing just how seldom this is interesting.

> - Cache-hits ratio is seriously improved as the list is located in a
> single chunk of memory.
You are assuming that the list *elements* are not scattered all over
the place.  There seems to be no reason to believe that (except for
strings).  If you are mapping along a list doing some computation that
produces a new result, e.g.,
	zip([X|Xs], [Y|Ys]) -> [{X,Y}|zip(Xs, Ys)];
	zip(_, _) -> [].
the present scheme tends to put the list cell near the newly created
element it points to.  (If we ran heaps DOWN instead of up, the list
cell would precede the element, which would be perfect for a cache.
That's an experiment someone should try some day.)  Your proposal
would put a list element and the cell that points to it far apart,
which is bad for the cache.

> - Much less work for a GC.

That may well be true, but it is very far from obvious.

> A few words regarding the implementation:
> If you take a look at Java LinkedList and ArrayList, the ArrayList is
> generally 3 times faster.

Yes, but that's not really comparable.
A Java LinkedList is a record containing a stored size (this is one of
the things you are arguing FOR) and a pointer to a DOUBLY linked list.
An Entry<E> has to be at least 4 words and could be 6 depending on the
Java implementation.  Oh yes, it's a doubly linked list with a header,
so a completely empty Java LinkedList starts out holding at least as
much space as an Erlang list with 4 elements. Adding a single new  
element
at the front
(a) is destructive, unlike Erlang
(b) has to initialise at least four words of memory,
     has to change two existing pointers, and
     has to increment two counters.
An Erlang cons just has to initialise two words of memory.

Java's LinkedList is, in short, a textbook example of how *NOT* to do
linked lists, and if ArrayLists are only 3 times faster, that is a
really damning criticism of the implementation quality of ArrayList.

The functional programming literature has many interesting data  
structures,
including ones with fast cons, fast snoc, fast append, and fast  
index.  I've
tried a couple of them out.  They generally have high constant factors.

--- Idea #3:
>
> "Typed lists"
>
> Both kinds of list implementations can benefit from knowing what  
> type of
> data is going to be stored in them. It would lead to a decrease in the
> consumed space and improved performance.
>
> It is possible to implement a "general" kind of list which can hold
> anything plus some specific implementations for storing 8bit and 16bit
> values.

The NuProlog implementation of Prolog from the University of Melbourne
did this.  I've spoken in person with the guy who implemented that, and
he said it was a big headache and if they were doing it all again that's
one hack they would never ever try again.  Again a data point:   
Interlisp-D
handled strings as packed arrays of bytes (or packed arrays of 16-bit  
codes)
and had special microcode support for them.  Xerox Quintus Prolog,  
running
on the same hardware, in the Interlisp environment, used ordinary Prolog
list cells (just like Erlang ones).  Guess which one was faster at  
string
manipulation?

> The benefit is again in both performance and memory consumption.
> Imagine that a program has consumes a large text document with  
> English,
> Arabic and Russian characters. The current implementation will  
> consume 4
> times more memory than an automatically-typed vector-based version. In
> case the document contains only ASCII characters, the memory  
> consumption
> will drop 8 times.

Only if someone is daft enough to store the whole thing as a simple list
of characters.  There are much better ways, in the language and using  
the
implementation that we now have.