[erlang-questions] VM & BEAM Specs : idea for improving the lists support

Sun Aug 5 13:09:25 CEST 2007

>> I join to this question.
>>
>> I was also looking for the specifications once as I had an idea how to
>> improve the performance of list operations by changing the underlying
>> implementation.
>>     
>
> It would be interesting to share your ideas about improving the
> performance of list operations. What improvements did you have in
> mind? If you have some good ideas
> that we have not thought about we might implement them.
>
> I am also curious about which list operation you don't find effiecient
> as it is today.
>
> /Kenneth (Erlang/OTP team at Ericsson)
Hi Kenneth,

As far as I understand, the underlying implementation of lists which is 
used in Erlang is based on linked lists. So, each list is a number of 
objects with a value (the n'th element of the list) or a reference to it 
plus a link to the next element (if it exists).

My understanding is that only a number of operations take O(1) time to 
complete. These are:
* Adding an element to the beginning of the list
* Taking the first element of the list or its remainder.

However, the following operations take O(n) time:
* Finding out the  length of the list
* Adding an element at the end of the list
* Taking the last element of the list or everything before it

As a result, one of the common patterns in Erlang is to construct a list 
by appending the elements from the beginning and then reversing the list.

The problems I see with the current implementation are:
- The computational complexity required to do certain list operations is 
O(n), not O(1)
- The linked list is not CPU cache - friendly. It can be dispersed in 
the memory, reducing the cache - hits and thus killing the performance.
- Each of the list items is a quite big object. It takes 64bits if I am 
not mistaken. So, if you'd like to store a list of, say, 16bit 
characters / ordinary Unicode strings it takes 4 times more memory.

------------------
I believe, the current implementation can be (theoretically) improved. 
Of course, you will never know the exact numbers until someone 
implements the ideas. And the main factor which stops me is that (as far 
as I understand) Erlang uses that 64bit objects for almost everything so 
the change will be enormous.

---  Idea #1:

"Use unrolled linked lists to improve the performance"

The unrolled linked lists 
(http://en.wikipedia.org/wiki/Unrolled_linked_list) should give a 
slightly better cache-hit ratio and consume less memory. Also, they will 
maintain strict O(1) time complexity for operations which are O(1) now.

---  Idea #2:

"Use resize-able vectors to store lists"

I guess this change would give even more benefits comparing to the Idea 
#1. These are:
- The _amortized_ computational complexity is O(1) for 
adding/removing/taking a remainder no matter if it is done from the 
beginning or from the end of the list.
- It takes O(1) time to find out the size of the list.
- Cache-hits ratio is seriously improved as the list is located in a 
single chunk of memory.
- Much less work for a GC. However, I am not sure if it is an issue.

The drawback is that the time complexity is amortized. Again, it is not 
clear what is better: an amortized O(1) or strict O(1) and strict O(n). 
I'd prefer the amortized O(1).

A few words regarding the implementation:
If you take a look at Java LinkedList and ArrayList, the ArrayList is 
generally 3 times faster. It can be used as a reference to some extent. 
However, it does not allow O(1) adding/removing at the beginning of the 
list. So, instead of implementation of a list which can grow only in one 
direction, it is necessary to implement the one which can grow any way.
I am sure that you can also benefit from pureness or the functional 
approach in Erlang and from the fact that only one thread accesses the 
list all the time as you will not need to care about synchronization much.

--- Idea #3:

"Typed lists"

Both kinds of list implementations can benefit from knowing what type of 
data is going to be stored in them. It would lead to a decrease in the 
consumed space and improved performance.

It is possible to implement a "general" kind of list which can hold 
anything plus some specific implementations for storing 8bit and 16bit 
values.
If a value is appended to the list and it can not be (potentially) 
stored within it, the list changes its type. The operation will take 
O(N) time. However, it will benefit from the flat memory structure. The 
type checks are not necessary to implement in runtime. They can be made 
by analyzing the possible scenarios during the compile-time or during 
the loading of the module.
However, in most cases it will not be necessary to change the type of 
the list and lists of 8bit / 16bit values will do their job.
Also, you will not need to change the syntax of the programs as 
everything can be done on VM/OTP side.

The benefit is again in both performance and memory consumption.
Imagine that a program has consumes a large text document with English, 
Arabic and Russian characters. The current implementation will consume 4 
times more memory than an automatically-typed vector-based version. In 
case the document contains only ASCII characters, the memory consumption 
will drop 8 times.

---
With kind regards,
Denys Rtveliashvili