[erlang-questions] VM & BEAM Specs : idea for improving, the lists support

Mon Aug 6 19:41:57 CEST 2007

> Of the many systems I've used in the past, the ones with "smart" lists
> were invariably worse than the ones with "dumb" ones.  My favourite
> example is Interlisp-D, with CDR-coding and a smart CONS.  Xerox Quintus
> Prolog, with "dumb" lists, ran list processing code 5 times faster on
> the same hardware.  One would expect *idiomatic* Erlang code to be
> similarly slowed down by a "smart" implementation of lists.
>   
Well.. Your mileage may vary. The difference of performance depends on 
lots of things. So, you can tell the difference only if there are two 
finely tuned implementations.
>> "Use unrolled linked lists to improve the performance"
> I have unrolled strict list code for Haskell and Clean.
> See http://www.cs.otago.ac.nz/staffpriv/ok/software.htm/
>
> A neural net program written in Haskell used 3650 seconds with
> plain lists and 3057 seconds (about 1.2 times faster).  The best
> I've ever seen, on a benchmark doing practically nothing else,
> was 2 times faster.  (By making it GHC-specific I could probably
> do somewhat better, but probably not much.)  That was for
> numeric code where GHC's ability to treat !Double as unboxed
> paid off.  Erlang has no such ability, so I suspect that we are
> looking at the 1.2 factor at best.
>   
OK, you've got a performance penalty here. However, I assume you would 
not argue that an unrolled list consumes less space. Also it has sense 
to compare native implementations of dumb linked list and an unrolled 
linked list. Otherwise you can have additional penalties you would not 
get with a native version.

>> ---  Idea #2:
>>
>> "Use resize-able vectors to store lists"
>>
>> I guess this change would give even more benefits comparing to the  
>> Idea #1
>>     
>
> My guess is that this would have higher COSTS for existing code.
>   
Which costs do you mean? Could you give an example where an vector-based 
list would be slower or consume more space than a linked list?
>> - The _amortized_ computational complexity is O(1) for
>> adding/removing/taking a remainder no matter if it is done from the
>> beginning or from the end of the list.
>>     
> We can get that with a pair of singly linked lists right now.
>   
Of course, you can emulate almost everything. It's just not too much 
convenient.
>> - It takes O(1) time to find out the size of the list.
>>     
> It is amazing just how seldom this is interesting.
>   
Sure it is. But if you can get it for free, then why not?
>> - Cache-hits ratio is seriously improved as the list is located in a
>> single chunk of memory.
>>     
> You are assuming that the list *elements* are not scattered all over
> the place.  There seems to be no reason to believe that (except for
> strings).  If you are mapping along a list doing some computation that
> produces a new result, e.g.,
> 	zip([X|Xs], [Y|Ys]) -> [{X,Y}|zip(Xs, Ys)];
> 	zip(_, _) -> [].
> the present scheme tends to put the list cell near the newly created
> element it points to.  (If we ran heaps DOWN instead of up, the list
> cell would precede the element, which would be perfect for a cache.
> That's an experiment someone should try some day.)  Your proposal
> would put a list element and the cell that points to it far apart,
> which is bad for the cache.
>   
I might be missing something. However, I don't think it matters if the 
cells and elements are close to each other. There are different 
architectures of CPU caches, however I do not remember any modern one 
which would map, say, the whole 2M of cache to a flat block of memory. 
Instead, the cache is a number of cells, each of them maps to its small 
piece of memory. As a result, both current implementation and the one 
with cells and elements grouped into lines apart would have similar 
performance while zipping.
>> A few words regarding the implementation:
>> If you take a look at Java LinkedList and ArrayList, the ArrayList is
>> generally 3 times faster.
>>     
>
> Yes, but that's not really comparable.
> A Java LinkedList is a record containing a stored size (this is one of
> the things you are arguing FOR) and a pointer to a DOUBLY linked list.
> An Entry<E> has to be at least 4 words and could be 6 depending on the
> Java implementation.  Oh yes, it's a doubly linked list with a header,
> so a completely empty Java LinkedList starts out holding at least as
> much space as an Erlang list with 4 elements. Adding a single new  
> element
> at the front
> (a) is destructive, unlike Erlang
> (b) has to initialise at least four words of memory,
>      has to change two existing pointers, and
>      has to increment two counters.
> An Erlang cons just has to initialise two words of memory.
>
> Java's LinkedList is, in short, a textbook example of how *NOT* to do
> linked lists, and if ArrayLists are only 3 times faster, that is a
> really damning criticism of the implementation quality of ArrayList.
>   
I understand your dislike to Java, but I think you are too biased 
against it. Java lists _are_ mutable. They _do_ allow you to iterate in 
both directions. If you can, you can always try to invent a better way 
to implement a doubly linked mutable list. But I guess its not easy.
As for a simple immutable linked list, the Erlang version is great. 
There is no doubt about it.

By the way, in case you are attentive, you probably noticed that I did 
not mean to say that the Java version of LinkedList is a great thing. 
What I said is that Java's ArrayList can be an example of a vector-based 
list, nothing more. And of course, a much better thing can be 
implemented for a purely functional language.
> --- Idea #3:
>   
>> "Typed lists"
>>
>> Both kinds of list implementations can benefit from knowing what  
>> type of
>> data is going to be stored in them. It would lead to a decrease in the
>> consumed space and improved performance.
>>
>> It is possible to implement a "general" kind of list which can hold
>> anything plus some specific implementations for storing 8bit and 16bit
>> values.
>>     
>
> The NuProlog implementation of Prolog from the University of Melbourne
> did this.  I've spoken in person with the guy who implemented that, and
> he said it was a big headache and if they were doing it all again that's
> one hack they would never ever try again.  Again a data point:   
> Interlisp-D
> handled strings as packed arrays of bytes (or packed arrays of 16-bit  
> codes)
> and had special microcode support for them.  Xerox Quintus Prolog,  
> running
> on the same hardware, in the Interlisp environment, used ordinary Prolog
> list cells (just like Erlang ones).  Guess which one was faster at  
> string
> manipulation?
>   
Yes, I heard such stories. Are you sure they have implemented that 
_properly_? I mean, not just like endless runtime checks for type and 
all such horrid stuff. I mean an implementation were all possible 
conversions and type checks are eliminated. For some reason, I am sure 
they did not do that. :-)

You might want to look at modern implementations of Common LISP, for 
instance. They do such kind of job nicely. Also, you can for instance 
compare Erlang HIPE to SBCL on http://shootout.alioth.debian.org/. The 
only test case where Erlang wins is a cheap concurrency. Do you think it 
is a coincidence? Take a look at "reverse-complement" for example. What 
is so special in SBCL Lisp that it is 50 times faster and consume 12 
times less memory? I do not know the answer, but I guess it is because 
it can do much more when analyzing the bytecode.

I do not want to say that SBCL Lisp is superior to Erlang. No way! I 
just want to bring your attention to the fact, that there are approaches 
in implementation of VM which can be considered valuable for 
experimenting with.

The bottom line is:

I believe there are some ways which can be probed to improve Erlang 
performance. I do not say that they surely will give you a boost. They 
are just some things that can be tried.
And two more points:
- I do not see any easy way to try it except for deviating from the 
standard and creating a different implementation of VM and some basic 
libraries.
- Perhaps everybody is happy about the speed, so let's just forget about it.

Regards,
Denys Rtveliashvili