Optimized lists

Mon Mar 14 03:49:54 CET 2005

Zoltan Peter Toth <zoltan.peter.toth@REDACTED> wrote:
	Idea:
	Store in each list element the length of the remaining part of the list.

Better idea:  if you have an algorithm where you need a sequence data type
where determining the size is O(1), USE THE EXISTING DATA TYPE THAT'S
SUITED TO THE JOB, namely tuples.

	This way, length() would become an O(1) operation, 

So what is the length of [1|2]?
What are you going to store as its length?

	[this may be a benefit if it's used frequently enough]

There are three bad consequences of the proposed change.
(1) The Warren Abstract Machine stores list pairs in 2 words.  I don't
    know what BEAM uses.  Probably 3 words.  The change would thus
    add 33% to the size of every list.  This leads to
    - less effective use of cache.  This has a bigger effect than
      people commonly realise.  Here are some measurements from a 1GHz
      G4 PowerPC running MacOS 10.3

	 Level    Sequential access Random access 
	 L1 cache 1.7 GB/sec        2.1 GB/sec    
	 L2 cache 1.0 GB/sec        0.7 GB/sec    
	 main     0.4 GB/sec        0.05 GB/sec   

      And here are some measurements from a 500MHz SunBlade100 (UltraSparc II)
      running Solaris 2.10:

	 Level    Sequential access Random access 
	 L1 cache 2.8 GB/sec        1.6 GB/sec    
	 L2 cache 1.4 GB/sec        0.8 GB/sec    
	 main     0.3 GB/sec        0.04 GB/sec   

      Adding an extra word to every list cell could be enough to push
      a program from L1 cache to L2 cache, cutting speed by a factor of 2,
      or from L2 cache to main memory (a factor of 10 or more).
      (The same 40:1 ratio between main memory speed and L1 cache explains
      why compiling everything to native code doesn't always pay off.)

(2) Making a cons cell would no longer be a simple matter of bumping a
    register (the required check for overflow can be amortised over
    several allocations; I don't know whether it is or not) and filling
    in a couple of words.  The length field has to be computed and stored
    as well.  So the time to make *every* list goes up by 33% as well,
    maybe more.

(3) There are plenty of list-processing functions which, to my understanding,
    don't involve tail recursion at the moment, but with a slightly different
    abstract machine *could* be tail recursions.  Here's one:

	append([X|Xs], L) -> [X | append(Xs, L)];
	append([], L) -> L.

    Making this and similar functions tail recursive would be very good
    news for memory use (using O(1) stack space instead of O(n)).  Such code
    is _very_ common, much more common than length/1 is.  Switching to tail
    recursion here would be IMPOSSIBLE if each list cell carried its length.

Bad consequences (1) and (2) mean that length/1 would have to be
vastly commoner than it is now to pay off; bad consequence (3) means that
we'd be locked out of an improvement that WOULD be an improvement.

	but also, the list BIFs could count as variable number
	of reductions (based on length), keeping the real time behaviour.

There's a much simpler way to do that.
(A) Ban length/1 from guards, replacing it by length<relop> comparisons
    with slightly different semantics.
(B) Turn any "BIF" that can traverse an entire list into plain Erlang code.
Then all 'reduction counting', time-slicing, &c, would be accurate without
any special machinery.

By the way, there are data structures with O(1) concatenation and
O(lgN) length.  I think at least one such data structure has been
implemented in Erlang.  The space and time constant factors are
much higher than ordinary Erlang lists, of course.  Erlang lists are
for use when you don't care about length/1 all that much.