Performance of length and spawn

Tue Jan 26 08:53:11 CET 1999

On Fri, 22 Jan 1999, Craig Dickson wrote:

> A couple of questions:
> 
> Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
> of items in the list stored, or is the length determined by walking the list
> every time?)
> 
	O(n)

> When spawning a new process, is it necessary to duplicate in memory every
> argument passed to it?

	All arguments are copied.
> 
> The specific case is that I'm writing a server that spawns new processes to
> handle most of the requests sent to it, in the hopes of maximizing the
> server's availability to accept requests. If a request basically amounts to
> "How long is a certain list?", then I can see a few possibilities:
> 
> (1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
> list might be quite long, so perhaps it's worthwhile to spawn a process to
> get the length and send it back to the caller.
> 
> (2) If length/1 is O(1): it probably is fast enough that it isn't worth the
> trouble to use spawn/3 even if copying would not be necessary.
> 
> (3) If arguments have to be copied for spawn/3: since copying a list
> probably takes longer than counting its elements, spawn is probably taking
> more of the server's time than just calling length/1 directly.
> 

	If the lists are short then I wouldn't bother with any optimisations.
I can't define short - but I wouldn't be looking at any optimisations if
I was thinking in terms of say less than 5,000 elements.

    If  the lists  get  big then  I'd use  a  data  structure called a
binary.  Binaries are  untyped chunks of memory.   They are not copied
between  processes on the same  node (they are reference counted). For
messages:

	The bif size(Binary) is O(1)

    If my data  structure were say  100,000 element then I'd certainly
use binaries. Somewhere between 5,000  and 100,000 elements I'd change
representation from lists to binaries.

    (Hopefully I'd have  written the program in such   a way that  the
choice of representation would now impact the code much).

    I  don't know what  your problem is but  usually a major source of
inefficiency is getting data from  the outside world into Erlang, here
you should  almost always  use binaries -  the  break even point comes
with very short lists (i.e. even with a few hundred elements you get a
significant gain using binaries instead of lists).

	<aside>
    There is  only  one way to  write  code. Make it as  beautiful as
possible, then measure the performance.   If not ok optimise. I   very
rarely optimise code [It has happened].
	</aside>	

>  Another question: Is there    any intention to add
> features to Erlang's  module system,  such as  the ability  to  make
> certain modules private  to others? In  practice, I suppose just not
> documenting  certain modules (or documenting them  as not for public
> use) is  enough to keep people  out  of them,  but I'd like  for the
> language itself to understand that some modules are for internal use
> only within a  definable  group of modules. 

	Not yet planned. Sorry.

    I usually make one main  module with all the  public exports and a
lot of sub modules. This is the convention that is used in the system.

    Example: The main  module snmp.erl offers  the public interface to
the snmp package. All internal modules are given names like snmp_XXX.erl

    Hope that helps	

	/Joe

--
Joe Armstrong  Computer Science Laboratory  +46 8 719 9452
AT2/ETX/DN/SU  Ericsson Telecom SE-126 25 Stockholm Sweden 
joe@REDACTED    http://www.ericsson.se/cslab/~joe