Performance of length and spawn

Fri Jan 22 15:49:11 CET 1999

Craig Dickson wrote:
> 
> A couple of questions:
> 
> Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
> of items in the list stored, or is the length determined by walking the list
> every time?)

O(n).

A colleague of mine performed some measurements a while back.
This is what he found on a particular version of the runtime system:

CPUtime length(List)
22      1
25      3
33      5
36      7

> 
> When spawning a new process, is it necessary to duplicate in memory every
> argument passed to it?

I would think yes, but the operation is fast since it performs some kind
of block copy.

> The specific case is that I'm writing a server that spawns new processes to
> handle most of the requests sent to it, in the hopes of maximizing the
> server's availability to accept requests. If a request basically amounts to
> "How long is a certain list?", then I can see a few possibilities:
> 
> (1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
> list might be quite long, so perhaps it's worthwhile to spawn a process to
> get the length and send it back to the caller.
> 
> (2) If length/1 is O(1): it probably is fast enough that it isn't worth the
> trouble to use spawn/3 even if copying would not be necessary.

A spawn operation without copying is fast, but spawn does copy the
arguments. As it turns out, it's cheaper to find out the length of the
list, than to spawn a process to do it.

-module(test).
% to find out how much it costs to spawn a thread 

-export([incr/1]).
-export([incr1/2]).

incr(A) ->
    Pid = spawn(?MODULE,incr1,[self(),A]),
    receive
	{Pid,B} -> B
    end.

incr1(Pid,A) -> 
    Pid ! {self(),A+1}.

Eshell V4.7.3.3  (abort with ^G)
1> c(test).
{ok,test}
2> timer:tc(test,incr,[17]).
{60,18}
3> timer:tc(test,incr,[17]).
{42,18}
4> timer:tc(test,incr,[17]).
{44,18}
5> timer:tc(test,incr,[17]).
{42,18}

The time is given in microseconds. Thus, it takes about 45 us on my
machine to spawn a process and wait for its reply (I know, it's not
exactly what you were talking about.)

> (3) If arguments have to be copied for spawn/3: since copying a list
> probably takes longer than counting its elements, spawn is probably taking
> more of the server's time than just calling length/1 directly.

8> L = lists:seq(1,10000).
[...]
9> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2720,<0.47.0>}
10> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2532,<0.49.0>}
11> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2600,<0.51.0>}
12> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2471,<0.53.0>}
13> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2634,<0.55.0>}
14> timer:tc(erlang,length,[L]).
{677,10000}
15> timer:tc(erlang,length,[L]).
{571,10000}
16> timer:tc(erlang,length,[L]).
{639,10000}

As you can see, it's much cheaper to call length(L) (ca 600 us) than to
spawn a process which handles length(L) (2.5 ms), when operating on a
list of 10,000 integers.

Here's a wild thought:

If your input comes from outside Erlang, you may receive it as a binary
(if it's text data). It is very cheap (O(1)) to find out the size of a
binary. It is also cheap to pass it to another process, since the data
is not copied.

19> L = lists:duplicate(10000,79).
[...]
20> timer:tc(erlang,list_to_binary,[L]).
{1714,#Bin}
21> timer:tc(erlang,list_to_binary,[L]).
{1663,#Bin}
22> timer:tc(erlang,list_to_binary,[L]).
{1718,#Bin}
23> timer:tc(erlang,list_to_binary,[L]).
{1671,#Bin}

24> B = element(2,v(23)).
#Bin
25> timer:tc(erlang,size,[B]).
{24,10000}
26> timer:tc(erlang,size,[B]).
{21,10000}
27> timer:tc(erlang,size,[B]).
{21,10000}
28> timer:tc(erlang,size,[B]).
{21,10000}

29> timer:tc(erlang,spawn,[erlang,size,[B]]).
{43,<0.72.0>}
30> timer:tc(erlang,spawn,[erlang,size,[B]]).
{40,<0.74.0>}
31> timer:tc(erlang,spawn,[erlang,size,[B]]).
{38,<0.76.0>}
32> timer:tc(erlang,spawn,[erlang,size,[B]]).
{40,<0.78.0>}

That is, if you don't have to convert it to a binary, finding out the
number of bytes is fast.

> 
> Another question: Is there any intention to add features to Erlang's module
> system, such as the ability to make certain modules private to others? In
> practice, I suppose just not documenting certain modules (or documenting
> them as not for public use) is enough to keep people out of them, but I'd
> like for the language itself to understand that some modules are for
> internal use only within a definable group of modules. I seem to recall that
> Java's "package" concept supplies such a capability (then again, I haven't
> done any Java in a couple of years).

There is work being done. "Safe Erlang" comes to mind.
http://www.erlang.se/onlinenews/archive/Erlang-OTP-news/sep98/a2.shtml

/Uffe
-- 
Ulf Wiger, Chief Designer AXD 301     <ulf.wiger@REDACTED>
Ericsson Telecom AB                          tfn: +46  8 719 81 95
Varuvägen 9, Älvsjö                          mob: +46 70 519 81 95
S-126 25 Stockholm, Sweden                   fax: +46  8 719 43 44