Performance of length and spawn
Joe Armstrong
joe@REDACTED
Tue Jan 26 08:53:11 CET 1999
On Fri, 22 Jan 1999, Craig Dickson wrote:
> A couple of questions:
>
> Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
> of items in the list stored, or is the length determined by walking the list
> every time?)
>
O(n)
> When spawning a new process, is it necessary to duplicate in memory every
> argument passed to it?
All arguments are copied.
>
> The specific case is that I'm writing a server that spawns new processes to
> handle most of the requests sent to it, in the hopes of maximizing the
> server's availability to accept requests. If a request basically amounts to
> "How long is a certain list?", then I can see a few possibilities:
>
> (1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
> list might be quite long, so perhaps it's worthwhile to spawn a process to
> get the length and send it back to the caller.
>
> (2) If length/1 is O(1): it probably is fast enough that it isn't worth the
> trouble to use spawn/3 even if copying would not be necessary.
>
> (3) If arguments have to be copied for spawn/3: since copying a list
> probably takes longer than counting its elements, spawn is probably taking
> more of the server's time than just calling length/1 directly.
>
If the lists are short then I wouldn't bother with any optimisations.
I can't define short - but I wouldn't be looking at any optimisations if
I was thinking in terms of say less than 5,000 elements.
If the lists get big then I'd use a data structure called a
binary. Binaries are untyped chunks of memory. They are not copied
between processes on the same node (they are reference counted). For
messages:
The bif size(Binary) is O(1)
If my data structure were say 100,000 element then I'd certainly
use binaries. Somewhere between 5,000 and 100,000 elements I'd change
representation from lists to binaries.
(Hopefully I'd have written the program in such a way that the
choice of representation would now impact the code much).
I don't know what your problem is but usually a major source of
inefficiency is getting data from the outside world into Erlang, here
you should almost always use binaries - the break even point comes
with very short lists (i.e. even with a few hundred elements you get a
significant gain using binaries instead of lists).
<aside>
There is only one way to write code. Make it as beautiful as
possible, then measure the performance. If not ok optimise. I very
rarely optimise code [It has happened].
</aside>
> Another question: Is there any intention to add
> features to Erlang's module system, such as the ability to make
> certain modules private to others? In practice, I suppose just not
> documenting certain modules (or documenting them as not for public
> use) is enough to keep people out of them, but I'd like for the
> language itself to understand that some modules are for internal use
> only within a definable group of modules.
Not yet planned. Sorry.
I usually make one main module with all the public exports and a
lot of sub modules. This is the convention that is used in the system.
Example: The main module snmp.erl offers the public interface to
the snmp package. All internal modules are given names like snmp_XXX.erl
Hope that helps
/Joe
--
Joe Armstrong Computer Science Laboratory +46 8 719 9452
AT2/ETX/DN/SU Ericsson Telecom SE-126 25 Stockholm Sweden
joe@REDACTED http://www.ericsson.se/cslab/~joe
More information about the erlang-questions
mailing list