[erlang-questions] Erlang Cost Model
Fred Hebert
mononcqc@REDACTED
Thu Sep 17 17:56:47 CEST 2015
On 09/17, Eric des Courtis wrote:
>Regardless if concurrency and parallelism plays a role *I still need to
>know if an operation in a sequential process I am working on is O(1) or
>O(n)* or am I missing something?
These values are way more worthwhile as a property of libraries than
language, since they're far more impacted by the choice of algorithm
taken there.
Let's look at a few modules:
1. Queue
"all operations has an amortized O(1) running time, except len/1,
join/2, split/2, filter/2 and member/2 that have O(n). To minimize the
size of a queue minimizing the amount of garbage built by queue
operations, the queues do not contain explicit length information, and
that is why len/1 is O(n)."
2. Maps
undefined in the module doc, but from the EEP
(http://www.erlang.org/eeps/eep-0043.html):
"has at most O(log N) time complexity in insert and lookup operations,
where N is the number of key-value associations."
3. Dict
undefined in docs. I know that it is a very flat tree (bunch of buckets)
acting as a hash map, so O(log N)
4. Array
undefined in docs. Also a very flat tree. O(Log N), where N is the size
of the array, and if I recall, the tree has a branching factor of 10 to
compromise between the speed of lookups and garbage generated when
modifying it. It's described in the source only, sadly.
5. Gb_trees
Doc compares them to AVL trees, without storage overhead and with better
performance. O(log N) it is (those are the worst cases for AVL trees)
6. Sets
They're using the same mechanism as dicts, so O(Log N)
7. Gb_sets
Same mechanism as gb_trees, so O(log N)
8. ETS
Doc says: "These provide the ability to store very large quantities of
data in an Erlang runtime system, and to have constant access time to
the data. (In the case of ordered_set, see below, access time is
proportional to the logarithm of the number of objects stored)."
9. Orddict
Doc says " An orddict is a representation of a dictionary, where a list
of pairs is used to store the keys and values. The list is ordered after
the keys."
lists are O(N) traversal and insertion, so let's consider this the best
case.
10. ordsets
implemented over orddict, O(N) too.
11. sofs
despite having the most cryptic documentation, it does mention: "The
execution time of the functions of this module is dominated by the time
it takes to sort lists. When no sorting is needed, the execution time is
in the worst case proportional to the sum of the sizes of the input
arguments and the returned value. A few functions execute in constant
time: from_external, is_empty_set, is_set, is_sofs_set, to_external,
type."
O(N log N) for most (optimal sort time, where N = sum(Input)), with a
list of O(1) functions
12. Digraph
Unspecified, no idea
13. lists:key* functions
using lists, so O(N), but they're fast given they're implemented in C
14. Proplists
Using lists, so O(N).
Now with these modules somewhat covered (there's a lot of subtleties in
the APIs making some better than others depending on the access
patterns), it's interesting to look at basic operations we may do a lot,
for example:
1. Message receiving (non-selective)
O(1), though accessing the mailbox when it's very large can trigger GCs
and mess up expectations a bit.
2. Message receiving (selective)
O(N) selection where N is the mailbox size. An optimization existsw for
common access patterns where a reference (with make_ref()) is created in
the scope of the receive and used in all clauses, which truncates the
existing mailbox, therefore reducing N.
3. Sorting (lists:sort)
O(N log N), performing a fancypants merge sort
So yeah, for a lot of data structures, the information is already there.
For many, it's also still missing.
Regards,
Fred.
More information about the erlang-questions
mailing list