[erlang-questions] documentation of data structures

Tue Dec 11 11:35:23 CET 2007

Matej Kosik wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Lev Walkin wrote:
>> Matej Kosik wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Andras Georgy Bekes wrote:
>>>
>>> (snip)
>>>
>>>> -queue: The doc doesn't say anything about the implementation, only:
>>>> "implements FIFO queues in an efficient manner".
>>>> It states that "all operations has an amortised O(1) running time",
>>>> except for some, that are "probably O(n)". This is rather vague.
>>>>
>>> Check the implementation. You will reveal that the operations for
>>> adding/removing elements from/to
>>> the head/tail can have O(1) complexity in some cases but in some cases
>>> they will have O(n) complexity.
>>>
>>> It would be particularly attractive, if someone implemented `queue'
>>> (with the same signature and
>>> semantics of operations) where those operations would have always O(1)
>>> complexity. That is, as I
>>> believe, impossible. Of course, queues can be implemented more
>>> efficiently, but in a different
>>> (non-functional-like) way.
>>
>> Okasaki shows a way to have a queue with O(1) worst-case complexity.
>> However the constant factor is rather large. The queue ADT has O(1)
>> amortized complexity. In addition to that, it has a rather small
>> constant factor, so it'll be faster overall.
>>
> 
> In non-functional programming languages (also in Erlang) similar data-structure can be implemented
> with usual O(1) complexity without ``rather large'' constant factor. A different way of doing
> similar thing but it has different properties.
> - - it is more efficient
> - - it is outside functional programming paradigm
>   (no referential transparency, side effects etc.)
> Such implementation can be done in Erlang too.
> 
> I am not sure whether I understand why the above approach was not taken in Erlang. If mutable queues
> would be local to some process, no problems would arise. However, if one process wanted to pass a
> queue to another process, then if they tried to use them concurrently, interference could occur.
> With the original implementation, such interference is excluded. The penalty is that
> - - it is not the most efficient possible implementation
> - - equational reasoning cannot be applied
>   (other, more complicated, proof-techniques are necessary)
> A typical tradeoff.
> 
> However, if a queue would not be shared among processes than its mutable version would OK. Whouldn't it?

There's also a problem of persistence, even within a single process. A
queue might have different "logical histories" within a single process
which would imply quite a lot of copying if implemented as a mutable
data structure, or prohibit versioning at all. Think of versioned ets.
The present queue implementation does not deal with that as well,
though.

However, the problem with mutable/BIF approach is that it must be
implemented as a "hidden" data structure, akin to ets, with lots of
copying to and from the "imperative" data storage. This might
actually costs a larger CPU overhead than just using the present
erlang native O(1) amortized queues, especially if queues are to
store large deep tuples. As it is right now, the cost of extracting
a tuple from the queue is an [amortized] constant not dependent
on the size of the item in the queue. Accounting for that copying,
there's always a data set which is going to be faster with the
present implementation of queues.

Given that, you never know what is faster on your data until you try.
At least present implementation of queues is correct.

-- 
Lev Walkin
vlm@REDACTED