[erlang-questions] documentation of data structures

Tue Dec 11 13:52:07 CET 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lev Walkin wrote:
> Matej Kosik wrote:
> Lev Walkin wrote:
>>>> Matej Kosik wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> Lev Walkin wrote:
>>>>>> Matej Kosik wrote:
>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>> Hash: SHA1
>>>>>>>
>>>>>>> Andras Georgy Bekes wrote:
>>>>>>>
>>>>>>> (snip)
>>>>>>>
>>>>>>>> -queue: The doc doesn't say anything about the implementation, only:
>>>>>>>> "implements FIFO queues in an efficient manner".
>>>>>>>> It states that "all operations has an amortised O(1) running time",
>>>>>>>> except for some, that are "probably O(n)". This is rather vague.
>>>>>>>>
>>>>>>> Check the implementation. You will reveal that the operations for
>>>>>>> adding/removing elements from/to
>>>>>>> the head/tail can have O(1) complexity in some cases but in some
>>>>>>> cases
>>>>>>> they will have O(n) complexity.
>>>>>>>
>>>>>>> It would be particularly attractive, if someone implemented `queue'
>>>>>>> (with the same signature and
>>>>>>> semantics of operations) where those operations would have always
>>>>>>> O(1)
>>>>>>> complexity. That is, as I
>>>>>>> believe, impossible. Of course, queues can be implemented more
>>>>>>> efficiently, but in a different
>>>>>>> (non-functional-like) way.
>>>>>> Okasaki shows a way to have a queue with O(1) worst-case complexity.
>>>>>> However the constant factor is rather large. The queue ADT has O(1)
>>>>>> amortized complexity. In addition to that, it has a rather small
>>>>>> constant factor, so it'll be faster overall.
>>>>>>
>>>>> In non-functional programming languages (also in Erlang) similar
>>>>> data-structure can be implemented
>>>>> with usual O(1) complexity without ``rather large'' constant factor. A
>>>>> different way of doing
>>>>> similar thing but it has different properties.
>>>>> - - it is more efficient
>>>>> - - it is outside functional programming paradigm
>>>>>   (no referential transparency, side effects etc.)
>>>>> Such implementation can be done in Erlang too.
>>>>>
>>>>> I am not sure whether I understand why the above approach was not
>>>>> taken in Erlang. If mutable queues
>>>>> would be local to some process, no problems would arise. However, if
>>>>> one process wanted to pass a
>>>>> queue to another process, then if they tried to use them concurrently,
>>>>> interference could occur.
>>>>> With the original implementation, such interference is excluded. The
>>>>> penalty is that
>>>>> - - it is not the most efficient possible implementation
>>>>> - - equational reasoning cannot be applied
>>>>>   (other, more complicated, proof-techniques are necessary)
>>>>> A typical tradeoff.
>>>>>
>>>>> However, if a queue would not be shared among processes than its
>>>>> mutable version would OK. Whouldn't it?
>>>>
>>>> There's also a problem of persistence, even within a single process. A
>>>> queue might have different "logical histories"
> 
> I am not sure what you mean and thus I cannot assess how useful such
> thing is.
> 
>>>> within a single process
>>>> which would imply quite a lot of copying if implemented as a mutable
>>>> data structure, or prohibit versioning at all. Think of versioned ets.
>>>> The present queue implementation does not deal with that as well,
>>>> though.
>>>>
>>>>
>>>> However, the problem with mutable/BIF approach
> 
> What do you mean by BIF approach?
> 
> Mutable queues can be implemented via already existing Erlang
> constructs (spawn, send, receive etc).
>  No new primitive data-types or new primitive operations are needed to
> implement such queues whose
> operations for adding elements to head and tail have always O(1)
> complexity.
> 
> The first step would be implementing a new data type: `cell' with
> operations such as
> - cell:empty/0 (create an empty cell)
> - cell:make/1 (create a cell with a given contents)
> - cell:get/1 (non-destructively get the contents of a cell. Block
> (or throw an error or some
> predefined value if it is empty)
> - cell:put/2 (destructively overwrite a cell with a given value)
> Basically, cell can be represented as a process that responds to
> proper messages. There can be
> wrapper procedures
> - cell:/isEmpty/1 returns true if a given cell is empty
> 
> If you have cells, you can implement `queue item' as a triple composed
> from:
> - a cell containing a reference to the previous `queue item'
> - a cell containing a reference to the value of the current `queue
> item'
> - a cell containing a reference to the next `queue item'.
> 
> If you have that, you can represent the whole queue as a couple
> composed from:
> - a cell containing a reference to the head
> - a cell containing a reference to the tail
> 
> I did not implemented it in Erlang (because in Erlang I did not needed
> queues) but in a different
> language that had those immutable queues I droped them and
> reimplemented the queues as described
> above. I think this is reasonable approach if one do not need share
> queues among processes.
> 
> If Erlang is implemented properly, this would be efficient despite the
> fact that there are many
> processes. Real measurements would either support or reject such
> hypothesis.
> 
>> This design has the same drawback as ets: lots of copying [between
>> processes]. Consequently, a part of my email still applies:

Are you sure? When (in which kind of operation) would such copying (of what) be needed.

Sure, the cost of explicit copying of a queue instance has O(len(N)) complexity. But we were talking
about adding/removing elements at the head/tail of the queue.

> 
>>>> Accounting for that copying,
>>>> there's always a data set which is going to be faster with the
>>>> present implementation of queues.
>>>>
>>>> Given that, you never know what is faster on your data until you try.
>>>> At least present implementation of queues is correct.
>>>>
> 
> Regards

- --
Matej Kosik
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXof3L+CaXfJI/hgRAqOuAKDTT9xwZ6EQDBYct1OOhvwZCjmeMwCfZikI
Ch28AO9ZJXz5wXcRIX2IuX0=
=vAu7
-----END PGP SIGNATURE-----