[erlang-questions] documentation of data structures

Tue Dec 11 12:48:41 CET 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lev Walkin wrote:
> Matej Kosik wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Lev Walkin wrote:
>>> Matej Kosik wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Andras Georgy Bekes wrote:
>>>>
>>>> (snip)
>>>>
>>>>> -queue: The doc doesn't say anything about the implementation, only:
>>>>> "implements FIFO queues in an efficient manner".
>>>>> It states that "all operations has an amortised O(1) running time",
>>>>> except for some, that are "probably O(n)". This is rather vague.
>>>>>
>>>> Check the implementation. You will reveal that the operations for
>>>> adding/removing elements from/to
>>>> the head/tail can have O(1) complexity in some cases but in some cases
>>>> they will have O(n) complexity.
>>>>
>>>> It would be particularly attractive, if someone implemented `queue'
>>>> (with the same signature and
>>>> semantics of operations) where those operations would have always O(1)
>>>> complexity. That is, as I
>>>> believe, impossible. Of course, queues can be implemented more
>>>> efficiently, but in a different
>>>> (non-functional-like) way.
>>>
>>> Okasaki shows a way to have a queue with O(1) worst-case complexity.
>>> However the constant factor is rather large. The queue ADT has O(1)
>>> amortized complexity. In addition to that, it has a rather small
>>> constant factor, so it'll be faster overall.
>>>
>>
>> In non-functional programming languages (also in Erlang) similar
>> data-structure can be implemented
>> with usual O(1) complexity without ``rather large'' constant factor. A
>> different way of doing
>> similar thing but it has different properties.
>> - - it is more efficient
>> - - it is outside functional programming paradigm
>>   (no referential transparency, side effects etc.)
>> Such implementation can be done in Erlang too.
>>
>> I am not sure whether I understand why the above approach was not
>> taken in Erlang. If mutable queues
>> would be local to some process, no problems would arise. However, if
>> one process wanted to pass a
>> queue to another process, then if they tried to use them concurrently,
>> interference could occur.
>> With the original implementation, such interference is excluded. The
>> penalty is that
>> - - it is not the most efficient possible implementation
>> - - equational reasoning cannot be applied
>>   (other, more complicated, proof-techniques are necessary)
>> A typical tradeoff.
>>
>> However, if a queue would not be shared among processes than its
>> mutable version would OK. Whouldn't it?
> 
> 
> There's also a problem of persistence, even within a single process. A
> queue might have different "logical histories"

I am not sure what you mean and thus I cannot assess how useful such thing is.

> within a single process
> which would imply quite a lot of copying if implemented as a mutable
> data structure, or prohibit versioning at all. Think of versioned ets.
> The present queue implementation does not deal with that as well,
> though.
> 
> 
> However, the problem with mutable/BIF approach

What do you mean by BIF approach?

Mutable queues can be implemented via already existing Erlang constructs (spawn, send, receive etc).
 No new primitive data-types or new primitive operations are needed to implement such queues whose
operations for adding elements to head and tail have always O(1) complexity.

The first step would be implementing a new data type: `cell' with operations such as
- - cell:empty/0 (create an empty cell)
- - cell:make/1 (create a cell with a given contents)
- - cell:get/1 (non-destructively get the contents of a cell. Block (or throw an error or some
predefined value if it is empty)
- - cell:put/2 (destructively overwrite a cell with a given value)
Basically, cell can be represented as a process that responds to proper messages. There can be
wrapper procedures
- - cell:/isEmpty/1 returns true if a given cell is empty

If you have cells, you can implement `queue item' as a triple composed from:
- - a cell containing a reference to the previous `queue item'
- - a cell containing a reference to the value of the current `queue item'
- - a cell containing a reference to the next `queue item'.

If you have that, you can represent the whole queue as a couple composed from:
- - a cell containing a reference to the head
- - a cell containing a reference to the tail

I did not implemented it in Erlang (because in Erlang I did not needed queues) but in a different
language that had those immutable queues I droped them and reimplemented the queues as described
above. I think this is reasonable approach if one do not need share queues among processes.

If Erlang is implemented properly, this would be efficient despite the fact that there are many
processes. Real measurements would either support or reject such hypothesis.

> is that it must be
> implemented as a "hidden" data structure, akin to ets, with lots of
> copying to and from the "imperative" data storage. This might
> actually costs a larger CPU overhead than just using the present
> erlang native O(1) amortized queues, especially if queues are to
> store large deep tuples. As it is right now, the cost of extracting
> a tuple from the queue is an [amortized] constant not dependent
> on the size of the item in the queue. Accounting for that copying,
> there's always a data set which is going to be faster with the
> present implementation of queues.
> 
> Given that, you never know what is faster on your data until you try.
> At least present implementation of queues is correct.
> 

Regards
- --
Matej Kosik
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXnkYL+CaXfJI/hgRAsemAJ91squuD5cHBGrFcu07fl521Xu4lgCgy77g
W6U8KnjrHU723tqFBx4YXkY=
=ftFi
-----END PGP SIGNATURE-----