[erlang-questions] documentation of data structures

Tue Dec 11 13:35:38 CET 2007

Matej Kosik wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Lev Walkin wrote:
>> Matej Kosik wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Lev Walkin wrote:
>>>> Matej Kosik wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> Andras Georgy Bekes wrote:
>>>>>
>>>>> (snip)
>>>>>
>>>>>> -queue: The doc doesn't say anything about the implementation, only:
>>>>>> "implements FIFO queues in an efficient manner".
>>>>>> It states that "all operations has an amortised O(1) running time",
>>>>>> except for some, that are "probably O(n)". This is rather vague.
>>>>>>
>>>>> Check the implementation. You will reveal that the operations for
>>>>> adding/removing elements from/to
>>>>> the head/tail can have O(1) complexity in some cases but in some cases
>>>>> they will have O(n) complexity.
>>>>>
>>>>> It would be particularly attractive, if someone implemented `queue'
>>>>> (with the same signature and
>>>>> semantics of operations) where those operations would have always O(1)
>>>>> complexity. That is, as I
>>>>> believe, impossible. Of course, queues can be implemented more
>>>>> efficiently, but in a different
>>>>> (non-functional-like) way.
>>>> Okasaki shows a way to have a queue with O(1) worst-case complexity.
>>>> However the constant factor is rather large. The queue ADT has O(1)
>>>> amortized complexity. In addition to that, it has a rather small
>>>> constant factor, so it'll be faster overall.
>>>>
>>> In non-functional programming languages (also in Erlang) similar
>>> data-structure can be implemented
>>> with usual O(1) complexity without ``rather large'' constant factor. A
>>> different way of doing
>>> similar thing but it has different properties.
>>> - - it is more efficient
>>> - - it is outside functional programming paradigm
>>>   (no referential transparency, side effects etc.)
>>> Such implementation can be done in Erlang too.
>>>
>>> I am not sure whether I understand why the above approach was not
>>> taken in Erlang. If mutable queues
>>> would be local to some process, no problems would arise. However, if
>>> one process wanted to pass a
>>> queue to another process, then if they tried to use them concurrently,
>>> interference could occur.
>>> With the original implementation, such interference is excluded. The
>>> penalty is that
>>> - - it is not the most efficient possible implementation
>>> - - equational reasoning cannot be applied
>>>   (other, more complicated, proof-techniques are necessary)
>>> A typical tradeoff.
>>>
>>> However, if a queue would not be shared among processes than its
>>> mutable version would OK. Whouldn't it?
>>
>> There's also a problem of persistence, even within a single process. A
>> queue might have different "logical histories"
> 
> I am not sure what you mean and thus I cannot assess how useful such thing is.
> 
>> within a single process
>> which would imply quite a lot of copying if implemented as a mutable
>> data structure, or prohibit versioning at all. Think of versioned ets.
>> The present queue implementation does not deal with that as well,
>> though.
>>
>>
>> However, the problem with mutable/BIF approach
> 
> What do you mean by BIF approach?
> 
> Mutable queues can be implemented via already existing Erlang constructs (spawn, send, receive etc).
>  No new primitive data-types or new primitive operations are needed to implement such queues whose
> operations for adding elements to head and tail have always O(1) complexity.
> 
> The first step would be implementing a new data type: `cell' with operations such as
> - - cell:empty/0 (create an empty cell)
> - - cell:make/1 (create a cell with a given contents)
> - - cell:get/1 (non-destructively get the contents of a cell. Block (or throw an error or some
> predefined value if it is empty)
> - - cell:put/2 (destructively overwrite a cell with a given value)
> Basically, cell can be represented as a process that responds to proper messages. There can be
> wrapper procedures
> - - cell:/isEmpty/1 returns true if a given cell is empty
> 
> If you have cells, you can implement `queue item' as a triple composed from:
> - - a cell containing a reference to the previous `queue item'
> - - a cell containing a reference to the value of the current `queue item'
> - - a cell containing a reference to the next `queue item'.
> 
> If you have that, you can represent the whole queue as a couple composed from:
> - - a cell containing a reference to the head
> - - a cell containing a reference to the tail
> 
> I did not implemented it in Erlang (because in Erlang I did not needed queues) but in a different
> language that had those immutable queues I droped them and reimplemented the queues as described
> above. I think this is reasonable approach if one do not need share queues among processes.
> 
> If Erlang is implemented properly, this would be efficient despite the fact that there are many
> processes. Real measurements would either support or reject such hypothesis.

This design has the same drawback as ets: lots of copying [between
processes]. Consequently, a part of my email still applies:

 >> Accounting for that copying,
>> there's always a data set which is going to be faster with the
>> present implementation of queues.
>>
>> Given that, you never know what is faster on your data until you try.
>> At least present implementation of queues is correct.
>>
> 
> Regards
> - --
> Matej Kosik
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFHXnkYL+CaXfJI/hgRAsemAJ91squuD5cHBGrFcu07fl521Xu4lgCgy77g
> W6U8KnjrHU723tqFBx4YXkY=
> =ftFi
> -----END PGP SIGNATURE-----