are Mnesia tables immutable?

Tue Jun 20 23:45:08 CEST 2006

Yariv Sadan wrote:

> Yes, I should have realized that... I'm still new to Erlang and
> functional programming so the concept of immutable data structures
> still needs to settle fully in my brain. I have been writing some
> parsing code and have been quite paranoid about making horribly
> inefficient use of lists.

Most list operations are not particulary costly.

* traversing list e.g. "hd(List)" or  "[H | R]  =  List" is a O(1) 
operation.
* building lists by adding to the front e.g. "NewList = [H | OldList]" 
is a O(1) operation.
* reversing a list is a O(N) operation.
* the various lists:map/foldl/foreach operations and list comprehension 
work over lists, in O(N) time.

But you should carefull when using "NewList = List1 ++ List2"  (and 
other list append functions) especialy in loops where you build a lot of 
NewList entries, as it is a O(N) operation (N = length of List1).

It is also generaly a bad idea to use nth(List, Nth) to do random list 
access, this is a O(N) operation as each nth(...) call traverses the 
list until it gets to the Nth list position.

Tuples/Records (they are essentialy the same thing) have slightly 
different properties:

* access is O(1)
* updates are O(N), where N is the size of the tuple - but only a array 
of pointers (pointing to other erlang terms) is created when a new tuple 
is created, so if the tuple isn't very large (< 100 elements (?)) or the 
tuple isn't used often - there should be little performance cost.
This means that tuples/records work well in the role of C structs or 
Java classes, but are a bad choice for large arrays that need random 
access updates (sequential data processing is best done with lists)

If you needs random write access your basicly stuck with trees, ets 
tables or the even slower mnesia tables (slower due to their database 
transaction support).

Note: that lists (sequential list traversal) and tuples/records are the 
fastest types for read operations, while adding elements to the front of 
a list - "NewList = [H | OldList]" is the fastest way to build a new 
data structure.

>
> It did occur to me that it would be useful to have a API for
> efficiently handling large strings. It would have similar function
> signatures to regular strings, but the implementation would involve
> less copying. Does such an API exist?

Erlang is somewhat weak in the area of strings (strings where not a main 
concern when erlang was begin designed to be used in Ericssons telecom 
products - switches ... etc):
* erlang strings are implemented as lists of (8 bit) integers - while 
this makes it easy to use lists functions on strings this also means 
that strings take a lot of space, ~2 words (2 * 32 bit on a 32 bit CPU).
* erlang strings and string operations assume that characters are 8 bit, 
latin-1 encodes characters.

There are several ways to handle this if it becomes a performance/size 
issue:
* chars can be stored more compactly - chars can for example be packed 
into binaries (allowing for a N byte sized char to fit in N bytes of 
memory).
* string data could be preprocessed by other unix tools, better suited 
for fast text processing, before passing the data to erlang
* lines-1.1 at http://erlang.org/user.html may be useful to handle large 
strings.

You may also find stuff here:
http://www.erlang-projects.org/
http://planeterlang.org/

>
> Regards,
> Yariv
>
> On 6/19/06, Håkan Stenholm <hokan.stenholm@REDACTED> wrote:
>
>> Yariv Sadan wrote:
>>
>> > Hi,
>> >
>> > I have a rather basic question: does Erlang's lack of mutable data
>> > structures mean that every time you modify a Mnesia (ets) table, the
>> > whole table is copied? This sounds very expensive... am I missing
>> > something?
>> >
>> Most data structures are non-mutable but mnesia, ets tables and process
>> dictionaries _are_ mutable, data is copied in and out of mnesia/ets when
>> accessing - it helps to think of them as erlang processes that one can
>> send (write) and recieve (read) messages to/from.
>>
>> Also note that a non-mutable datastructure doesn't necessarily need to
>> copy everything, it's often sufficient, to build a new datastructure
>> using mostly old unchanged parts. There are for example tree based data
>> structures like gb_trees, that have O(log N)  read and write  costs -
>> only a single tree branch is traveresed or updated, when accessing a
>> specific tree element.
>>
>> > Thanks
>> > Yariv
>> >
>>
>>
>