are Mnesia tables immutable?
Wed Jun 21 08:45:16 CEST 2006
You can consider to use binaries for holding strings. They are
designed to hold 8-bit (or 16-bit (or any-bit-divisible by 8)) data,
random access. Care must be taken when creating them, though;
not to build them as lists - concatenating binaries builds a
new one and the old will become garbage. Lots of copying and
lots of garbage collect.
(Yariv Sadan) writes:
> > Most list operations are not particulary costly.
> > * traversing list e.g. "hd(List)" or "[H | R] = List" is a O(1)
> > operation.
> > * building lists by adding to the front e.g. "NewList = [H | OldList]"
> > is a O(1) operation.
> > * reversing a list is a O(N) operation.
> > * the various lists:map/foldl/foreach operations and list comprehension
> > work over lists, in O(N) time.
> > But you should carefull when using "NewList = List1 ++ List2" (and
> > other list append functions) especialy in loops where you build a lot of
> > NewList entries, as it is a O(N) operation (N = length of List1).
> > It is also generaly a bad idea to use nth(List, Nth) to do random list
> > access, this is a O(N) operation as each nth(...) call traverses the
> > list until it gets to the Nth list position.
> > Tuples/Records (they are essentialy the same thing) have slightly
> > different properties:
> > * access is O(1)
> > * updates are O(N), where N is the size of the tuple - but only a array
> > of pointers (pointing to other erlang terms) is created when a new tuple
> > is created, so if the tuple isn't very large (< 100 elements (?)) or the
> > tuple isn't used often - there should be little performance cost.
> > This means that tuples/records work well in the role of C structs or
> > Java classes, but are a bad choice for large arrays that need random
> > access updates (sequential data processing is best done with lists)
> > If you needs random write access your basicly stuck with trees, ets
> > tables or the even slower mnesia tables (slower due to their database
> > transaction support).
> > Note: that lists (sequential list traversal) and tuples/records are the
> > fastest types for read operations, while adding elements to the front of
> > a list - "NewList = [H | OldList]" is the fastest way to build a new
> > data structure.
> > >
> > > It did occur to me that it would be useful to have a API for
> > > efficiently handling large strings. It would have similar function
> > > signatures to regular strings, but the implementation would involve
> > > less copying. Does such an API exist?
> > Erlang is somewhat weak in the area of strings (strings where not a main
> > concern when erlang was begin designed to be used in Ericssons telecom
> > products - switches ... etc):
> > * erlang strings are implemented as lists of (8 bit) integers - while
> > this makes it easy to use lists functions on strings this also means
> > that strings take a lot of space, ~2 words (2 * 32 bit on a 32 bit CPU).
> > * erlang strings and string operations assume that characters are 8 bit,
> > latin-1 encodes characters.
> > There are several ways to handle this if it becomes a performance/size
> > issue:
> > * chars can be stored more compactly - chars can for example be packed
> > into binaries (allowing for a N byte sized char to fit in N bytes of
> > memory).
> > * string data could be preprocessed by other unix tools, better suited
> > for fast text processing, before passing the data to erlang
> > * lines-1.1 at http://erlang.org/user.html may be useful to handle large
> > strings.
> > You may also find stuff here:
> > http://www.erlang-projects.org/
> > http://planeterlang.org/
> Thanks, Hokan. This is very useful. I did initially make the mistake
> of abusing the ++ operator in iterative functions because I wanted to
> end up with a single string, not a nested structure of lists. However,
> I saw in the Erlang efficiency guide that Erlang knows how to
> automatically serialize nested lists as a single character stream,
> which made me realize that appending new lists to the beginning of an
> accumulator is the right thing to do.
> I looked at the lines module, and it does seem pretty close to what I
> had in mind. I was thinking more along the lines of a generic string
> container that allows efficient (less than O(n)) modification and
> random access of strings, which would work by breaking the string up
> into a tree of some sorts. Lines, if I understand correctly, is
> similar, but it requires the user to do the chunking.
> Are Erlang's drawbacks with regards to efficient string manipulation
> shared among all functional languages? I was planning on writing a
> Google killer in Erlang, and this new knowledge makes me rethink this
> strategy :)
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
More information about the erlang-questions