are Mnesia tables immutable?

Wed Jun 21 08:48:14 CEST 2006

"Yariv Sadan" <yarivvv@REDACTED> wrote:
	I looked at the lines module, and it does seem pretty close to what I
	had in mind. I was thinking more along the lines of a generic string
	container that allows efficient (less than O(n)) modification and
	random access of strings, which would work by breaking the string up
	into a tree of some sorts. Lines, if I understand correctly, is
	similar, but it requires the user to do the chunking.

There is a fundamental fact about strings that applies to every
programming language:

    STRINGS ARE WRONG.

Strings are a good data type for text that you are NOT going to manipulate.
If you have to manipulate text, it's usually a good idea to convert it to
something else as quickly as you can, such as an abstract syntax tree.
This will be orders of magnitude cheaper to process, even in C.

	Are Erlang's drawbacks with regards to efficient string manipulation
	shared among all functional languages?

What drawbacks?  Who said string manipulation in Erlang wasn't efficient?
My personal favourite observation here concerns Xerox Quintus Prolog.

The Xerox D-machines (1108 Dandelion, 1109 Dandetiger, 1185 1186 -- one
of those was Daybreak but I don't remember which) were available running
Interlisp-D (and eventually Xerox Common Lisp on top of that).  These
were microcoded machines.  Interlisp had always had strings; Interlisp-D
seamlessly supported any mix of 8-bit and 16-bit strings.  There was
microcode to make string operations fast.

Xerox Quintus Prolog used the Quintus Prolog compiler and libraries, but
the WAM emulator was mostly written in microcode (instead of macro
assembler).  Interlisp used cdr-coding, so Lisp list cells were 4 bytes.
The WAM doesn't use cdr-coding, so our list cells were 8 bytes.

So, here's the contest.
The incumbent has packed byte strings with O(1) access time to individual
elements, including O(1) in-place update -- except if you replace an
8-bit character by a 16-bit one.  The incumbent has special microcode
support for some basic string operations.

The the other corner, the upstate takes EIGHT TIMES as much space per
character, has O(N) access time to the Nth element, has no in-place
update, and while it has microcode support for cons, car, and cdr,
has *no* microcode support for any string operations, all string calculations
being done in macrocode.

Clear victory for the imcumbent, right?

Wrong.  Prolog beat the pants off Interlisp, same hardware, same microcode
expert having written the microcode for both.

The reason?  Simple.  Prolog made it very cheap to *build* "strings";
list differences (as used in Definite Clause Grammars) made concatenation
O(1) instead of O(N).  

It's the same with Erlang.  *Building* strings using [This|That] instead
of This ++ That is an O(1) operation no matter how big This is.  Yes, you
can beat C that way -- unless the C programmer knows that strings are wrong
too.

	I was planning on writing a Google killer in Erlang, and this
	new knowledge makes me rethink this strategy :)

Google actually have a few programming language clues; they have no
hangups about using or even inventing special-purpose programming languages.

Especially these days when a single "character" from the user's point of
view may be several Unicode characters and each of those characters might
be mapped to several bytes, random access into strings is of very little
actual interest.  It gets worse:  if you change a Unicode string to lower
case or to upper case or to title case, the result may and quite often WILL
be a different length from the original, so in-place case shifting is a
thing of the past, and that's one of the few things that in-place update
was ever good for.  So the things that Erlang _can't_ do well with strings
are things that aren't going to work _anyway_ in a Unicode world, while the
things that Erlang _can_ to well (lists, trees, recursive functions, leex,
yecc) are still very useful.