are Mnesia tables immutable?

Yariv Sadan yarivvv@REDACTED
Wed Jun 21 14:29:19 CEST 2006


>
> There is a fundamental fact about strings that applies to every
> programming language:
>
>     STRINGS ARE WRONG.
>
> Strings are a good data type for text that you are NOT going to manipulate.
> If you have to manipulate text, it's usually a good idea to convert it to
> something else as quickly as you can, such as an abstract syntax tree.
> This will be orders of magnitude cheaper to process, even in C.

Interesting. Do you think it would be possible to write an API similar
to Lines (or on top of Lines), that would turn any string into a tree
for easy manipulation and pseudo-random access, without understanding
its syntax? The tradeoff would be speed in exchange for space.
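
To make the question concrete, here is a minimal sketch of the kind of
structure I have in mind, usually called a rope. It is written in Python
rather than Erlang just to keep it short, and every name in it (Leaf, Node,
concat, char_at, to_string) is made up for illustration, not taken from Lines
or any existing library. The tree is not rebalanced, so lookup cost depends on
its depth.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str  # a plain string chunk, never inspected for syntax

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Node:
    left: "Rope"
    right: "Rope"
    length: int  # cached total length of both subtrees


Rope = Union[Leaf, Node]


def concat(a, b):
    """Join two ropes in O(1) without copying any characters."""
    return Node(a, b, a.length + b.length)


def char_at(rope, i):
    """Return the character at index i, descending by cached lengths."""
    if not 0 <= i < rope.length:
        raise IndexError(i)
    while isinstance(rope, Node):
        if i < rope.left.length:
            rope = rope.left
        else:
            i -= rope.left.length
            rope = rope.right
    return rope.text[i]


def to_string(rope):
    """Flatten the rope into one string in a single left-to-right pass."""
    parts, stack = [], [rope]
    while stack:
        r = stack.pop()
        if isinstance(r, Node):
            stack.append(r.right)  # pushed first so the left side pops first
            stack.append(r.left)
        else:
            parts.append(r.text)
    return "".join(parts)
```

The extra space goes into the interior nodes and cached lengths; what it buys
is constant-time concatenation and indexed access that never has to scan the
whole text.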

>
>         Are Erlang's drawbacks with regards to efficient string manipulation
>         shared among all functional languages?
>
> What drawbacks?  Who said string manipulation in Erlang wasn't efficient?

Well, Hokan said it in this thread :)

"Erlang is somewhat weak in the area of strings (strings were not a main
concern when Erlang was being designed to be used in Ericsson's telecom
products - switches ... etc)"

> My personal favourite observation here concerns Xerox Quintus Prolog.
>
> It's the same with Erlang.  *Building* strings using [This|That] instead
> of This ++ That is an O(1) operation no matter how big This is.  Yes, you
> can beat C that way -- unless the C programmer knows that strings are wrong
> too.

Coming from a C/C++ background, I would say that it's pretty easy to
write an expanding buffer in those languages to which you can append
characters at the end for amortized O(1) cost, with an occasional
realloc() call.  There's some wasted space and realloc() isn't cheap by
any means, but it's straightforward and I bet most C/C++ programs use
such a buffer (std::vector and std::string use this technique AFAIK).
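
For anyone who hasn't seen the technique, here is a rough sketch of it. It is
in Python only to keep it short; the class name GrowBuffer and its fields are
invented for this example, and the copy into a larger bytearray stands in for
what realloc() would do in C.

```python
class GrowBuffer:
    """Append-only byte buffer that doubles its capacity when full."""

    def __init__(self, capacity=16):
        self._data = bytearray(capacity)  # preallocated storage
        self._size = 0                    # bytes actually in use
        self.reallocations = 0            # how many times we had to grow

    def append(self, byte):
        if self._size == len(self._data):
            # Stand-in for realloc(): allocate double the space and copy.
            grown = bytearray(max(1, 2 * len(self._data)))
            grown[:self._size] = self._data
            self._data = grown
            self.reallocations += 1
        self._data[self._size] = byte
        self._size += 1

    def to_bytes(self):
        """Return only the bytes in use, without the spare capacity."""
        return bytes(self._data[:self._size])
```

Because the capacity doubles each time, the copies get rarer as the buffer
grows, which is why the cost per append averages out to a constant.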

>
>         I was planning on writing a Google killer in Erlang, and this
>         new knowledge makes me rethink this strategy :)
>
> Google actually have a few programming language clues; they have no
> hangups about using or even inventing special-purpose programming languages.

To the outside observer, it seems like Google's lightbulb hasn't gone
off yet when it comes to functional languages :)

http://www.google.com/support/jobs/bin/search.py?hl=en&lr=lang_en&type=f&query=erlang&Action.Search=Search+U.S.+Openings
http://www.google.com/support/jobs/bin/search.py?lr=lang_en&type=f&query=haskell&Action.Search=Search+U.S.+Openings
http://www.google.com/support/jobs/bin/search.py?lr=lang_en&type=f&query=ocaml&Action.Search=Search+U.S.+Openings
http://www.google.com/support/jobs/bin/search.py?lr=lang_en&type=f&query=lisp&Action.Search=Search+U.S.+Openings
http://www.google.com/support/jobs/bin/search.py?lr=lang_en&type=f&query=ML&Action.Search=Search+U.S.+Openings

So, either this is an advantage to Google's competitors who *do* use
functional languages -- especially Erlang, the #1 choice for building
scalable, concurrent, fault-tolerant, distributed systems, which is
Google's bread and butter -- or Google is shy about this fact on its
website :)



>
> Especially these days when a single "character" from the user's point of
> view may be several Unicode characters and each of those characters might
> be mapped to several bytes, random access into strings is of very little
> actual interest.  It gets worse:  if you change a Unicode string to lower
> case or to upper case or to title case, the result may and quite often WILL
> be a different length from the original, so in-place case shifting is a
> thing of the past, and that's one of the few things that in-place update
> was ever good for.  So the things that Erlang _can't_ do well with strings
> are things that aren't going to work _anyway_ in a Unicode world, while the
> things that Erlang _can_ do well (lists, trees, recursive functions, leex,
> yecc) are still very useful.
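
The two claims above are easy to check. The snippet below is a small Python
illustration (Python's str type works on Unicode code points); the variable
names are only for this example.

```python
# One user-perceived character can be several code points and more bytes.
e_acute = "e\u0301"                        # 'e' + COMBINING ACUTE ACCENT
code_points = len(e_acute)                 # counts code points, not glyphs
utf8_bytes = len(e_acute.encode("utf-8"))  # size once encoded as UTF-8

# Case mapping can change the length of a string.
sharp_s_upper = "stra\u00dfe".upper()      # German sharp s has no 1:1 uppercase
dotted_i_lower = "\u0130".lower()          # capital I with dot above

print(code_points, utf8_bytes)             # 2 3
print(sharp_s_upper, len(dotted_i_lower))  # STRASSE 2
```

So a six-character word becomes seven characters in upper case, and a single
capital letter becomes two code points in lower case, which rules out shifting
case in place.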

Yes, Unicode does make things different. In Java, all strings are
Unicode (like Erlang), but there isn't the extra 32-bit overhead per
character for building the linked list. Is it always 32, by the way,
even on 64-bit machines? Or do Erlang "strings" double in size on 64-bit
architectures?

Best regards,
Yariv


