Strings (was: Re: are Mnesia tables immutable?)

Richard A. O'Keefe ok@REDACTED
Fri Jun 30 07:15:12 CEST 2006


ke han <ke.han@REDACTED> wrote:
	It seems from the above you appreciate that it's grossly inefficient  
	to use 4 or 8 bytes (64-bit erlang) per character as a method of  
	representing strings in erlang.

Not quite.  I accept that it is *space* inefficient (using 8 bytes for
a 3-byte character is a little excessive), but I do not accept that it
is *grossly* inefficient.  On the contrary, I argued and gave empirical
evidence that Erlang strings are (relative to other things in Erlang)
quite *time* efficient.

The big thing about space is that *sometimes* we care and *sometimes*
we don't.  I have 1.6GB of memory on my desktop machine.  This is useful
to me because I am doing Information Retrieval experiments with hundreds
of MB of text.  But for looking at a _single_ document...  Most of the
individual documents I'm looking at are under 8kB, and if I'm processing
them one at a time, I can't imagine caring about it taking 64kB instead
of 8kB.
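
To put numbers on that space cost, here is a minimal sketch that compares
the heap bytes used by a string held as a list of integers with the payload
bytes of the same text held as a binary.  The module and function names are
made up for illustration, and list_to_binary/1 assumes every character fits
in one byte (Latin-1).  erts_debug:flat_size/1 reports words, so the result
depends on the emulator's word size.

```erlang
-module(string_space).
-export([bytes_as_list/1, bytes_as_binary/1]).

%% Heap bytes used by a string stored as a list of integers.
%% Each character costs one list cell, which is two machine words.
bytes_as_list(String) when is_list(String) ->
    erts_debug:flat_size(String) * erlang:system_info(wordsize).

%% Payload bytes of the same text stored as a binary.
%% Assumption: all characters are in 0..255, so one byte each.
bytes_as_binary(String) when is_list(String) ->
    byte_size(list_to_binary(String)).
```

On a 32-bit emulator a list cell is two 4-byte words, which gives the
8 bytes per character used above; a 64-bit emulator doubles that.  The
binary figure ignores the binary's own fixed overhead.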

	However, from other messages you post in this thread, your
	proposals seem to still be to use lists of integers (one cell per
	character).

Yes.  It's simple.  It's time-efficient.  It works.

	Are you talking about two different things?  One memory efficient  
	form for when the string doesn't need to be accessed at a character  
	level and the list of integers form for when they do?
	
Exactly.

More precisely, I am arguing that "strings" get used in different ways
and we should not expect a single representation to be good at all of
them.  For some applications it's transmission time over the network,
or space on a CD-ROM, that matters.  For some applications it's
searching in large amounts of text.  For some applications it's the
ability to do lots of edit operations.  For some applications it's the
ability to compactly store multiple versions.  Different applications
come with _different_ tradeoffs, and an Erlang programmer should never
be afraid to invent a task-specific "string" data structure.
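
As one small illustration of keeping two forms side by side, the sketch
below stores text as a binary when it only needs to be kept or sent, and
turns it into a list of integers when it needs to be examined character by
character.  The module name, the function names, and the vowel-counting
task are all hypothetical, and the conversion again assumes one byte per
character.

```erlang
-module(two_forms).
-export([pack/1, unpack/1, count_vowels/1]).

%% Compact form: one byte per character (Latin-1 assumed).
pack(String) when is_list(String) ->
    list_to_binary(String).

%% Character-level form: a list of integers, one cell per character.
unpack(Bin) when is_binary(Bin) ->
    binary_to_list(Bin).

%% Example of character-level work: unpack, then use ordinary list tools.
count_vowels(Bin) when is_binary(Bin) ->
    length([C || C <- unpack(Bin), lists:member(C, "aeiouAEIOU")]).
```

A real application would pick its compact form to suit its own tradeoff
(transmission, searching, editing, versioning); a plain binary is only the
simplest choice.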
	


