[erlang-questions] Strings as Lists

Zvi <>
Thu Feb 14 07:07:52 CET 2008


Richard,

I think you confusing datatype with it's implementation/representation.
My biggest problem with Erlang standard string representation is that in
64bit mode, each character taking 16 bytes. So typical message of 10KB after
converted to XML become 20KB and after parsed by 64bit Erlang VM become
320KB. I talking about server-side code, so I need 320KB-per-client. Using
binary, even if I copy this XML 3 times in memory, I still can handle 5-6
times more clients.

Also, you forget that what was good for LISP Machines is not good for
current machine architectures.
Binary strings can be handled very efficiently using vectorized SIMD code
and they use modern caches much more effectively, than lists.

Deep lists and io-lists are essentially analog of Java's StringBuffer or C++
STL's ostringstream.
io-lists can also be more efficient fo IO, if underlying OS has gather-write
system call.

So in general I want immutable String ADT. The ADT implementation should be
smart enough, to switch to the best representation, according to usage.
Similar to some languages, which implement associative arrays ADT using
various datastructures, i.e. property lists for small arrays and hashtables
for large.
Or I can give hints which implementation of String ADT I want to use (using
for example parametrized modules), i.e.  string(deepList):concat(S1,S2)  or 
string(binarySlice):substr(S,1,3).

BTW: most sophisticated string implementation I know about was in SNOBOL. I
think it was list of unique substrings.

Zvi


Richard A. O'Keefe wrote:
> 
> On 13 Feb 2008, at 5:19 am, tsuraan wrote:
> 
>> Why does erlang internally represent strings as lists?  In every  
>> language I've used other than Java, a string is a sequence of  
>> octets, just like Erlang's binary type.
> 
> Recall the programming proverb:
> 	it is better to have one data type with 100 functions
> 	than 10 data types with 10 functions each.
> 
> It is actually quite commonplace in declarative languages (Prolog,  
> Mercury,
> Haskell, some others) to implement strings as lists of characters  
> because
> that way you get a vast number of functions you can usefully apply to  
> strings.
> Not only that, you can process strings *incrementally* if they are  
> lists.
> 
> I never tire of telling this story:
> 	I was on the team that ported Quintus Prolog from the "UNIX" world
> 	to the Xerox Lisp machines (the 1108, 1109, 1185, and 1186, aka
> 	Dandelion, Dandetiger, Daybreak, and something else I forget).
> 	These machines compiled Interlisp to bytecode which was then
> 	executed by microcode, and gave very respectable performance for
> 	Lisp, in their day.  The existing microcode supported Interlisp
> 	strings, which were byte vectors as you describe.
> 
> 	There wasn't much microcode space left to support Quintus Prolog,
> 	so we had a microcoded WAM with plain old 2 word list cells and
> 	strings represented by 1 list cell per character.
> 
> 	A series of benchmarks I did showed string processing going
> 	FIVE TIMES FASTER in Prolog using lists than in Lisp using byte
> 	vectors, largely because we could represent "the rest of a string"
> 	using NO new allocations whatever.
> 
> The Java string representation is (slice of (array of uint16_t))  
> which means
> that "the rest of a string" costs only O(1) time and space, but O(1)  
> extra
> space isn't O(0) extra space.  I would expect most Javascript  
> implementations
> to use something similar to this.
> 
> The guiding rule is
>   - if you just want to hold onto a string for a while, use a binary
>   - if you want to build or process a string, use a list (possibly in
>     Erlang a deep list).
>   - if you want to represent something that has structure, and you want
>     your program to be aware of that structure, turn it into a  
> structured
>     data value and work with it in that form.
> 
> Basically, most languages get strings embarassingly wrong.  Another  
> story
> I like to tell is how I was able to do in 10 pages of C what a friend of
> mine needed 150 pages of PL/I to accomplish, largely because PL/I *does*
> have a "real" string data type and C doesn't, so I could accomplish what
> I needed very simply, while he had to fight the language every step  
> of the
> way.
> 
> 
> _______________________________________________
> erlang-questions mailing list
> 
> http://www.erlang.org/mailman/listinfo/erlang-questions
> 
> 

-- 
View this message in context: http://www.nabble.com/Strings-as-Lists-tp15436835p15474643.html
Sent from the Erlang Questions mailing list archive at Nabble.com.




More information about the erlang-questions mailing list