Strings (was: Re: are Mnesia tables immutable?)

ke han ke.han@REDACTED
Wed Jun 28 16:14:48 CEST 2006


On Jun 28, 2006, at 6:45 PM, Romain Lenglet wrote:

> ke han wrote:
>> On Jun 28, 2006, at 2:47 PM, Romain Lenglet wrote:
>>> Personally, I am voting for (1) representing strings as
>>> lists of Unicode code points, but (2) providing a better
>>> (more flexible, more efficient) external representation, and
>>> most importantly (3) providing a more flexible interface to
>>> the external encoding/decoding primitives, such as
>>> supporting strings as tuples as above.
>>
>> I don't care about the internal representation of string so
>> long as it's (a) _significantly_ more memory efficient than one
>> word per character in a list and (b) allows me to pass these
>> non-mutable strings between processes without a mem copy each
>> time.
>>
>> My end game is writing web apps in erlang+yaws+mnesia.
>
> What we were discussing is how to internally represent, and
> externally encode (in the term_to_binary/1 sense), strings in a
> form suitable for building or modification by programs. You are
> discussing the need to pass around strings that are
> already 8-bit encoded and that don't need to be modified.
> Different problems. Different representations.

Right, I do understand that this thread has several facets. I'm adding
my high-level application needs to the mix to ensure they aren't
forgotten. Why? Because I don't have the skills to solve this low-level
problem, and I need people like you, who do understand the internals of
Erlang, to take these needs into account, in the hope of eventually
getting something useful ;-)

>
>> The basic result of any yaws page (or any dynamic html server)
>> is to output a sequence of terms into a stream the browser is
>> expecting. This means the following concatenation or list of
>> "strings"  is common in streaming out a page:
>>
>> Header + StaticWebPagePreamble +
>> StaticContentSuchAsLabelsLookedUpByUsersLangPref +
>> HTMLInputControl + ContentForInputControl + ...  +
>> HTMLSelectControl +
>> ContentForSelectControl + StaticWebPageFooter
> [...]
>
> Since you don't seem to need to modify the contents of those
> strings, why don't IO-lists (i.e. a list of binaries) fit your
> need? You should simply pass a list of binaries, where each
> binary contains text that is 8-bit encoded in UTF-8 or ISO-8859-1 or
> whatever. Binaries are not copied. Such IO-lists are what is
> used to communicate with linked-in C drivers. IO-lists are the
> most efficient way to transmit large data in an Erlang node.
> Why doesn't that fit your needs?

I understand that a lengthy binary is not copied. I have seen posts
on this mailing list saying that short binaries _are_ copied and long
ones _are not_, but I don't know what length determines whether
something is copied.
In the example I gave, my countryManager process is a singleton
(pardon the OO pattern reference, but that's what it is) that serves
the entire VM, answering requests for the list of countries. This is a
lengthy list of short UTF-8 encoded binaries. So wouldn't the list get
copied? And won't each short binary in the list get copied as well?
There must be a better way.
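
For reference, here is a minimal sketch of the IO-list approach Romain
describes, assuming every fragment is already an 8-bit encoded binary.
The module name, page/2 and the HTML fragments are made up for
illustration. As far as I can tell from the efficiency guide for
current releases, binaries of up to 64 bytes live on the process heap
and are copied when sent, while larger ones are reference counted and
shared; treat that cutoff as something to verify for your own release.

```erlang
-module(iolist_sketch).
-export([page/2, demo/0]).

%% Hypothetical page assembly: fragments are nested, never concatenated.
page(Label, Countries) ->
    [<<"<html><body>">>,
     Label,
     <<"<select>">>,
     [[<<"<option>">>, C, <<"</option>">>] || C <- Countries],
     <<"</select></body></html>">>].

demo() ->
    Page = page(<<"Country: ">>, [<<"France">>, <<"Japan">>]),
    %% iolist_size/1 walks the nesting without building a flat copy.
    Size = iolist_size(Page),
    %% Flatten only when a single binary is really needed; a socket or
    %% port accepts the nested form directly.
    Bin = iolist_to_binary(Page),
    %% Match assertion: the flattened binary has the size computed above.
    Size = byte_size(Bin),
    Bin.
```

Building the page this way costs only a few list cells per fragment.
It does not answer my question about what happens when the country
list itself crosses a process boundary.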

To get around this problem, I would have to break the MVC
separation and have my model object (countryManager) return an
already serialized binary of binaries (and if I'm going to do that, I
may as well have the countryManager go ahead and serialize it to JSON
form as well).
This violates a lot of sound application design. Basic principles of
encapsulation and of separating presentation from application logic
are well grounded in OO design. These principles apply to non-OO
languages as well. I understand that not having object references, and
copying terms between calls to Erlang processes, is a key element of
Erlang. But for non-mutable strings??? Not having a solution for this
makes mainstream web apps very inefficient.
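
To make the separation I am after concrete, here is a minimal sketch,
assuming a plain process rather than a gen_server. All names
(country_manager_sketch, countries/1, select_control/1) and the country
data are hypothetical. The model process only knows about countries;
the view turns them into markup.

```erlang
-module(country_manager_sketch).
-export([start/0, countries/1, select_control/1]).

%% Model: a single process that owns the country list (made-up data).
start() ->
    spawn(fun() -> loop([<<"France">>, <<"Japan">>, <<"Sverige">>]) end).

loop(Countries) ->
    receive
        {From, Ref, countries} ->
            %% The reply is copied into the caller's heap: the list
            %% cells and, for short binaries, the binary data as well.
            From ! {Ref, Countries},
            loop(Countries);
        stop ->
            ok
    end.

%% Client side of the model: ask the process and wait for the reply.
countries(Pid) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, countries},
    receive
        {Ref, Countries} -> Countries
    after 5000 ->
        erlang:error(timeout)
    end.

%% View: presentation stays out of the model.
select_control(Countries) ->
    [<<"<select>">>,
     [[<<"<option>">>, C, <<"</option>">>] || C <- Countries],
     <<"</select>">>].
```

Every call to countries/1 pays for that copy, which is exactly the cost
I would like to avoid without moving select_control/1 into the model.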

To reference Richard's earlier post:
 >    STRINGS ARE WRONG.
 >
 >Strings are a good data type for text that you are NOT going to
 >manipulate.
 >If you have to manipulate text, it's usually a good idea to convert
 >it to something else as quickly as you can, such as an abstract
 >syntax tree.
 >This will be orders of magnitude cheaper to process, even in C.

I agree. I am talking about handling these non-mutable things (I've
been calling them strings) that need to be stored (in memory and on
disk) and passed between processes efficiently.

My point in providing a real-world example of how I need to use
strings is that talking about string implementation without defining a
set of use cases is pretty much, well, useless.

As Ulf points out, my scenario gets even worse on 64-bit Erlang, where
every word is twice as large.

thanks, ke han

>
> -- 
> Romain LENGLET
