Strings (was: Re: are Mnesia tables immutable?)

ke han ke.han@REDACTED
Wed Jun 28 12:21:20 CEST 2006

On Jun 28, 2006, at 2:47 PM, Romain Lenglet wrote:

> Personally, I am voting for (1) representing strings as lists of
> Unicode code points, but (2) providing a better (more flexible,
> more efficient) external representation, and most importantly
> (3) providing a more flexible interface to the external
> encoding/decoding primitives, such as supporting strings as
> tuples as above.

I don't care about the internal representation of strings so long as
it (a) is _significantly_ more memory efficient than one word per
character in a list and (b) allows me to pass these immutable
strings between processes without a memory copy each time.

My end game is writing web apps in erlang+yaws+mnesia.

The basic job of any yaws page (or any dynamic html server) is to
output a sequence of terms into the stream the browser is expecting.
This means the following concatenation, or list of "strings", is
common when streaming out a page:

Header + StaticWebPagePreamble +  
StaticContentSuchAsLabelsLookedUpByUsersLangPref + HTMLInputControl +  
ContentForInputControl + ...  + HTMLSelectControl +  
ContentForSelectControl + StaticWebPageFooter

Let's assume the above is a good general view of what yaws needs to
output:

- Header may be hardcoded in the yaws file or calculated.
- StaticWebPagePreamble may be hardcoded in the yaws file or looked
up using something like gettext for different language representations.
- StaticContentSuchAsLabelsLookedUpByUsersLangPref is looked up by
user lang prefs, e.g. is the field label "name:" or "nom:"?
- HTMLInputControl is hardcoded in the yaws file. It may need to have
its size adjusted based on content (see next value).
- ContentForInputControl is injected, or a "bound attribute", from a
model or controller process or record.
- ContentForSelectControl is injected, or a "bound attribute", from a
process.
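
To make the page assembly above concrete, here is a minimal sketch of
building such a page as an iolist (a nested list of binaries and
character lists) rather than concatenating strings. The module name
page_sketch, the function render/3 and its arguments are made up for
illustration, and the sketch does no HTML escaping.

```erlang
-module(page_sketch).
-export([render/3]).

%% Hypothetical helper: assemble the page as an iolist.
%% Label and Value are binaries or character lists; Countries is a
%% list of binaries. Nothing is copied or flattened here; the
%% fragments are only nested into a new list.
render(Label, Value, Countries) ->
    Options = [[<<"<option>">>, C, <<"</option>">>] || C <- Countries],
    [<<"<html><body><form>">>,
     <<"<label>">>, Label, <<"</label>">>,
     <<"<input name=\"name\" value=\"">>, Value, <<"\">">>,
     <<"<select name=\"country\">">>, Options, <<"</select>">>,
     <<"</form></body></html>">>].
```

A value of this shape can be written to a socket as it is, or
flattened once at the very end with iolist_to_binary/1, so the
individual fragments are not touched until the page is streamed out.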

This implies the following:
1 - yaws should be able to handle "strings" (even if they are some
binary encoded utf-8 format) without touching them until it's time to
stream out the final result.  Yaws already does this for terms in
general when constructing pages.  A string solution would need to
ensure yaws can handle the new string form.
2 - it should be easy/possible but not necessarily efficient to
compute the length of a string.  For example, strings like
StaticContentSuchAsLabelsLookedUpByUsersLangPref and HTMLInputControl
might need to be sized based on the length of string content.
Counting the characters in a utf-8 encoded binary to compute its
length is not a problem since these strings will always be short (they
have to fit in the HTML page).
3 - lists of strings such as ContentForSelectControl might be a long  
list of country names.  You _do not_ want this list of strings copied  
from countryManager process to pageController process to yaws page  
_every time_ you output a page.  This is a killer inefficiency!!!
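
As a rough sketch of points 2 and 3, assuming strings are kept as
utf-8 encoded binaries: the module below counts characters with the
/utf8 bit-syntax segment type found in current Erlang/OTP releases,
and uses a stand-in for the countryManager process that pre-renders
the select content once as a single binary. All names here
(string_sketch, char_count/1, start_countries/1, countries/1) are
hypothetical. The second half relies on binaries larger than 64 bytes
being stored off-heap and reference counted, so that sending one to
another process on the same node passes a reference instead of
copying the bytes; short binaries are still copied.

```erlang
-module(string_sketch).
-export([char_count/1, start_countries/1, countries/1]).

%% Point 2: count code points in a utf-8 binary by walking it once.
%% Fails with function_clause if the binary is not valid utf-8.
char_count(Bin) when is_binary(Bin) ->
    char_count(Bin, 0).

char_count(<<_/utf8, Rest/binary>>, N) ->
    char_count(Rest, N + 1);
char_count(<<>>, N) ->
    N.

%% Point 3: hypothetical stand-in for the countryManager process.
%% The option list is rendered once, as one binary, at start-up.
start_countries(Names) when is_list(Names) ->
    Rendered = iolist_to_binary(
                 [[<<"<option>">>, N, <<"</option>">>] || N <- Names]),
    spawn(fun() -> loop(Rendered) end).

%% Ask the process for the pre-rendered options.
countries(Pid) ->
    Ref = make_ref(),
    Pid ! {countries, self(), Ref},
    receive
        {Ref, Rendered} -> Rendered
    after 1000 ->
        exit(timeout)
    end.

loop(Rendered) ->
    receive
        {countries, From, Ref} ->
            From ! {Ref, Rendered},
            loop(Rendered);
        stop ->
            ok
    end.
```

For example, char_count(<<"caf", 233/utf8>>) counts 4 characters in a
binary of 5 bytes. The binary returned by countries/1 can be dropped
straight into the list of terms that makes up the page.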

It is academically interesting to learn about the complexities of
supporting unicode.  However, most practitioners like myself simply
do not care and just want _one_ solution that works efficiently for
the entire set of use cases for the type of apps we build.
If SCSU internal and utf-8 external is smart, then that's fine.  If
utf-8 internal and external is easier to implement and is good
enough, it's heaps better than what we have now and would unblock this
reality (not perception) that "erlang doesn't handle strings well".

thanks, ke han

> -- 
> Romain LENGLET
