Strings (was: Re: are Mnesia tables immutable?)
Wed Jun 28 12:21:20 CEST 2006
On Jun 28, 2006, at 2:47 PM, Romain Lenglet wrote:
> Personally, I am voting for (1) representing strings as lists of
> Unicode code points, but (2) providing a better (more flexible,
> more efficient) external representation, and most importantly
> (3) providing a more flexible interface to the external
> encoding/decoding primitives, such as supporting strings as
> tuples as above.
I don't care about the internal representation of strings so long as
it's (a) _significantly_ more memory efficient than two words per
character in a list and (b) allows me to pass these immutable
strings between processes without a memory copy each time.
My end game is writing web apps in Erlang + yaws + Mnesia.
The basic job of any yaws page (or any dynamic HTML server) is to
write a sequence of terms to the stream the browser is expecting.
This means a concatenation, or list, of "strings" like the following is
common when streaming out a page:
Header + StaticWebPagePreamble +
StaticContentSuchAsLabelsLookedUpByUsersLangPref + HTMLInputControl +
ContentForInputControl + ... + HTMLSelectControl +
ContentForSelectControl + StaticWebPageFooter
Let's assume the above is a good general view of what yaws needs to output:
- Header may be hardcoded in the yaws file or calculated.
- StaticWebPagePreamble may be hardcoded in the yaws file or looked
up using something like gettext for different language representations.
- StaticContentSuchAsLabelsLookedUpByUsersLangPref is looked up by
the user's language preference, e.g. is the field label "name:" or "nom:"?
- HTMLInputControl is hardcoded in the yaws file. It may need its
size adjusted based on its content (see the next value).
- ContentForInputControl is injected, or a "bound attribute", from a model
or controller process or record.
- ContentForSelectControl is injected, or a "bound attribute", from a process.
This implies the following:
1 - yaws should be able to handle "strings" (even if they are in some
binary-encoded UTF-8 format) without touching them until it's time to
stream out the final result. Yaws already does this for terms in
general when constructing pages. Any new string representation would
need to ensure yaws can handle it.
2 - it should be possible and easy, though not necessarily efficient, to
compute the length of a string. For example, strings like
StaticContentSuchAsLabelsLookedUpByUsersLangPref and HTMLInputControl
might need to be sized based on the length of their string content.
Counting the characters in a UTF-8 encoded binary to compute its
length is not a problem, since these strings will always be short (they
have to fit in the HTML page).
3 - lists of strings such as ContentForSelectControl might be a long
list of country names. You _do not_ want this list of strings copied
from the countryManager process to the pageController process to the
yaws page _every time_ you output a page. This is a killer inefficiency!!!
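Point 1 is essentially the iolist idea: keep the page as a nested list of fragments and only walk it when writing to the stream. Erlang's iolists already work this way (nested lists of binaries and bytes that I/O functions accept without flattening first). The sketch below is a Python stand-in written under that assumption; `iter_fragments`, `stream_page` and the sample page are hypothetical names for illustration, not yaws API.

```python
import io
from typing import BinaryIO, Iterator, Union

# Hypothetical fragment type, loosely modelled on an Erlang iolist:
# already-encoded bytes pass through as-is, str is encoded to UTF-8
# only at output time, and lists may nest to any depth.
Fragment = Union[bytes, str, list]


def iter_fragments(frag: Fragment) -> Iterator[bytes]:
    """Walk a nested fragment tree depth-first, yielding UTF-8 byte chunks."""
    if isinstance(frag, bytes):
        yield frag                      # passed through, not re-encoded
    elif isinstance(frag, str):
        yield frag.encode("utf-8")      # encoded lazily, at stream time
    elif isinstance(frag, list):
        for part in frag:
            yield from iter_fragments(part)
    else:
        raise TypeError(f"unsupported fragment type: {type(frag).__name__}")


def stream_page(frag: Fragment, out: BinaryIO) -> int:
    """Write each chunk straight to `out`; the page is never concatenated first."""
    written = 0
    for chunk in iter_fragments(frag):
        out.write(chunk)
        written += len(chunk)
    return written


# Usage: a page assembled from fragments of different origins (all hypothetical).
countries = ["France", "Côte d'Ivoire"]          # e.g. from a country-list process
select = ["<select>", [["<option>", c, "</option>"] for c in countries], "</select>"]
page = [b"<html><body>", "nom : ", '<input size="20">', select, b"</body></html>"]

buf = io.BytesIO()
written = stream_page(page, buf)
```

Nothing is flattened until `stream_page` runs, so a fragment shared by many pages (such as the country list) is built once and only read at output time. This only illustrates point 1; the cross-process copying in point 3 is a property of the Erlang runtime that a Python sketch cannot show.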
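The length computation in point 2 is cheap to sketch. Assuming the string is stored as well-formed UTF-8, every code point begins with exactly one byte that is not a continuation byte (continuation bytes have the bit pattern `10xxxxxx`), so counting the non-continuation bytes counts code points. `utf8_length` and `input_size` below are hypothetical Python helpers, not part of any Erlang or yaws library.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by counting lead bytes.

    Continuation bytes match 0b10xxxxxx; every other byte starts a new
    code point. The input is NOT validated, so malformed UTF-8 yields a
    meaningless count. This counts code points, not user-perceived
    characters: "e" followed by a combining accent counts as 2.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def input_size(content: bytes, min_size: int = 10) -> int:
    """Hypothetical helper: size an <input> control to fit its UTF-8 content."""
    return max(min_size, utf8_length(content))


label = "Côte d'Ivoire".encode("utf-8")
print(len(label), utf8_length(label), input_size(label))   # 14 13 13
```

The count is a single linear pass over the bytes, which fits the point above: it need not be fast, because these strings are short.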
It is academically interesting to learn about the complexities of
supporting Unicode. However, most practitioners like myself simply
do not care and just want _one_ solution that works efficiently for
the entire set of use cases in the type of apps we build.
If SCSU internally and UTF-8 externally is the smart choice, then that's
fine. If UTF-8 internally and externally is easier to implement and is
good enough, it's heaps better than what we have now and would address
the reality (not perception) that "erlang doesn't handle strings well".
thanks, ke han
> Romain LENGLET
More information about the erlang-questions