Strings (was: Re: are Mnesia tables immutable?)
ke han
ke.han@REDACTED
Wed Jun 28 16:14:48 CEST 2006
On Jun 28, 2006, at 6:45 PM, Romain Lenglet wrote:
> ke han wrote:
>> On Jun 28, 2006, at 2:47 PM, Romain Lenglet wrote:
>>> Personally, I am voting for (1) representing strings as
>>> lists of Unicode code points, but (2) providing a better
>>> (more flexible, more efficient) external representation, and
>>> most importantly (3) providing a more flexible interface to
>>> the external encoding/decoding primitives, such as
>>> supporting strings as tuples as above.
>>
>> I don't care about the internal representation of string so
>> long as it's (a) _significantly_ more memory efficient than one
>> word per character in a list and (b) allows me to pass these
>> non-mutable strings between processes without a mem copy each
>> time.
>>
>> My end game is writing web apps in erlang+yaws+mnesia.
>
> What we were discussing is how to internally represent, and
> externally encode (in the term_to_binary/1 sense), strings in a
> form suitable for building or modification by programs. You are
> discussing the need to pass around strings that are
> already 8-bit encoded and that don't need to be modified.
> Different problems. Different representations.
Right. I do understand that this thread has several facets. I'm adding my
high-level application needs into the mix to ensure they aren't
forgotten. Why? Because I haven't the skills to solve this low-level
problem, and I need guys like you, who do understand the internals of
Erlang, to take these needs into account, in hopes of eventually
getting something useful ;-)
>
>> The basic result of any yaws page (or any dynamic html server)
>> is to output a sequence of terms into a stream the browser is
>> expecting. This means the following concatenation or list of
>> "strings" is common in streaming out a page:
>>
>> Header + StaticWebPagePreamble +
>> StaticContentSuchAsLabelsLookedUpByUsersLangPref +
>> HTMLInputControl + ContentForInputControl + ... +
>> HTMLSelectControl +
>> ContentForSelectControl + StaticWebPageFooter
> [...]
>
> Since you don't seem to need to modify the contents of those
> strings, why don't IO-lists (i.e. a list of binaries) fit your
> need? You should simply pass a list of binaries, where each
> binary contains text that is 8-bit encoded in UTF-8 or ISO-8859-1 or
> whatever. Binaries are not copied. Such IO-lists are what is
> used to communicate with linked-in C drivers. IO-lists are the
> most efficient way to transmit large data in an Erlang node.
> Why doesn't that fit your needs?
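For concreteness, the IO-list approach suggested above might look roughly
like the sketch below. The module and function names (page_sketch,
render/1) are hypothetical and are not part of yaws or OTP; the point is
only that the page is assembled as a nested list of binaries and never
flattened by the code that builds it.

```erlang
-module(page_sketch).
-export([render/1]).

%% Hypothetical helper: build an HTML <select> as an IO-list.
%% Countries is assumed to be a list of UTF-8 encoded binaries.
%% Nothing is concatenated here; the binaries are only nested in lists.
render(Countries) ->
    Options = [[<<"<option>">>, C, <<"</option>">>] || C <- Countries],
    [<<"<select>">>, Options, <<"</select>">>].
```

The result can be handed as-is to anything that accepts iodata, or
flattened explicitly with iolist_to_binary/1 when a single binary is
needed.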
I understand that a lengthy binary is not copied. I have seen posts
on this mailing list saying that short binaries _are_ copied and long
ones _are not_, but I don't know what length determines whether a
binary is copied or not.
In the example I gave, my countryManager process is a singleton
(pardon the OO pattern reference, but that's what it is) that serves
the entire VM by answering requests for the list of countries. This is
a lengthy list of short UTF-8 encoded binaries. So wouldn't the list
get copied? And won't each short binary in the list get copied as
well? There must be a better way.
In order to get around this problem, I would have to destroy MVC
separations and have my model object (countryManager) return an
already serialized binary of binaries (or, if I'm going to do that, I
may as well have the countryManager go ahead and serialize it to JSON
form as well).
This violates a lot of sound application design. Basic principles of
encapsulation and separation of presentation and app logic are well
grounded in OO design. These principles apply to non-OO languages as
well. I understand that not having object references, and copying
terms between calls to Erlang processes, is a key element of Erlang.
But for non-mutable strings??? Not having a solution for this makes
mainstream web apps very inefficient.
To reference Richard's earlier post:
> STRINGS ARE WRONG.
>
> Strings are a good data type for text that you are NOT going to
> manipulate.
> If you have to manipulate text, it's usually a good idea to convert
> it to something else as quickly as you can, such as an abstract
> syntax tree.
> This will be orders of magnitude cheaper to process, even in C.
I agree. I am talking about handling these non-mutable things (I've
been calling them strings) that need to be stored (in mem and on
disk) and passed around between processes efficiently.
My point in providing a real-world example of how I need to use
strings is that talk of string implementation without defining a set
of cases for how it would be used is, well, pretty useless.
As Ulf points out, my scenario gets even worse with 64-bit Erlang.
thanks, ke han
>
> --
> Romain LENGLET