Strings (was: Re: are Mnesia tables immutable?)

Thu Jun 29 19:01:25 CEST 2006

Christian,
Thanks for the reply...

On Jun 29, 2006, at 3:38 PM, Christian S wrote:

> On 6/28/06, ke han <ke.han@REDACTED> wrote:
>> In the example I gave, my countryManager process is a singleton (pardon the
>> oo pattern reference, but that's what it is) that serves the entire VM to
>> answer a list of countries.  This is a lengthy list of short utf-8 encoded
>> binaries.  So wouldn't the list get copied?  And won't each short binary in
>> the list get copied as well?  There must be a better way.
>
> When benchmarking, how fast could you serve requests to your
> countryManager? (Btw, registered process would be more erlangy than
> 'singleton')

thanks..calling it a registered process is best...although if I can  
really get my apps designed right, I wouldn't register processes by  
name.  I would inject the process into the controllers that need to  
know about them and not name them at all.

>
> What job does it do?

I was trying to write a simple example so as not to let the app design
get in the way of my point.  In the apps I'm building there are
_many_ lists of strings.  Some of these are as simple as names of  
countries, states, technology interests, etc...  Some lists are keyed  
by other lists...e.g.  states grouped by country choice.  Since my  
app is AJAX oriented, sometimes these lists get encoded into the  
original page that gets sent to the browser and sometimes they get  
sent later as json data to update dependent lists.
These lists come from mnesia tables and are managed by appropriate  
processes which encapsulate access to the lists.  Mnesia table size  
is another concern..but I think I can deal with this easier than my  
main memory concerns.

In addition to these basic look-up-table type lists, lots of other
lists of strings or complex terms (which mostly contain strings)  
occur in my app (mostly to create html tables).
The bottom line is that to get at any list, a message is sent from
the yaws page to a controller (a separate process) which then sends a
message to a model (sometimes another process, sometimes a record).
Each of these sends is synchronous: the caller waits for a copy of
the list to come back.  So not only is this data stored as lists of
integers (which gets really bad for 64-bit) but it is being copied
with each message send.
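
To put numbers on the list-versus-binary cost rather than estimate it, a minimal sketch follows. The module and function names are made up for illustration. It relies on erts_debug:flat_size/1, an undocumented debug helper that reports heap size in machine words, so treat the output as a measurement aid and not a guaranteed API. A string held as a list uses one cons cell (two words) per character; the same text as a binary uses one byte of payload per UTF-8 byte, plus a small fixed header.

```erlang
-module(string_cost).
-export([list_words/1, binary_bytes/1, compare/1]).

%% Heap words used by a string held as a list of integers.
%% Each element is one cons cell, i.e. two machine words.
%% NOTE: erts_debug:flat_size/1 is an undocumented debug helper.
list_words(String) when is_list(String) ->
    erts_debug:flat_size(String).

%% Payload bytes when the same text is held as a UTF-8 binary
%% (the binary's own header is not counted here).
binary_bytes(String) when is_list(String) ->
    byte_size(unicode:characters_to_binary(String)).

%% Side-by-side figures in bytes for the running VM's word size.
compare(String) ->
    WordSize = erlang:system_info(wordsize),
    [{list_bytes, list_words(String) * WordSize},
     {binary_payload_bytes, binary_bytes(String)}].
```

Calling string_cost:compare("Sweden") in the shell shows both figures for the VM you are running on; the list figure doubles when moving from a 32-bit to a 64-bit emulator, while the binary payload stays the same.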

>
>> In order to get around this problem, I would have to destroy MVC separations
>> and have my model object (countryManager) return an already serialized
>> binary of binaries (or if I'm going to do that I may as well have the
>> countryManager go ahead and serialize it to json form as well).
>> This violates lots of sound application design.  Basic principles of
>> encapsulation and separation of presentation and app logic are well grounded
>> in OO design.  These principles apply to non-OO languages as well.  I
>> understand that not having object references and copying terms between calls
>> to erlang processes is a key element of erlang.  But for non-mutable
>> strings??? Not having a solution for this makes mainstream web apps very
>> inefficient.
>
> Since we have first class functions in erlang you can pass your
> countryManager process a function that processes the data it has, and
> send you back only the result of that call. No violation of sound
> application design. This is a trick languages without first class
> functions have a hard time to take advantage of, luckily Erlang is not
> that crappy.
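
A minimal sketch of the suggestion above, assuming a plain spawned process instead of a gen_server; the module name, message shapes and function names are invented for illustration and are not from the original app. The caller sends a fun, the manager applies it to the list it already holds, and only the result is copied back.

```erlang
-module(country_manager).
-export([start/1, query/2, stop/1]).

%% Start a process that owns the list of countries.
start(Countries) when is_list(Countries) ->
    spawn(fun() -> loop(Countries) end).

%% Run Fun inside the manager; only Fun's result crosses the
%% process boundary, not the whole list.
query(Pid, Fun) when is_function(Fun, 1) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {query, self(), Ref, Fun},
    receive
        {Ref, Result} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, _Pid, Reason} ->
            %% The manager died, e.g. because Fun crashed inside it.
            exit(Reason)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop(Countries) ->
    receive
        {query, From, Ref, Fun} ->
            From ! {Ref, Fun(Countries)},
            loop(Countries);
        stop ->
            ok
    end.
```

For example, country_manager:query(Pid, fun(L) -> length(L) end) returns just an integer. Two caveats with this sketch: the fun and anything it closes over are themselves copied to the manager, and a fun that crashes takes the manager down with it, so real code would want a try/catch or a supervisor around it.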

yes, I am looking into solutions like this.  I will post to the yaws  
maillist asking about how to accomplish some of the ideas I have  
rolling around.

>
> You keep mentioning non-mutable strings as if we had mutable strings. We
> have ways to modify bindings (process dictionary or ets) but not to
> manipulate the string value itself (hipe extensions ignored). The
> latter is a good thing nobody wants to give up.

I was stressing the strings were non-mutable (and should have added  
don't require character level access) because it seemed the  
discussion going on in this thread was talking about many other  
unicode issues and I wanted to stress the difference.

>
> Where are your benchmarks that show how mainstream web apps in erlang
> are very inefficient? Maybe you are just doing the wrong thing?

The apps I develop are mostly data in / data out with some nice  
presentation and validation on what goes in.  This means that the  
majority of memory is taken up by strings as most of my data is text  
of some form or another.  I don't need benchmarks to know that 4  
bytes per character is _too_ much.  In most cases it's 4x too much and
going to 64-bits is off limits with this type of memory allocation.   
Add to that the intermittent copying of these lists of integers (one  
page request could trigger 20 copies of lengthy lists of lists of  
integers in memory...just to stream out a page containing drop down  
lists that don't change very often)... and you will get spikes of  
memory allocation as the number of page requests grows.  It turns out  
that processor performance, io, concurrency issues won't be my first  
bottleneck...it will be memory taken up by strings!!!

I am actually less concerned about the copy time than about the
memory required by the strings (in the model objects and in mnesia)
and the memory required for a web server to constantly be copying
these lists to output pages.
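
One way to limit the repeated copying, sketched here under assumptions: binaries larger than 64 bytes live outside the process heaps and are passed between processes on the same node by reference, so a rarely changing list rendered once into a single binary can be handed to many page requests without being duplicated each time. The module name and the comma-separated rendering are placeholders for illustration, not the JSON encoding the app would really use, and this is the "serialize in the model" trade-off discussed earlier in the thread, not a way around it.

```erlang
-module(rendered_cache).
-export([start/1, fetch/1]).

%% Render the list once into one binary and keep it in a process.
%% Countries is expected to be a list of binaries.
start(Countries) when is_list(Countries) ->
    Rendered = iolist_to_binary(lists:join(<<",">>, Countries)),
    spawn(fun() -> loop(Rendered) end).

%% Ask for the pre-rendered binary. When it is larger than 64 bytes
%% the reply carries a reference to the shared binary, not a copy.
fetch(Pid) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {fetch, self(), Ref},
    receive
        {Ref, Rendered} ->
            erlang:demonitor(Ref, [flush]),
            Rendered;
        {'DOWN', Ref, process, _Pid, Reason} ->
            exit(Reason)
    end.

loop(Rendered) ->
    receive
        {fetch, From, Ref} ->
            From ! {Ref, Rendered},
            loop(Rendered)
    end.
```

Short binaries (64 bytes or less) are still copied on send, so the benefit only appears for the longer rendered lists.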

sorry..I know this already turned into a rant...I do like erlang very  
much...which is why I'm crying out for help on this issue.
I already have one erlang+yaws+mnesia app in production.  It's an
internal corporate app and the uptake on usage is slow...but I can
already tell the memory it's taking for all the data is too much...I
should be able to get at least twice as much data in RAM as I have.
The next app I'm writing over this summer should get released this  
September.  This will be a world-wide highly public app and will  
hopefully get lots of page requests.  The last time I launched a  
large system on the web was a few years ago...it was a Java based web  
app.  It actually scaled pretty well but I can vividly recall that  
what kept me from sleeping at night was worrying about my
servers...will some mem leak crash things...will some concurrency  
deadlock crash the system..etc...
I chose erlang for this new app because I want to sleep at night when  
I launch this next product.   I have to launch this new app on one or  
two low end servers and pray for success...My biggest fear is memory  
support for all my character strings.  Performance is secondary.

thanks for allowing the rant...
ke han