How to get configuration data to a large number of threads?
Joe Armstrong
joe@REDACTED
Wed Oct 27 11:10:06 CEST 2004
> We need to be able to handle large
> volumes of transactions in a short time in bursts (SMS system using
> oserl http://oserl.sourceforge.net/)
> Unfortunately the entire config set is needed
> for this and it could be as large as 5k (worst case, 1-2k is probably
> more realistic.)
> Some quick profiling shows that we can expect in the order of thousands
> of processes active at the same time, making the memory overhead a
> problem.
Now I really have to ask "have you done any measurements" - or are
you guessing the outcome of an experiment that you have not yet
performed?
If I interpret "thousands of processes" as meaning (say) 5000
processes and take your realistic case as 2K - then the total memory
requirement is about 10 MBytes. Now this is not a lot of data - had you
said "hundreds of thousands" of processes then it would be a different
story.
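
One way to replace the guess with a number is to spawn the processes and
ask the runtime what they cost. The following is only a rough sketch: the
module name mem_estimate is made up, a list of integers stands in for the
real config term, and the figure includes the fixed overhead of every
process as well as its copy of the data.

```erlang
-module(mem_estimate).
-export([run/2]).

%% Spawn N processes that each hold their own copy of a term of roughly
%% Bytes bytes. Returns {TotalBytes, BytesPerProcess} as reported by
%% erlang:memory(processes).
run(N, Bytes) ->
    %% A list cell costs two words, so this only approximates Bytes.
    WordSize = erlang:system_info(wordsize),
    Config = lists:seq(1, Bytes div (2 * WordSize)),
    Parent = self(),
    Before = erlang:memory(processes),
    %% The fun's environment (Parent, Config) is copied into each new process.
    Pids = [spawn(fun() -> holder(Parent, Config) end) || _ <- lists:seq(1, N)],
    %% Wait until every process is up and holding its copy.
    lists:foreach(fun(Pid) -> receive {ready, Pid} -> ok end end, Pids),
    After = erlang:memory(processes),
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids),
    {After - Before, (After - Before) div N}.

holder(Parent, Config) ->
    Parent ! {ready, self()},
    %% Config is used after the receive, so it stays live while we wait.
    receive stop -> length(Config) end.
```

Calling mem_estimate:run(5000, 2048) corresponds to the "realistic" case
above; run(5000, 5120) to the worst case.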
> In order to keep the processing speed as fast as possible I want as many
> parallel processes as I can manage. Obviously if I can get clever with
> the config data, this would mean more processes. Failing that, I will
> have to place a lower limit on the number of processes so that they will
> fit into memory or to prevent memory churn.
One way of "being clever with the configuration data" might be to
organise it into a number of small servers, each of which answers
queries about a specific subset of the global configuration data.
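
A minimal sketch of such small servers might look like this. The module
name config_shards, the message format and the 5 second timeout are all
assumptions made for illustration, not part of any existing library.

```erlang
-module(config_shards).
-export([start/1, get_data/2]).

%% Subsets is a list of {Name, [{Key, Value}]}. One small server is
%% started per subset and registered under Name.
start(Subsets) ->
    [register(Name, spawn(fun() -> serve(KVs) end)) || {Name, KVs} <- Subsets],
    ok.

%% Ask the server that owns Subset for the value stored under Key.
get_data(Subset, Key) ->
    Ref = make_ref(),
    Subset ! {get, self(), Ref, Key},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        exit({config_timeout, Subset, Key})
    end.

%% Each server holds only its own subset of the configuration.
serve(KVs) ->
    receive
        {get, From, Ref, Key} ->
            Reply = case lists:keyfind(Key, 1, KVs) of
                        {Key, Value} -> {ok, Value};
                        false -> undefined
                    end,
            From ! {Ref, Reply},
            serve(KVs)
    end.
```

With hypothetical subsets smsc and routing, a client that only needs the
port would call config_shards:get_data(smsc, port) and receive a copy of
that one value rather than the whole configuration.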
No matter how you do things you should abstract away from the
details of *how* you get the configuration data.
What I would do is as follows:
1) Define a configuration API

       config:get_data(Key) => Data

   Do you need more than this? :-)

2) Write the most beautiful and inefficient config.erl you can think of

3) Measure

   If fast enough - hooray
   If not, write a more ugly config.erl
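
As an illustration of steps 1 to 3, here is one possible "beautiful and
inefficient" config.erl. It is a sketch under assumptions: the file name
config.terms and the measure/2 helper are invented for the example, and
re-reading the file on every call is chosen only because it is the
simplest thing that could work.

```erlang
-module(config).
-export([get_data/1, measure/2]).

%% Hypothetical file of Erlang terms, one {Key, Data}. tuple per entry.
-define(CONFIG_FILE, "config.terms").

%% Step 1 and 2: the whole API, implemented by re-reading the file
%% on every call. Crashes if the key is missing.
get_data(Key) ->
    {ok, Terms} = file:consult(?CONFIG_FILE),
    {Key, Data} = lists:keyfind(Key, 1, Terms),
    Data.

%% Step 3: average cost in microseconds of one get_data(Key) call,
%% taken over N calls.
measure(Key, N) ->
    {Micros, ok} = timer:tc(fun() -> repeat(Key, N) end),
    Micros / N.

repeat(_Key, 0) -> ok;
repeat(Key, N) -> _ = get_data(Key), repeat(Key, N - 1).
```

If config:measure(small, 1000) reports a cost you can live with, you are
done. If not, the callers do not change when config.erl is rewritten, for
example to keep the terms in a server process.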
You might also like to think about how long clients hold the data.
Vsn1:

    foo() ->
        SomeBigDataStructure = config:get_data(big),
        loop(SomeBigDataStructure).

    loop(SomeBigDataStructure) ->
        receive
            ... ->
                loop(SomeBigDataStructure)
        end.

This may not be a good idea, since every process holds on to the big
data structure for its whole lifetime.
Vsn2:

    foo() -> loop().

    loop() ->
        receive
            Msg1 ->
                SomeSmallDataStructure = config:get_data(small),
                ... some local code which uses SomeSmallDataStructure ...
                loop();
            Msg2 ->
                ...
        end.

This retains the configuration data you need for only a short time.
After calling loop() in the reception of Msg1 the data will be
available for garbage collection (or possibly earlier, depending upon
the smartness of the compiler).
Note the trade-off between fetching all the configuration data once and
keeping it around until you need it (Vsn1), or fetching a small amount
of data, using it and then dropping it (Vsn2).
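
The difference can be made visible with a small experiment. This is a
sketch only: the module name hold_vs_fetch is made up, Fetch is a fun
standing in for config:get_data/1, and a garbage collection is forced
from outside so that only data the process still references is counted.

```erlang
-module(hold_vs_fetch).
-export([compare/1]).

%% Fetch is a fun of one argument standing in for config:get_data/1.
%% Returns {{vsn1_bytes, M1}, {vsn2_bytes, M2}}: the memory of a process
%% of each kind after it has handled one message.
compare(Fetch) ->
    V1 = spawn(fun() -> vsn1(Fetch) end),
    V2 = spawn(fun() -> vsn2(Fetch) end),
    Pids = [V1, V2],
    lists:foreach(fun(P) ->
                          P ! {work, self()},
                          receive {done, P} -> ok end
                  end, Pids),
    %% Force a collection so that dead data is not counted.
    lists:foreach(fun(P) -> erlang:garbage_collect(P) end, Pids),
    {memory, M1} = erlang:process_info(V1, memory),
    {memory, M2} = erlang:process_info(V2, memory),
    lists:foreach(fun(P) -> P ! stop end, Pids),
    {{vsn1_bytes, M1}, {vsn2_bytes, M2}}.

%% Vsn1: fetch once and keep the data in the loop state.
vsn1(Fetch) -> vsn1_loop(Fetch(big)).

vsn1_loop(Data) ->
    receive
        {work, From} ->
            _ = length(Data),
            From ! {done, self()},
            vsn1_loop(Data);
        stop -> ok
    end.

%% Vsn2: fetch when a message arrives, use the data, then drop it.
vsn2(Fetch) ->
    receive
        {work, From} ->
            Data = Fetch(big),
            _ = length(Data),
            From ! {done, self()},
            vsn2(Fetch);
        stop -> ok
    end.
```

Try it with a fun that builds its data when called, for example
hold_vs_fetch:compare(fun(big) -> lists:seq(1, 10000) end). A fun that
closes over an already built structure would carry that structure into
both processes and hide the difference.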
To leave room for optimisation it's probably better to work with many
different keys in the configuration data, so you can get small amounts
of related data when you need it rather than everything.
> From the comments up to now it seems that I am most likely going to have
> to be satisfied with passing the config data as a parameter to the
> function that executes in each thread and rely on the GC to keep things
> sane.
>
But this is the "keep everything" solution.
/Joe