Megaco Avalanche! (was: ets:update_counter/3 taking a looooong time!)
Micael Karlberg
micael.karlberg@REDACTED
Mon May 5 12:08:09 CEST 2003
Hi Peter,
I have been trying to reproduce your problem without success.
I have some questions and suggestions.
q1: Have you changed the max counter value (max_trans_id)?
A small value would cause reset_trans_id_counter to be called
more often (although, judging from the fprof output, that does
not seem to be the case here).
q2: The fprof results included here, were they produced
after the system had been "warmed up"?
s1: When fprof'ing, use fprof:apply(Func, Args, OptionList)
and include the megaco system processes megaco_config
and megaco_monitor ({procs, [self(),megaco_config,megaco_monitor]}).
s2: In order to get more reliable results, run more than one
set of call setups: N*(add, modify & subtract).
s3: Try setting the priority of the megaco_config process to high.
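Suggestion s1 could look roughly like the following sketch, where profile_setup/2 and the Run/Args entry point are placeholders for your own benchmark driver, not part of the megaco API:

```erlang
%% Sketch of s1: profile a call-setup run under fprof and include the
%% megaco system processes in the trace. Run/Args are placeholders for
%% whatever function drives your benchmark.
profile_setup(Run, Args) ->
    %% Trace the calling process plus the two megaco system processes.
    fprof:apply(Run, Args,
                [{procs, [self(), megaco_config, megaco_monitor]}]),
    %% Turn the raw trace into profile data and write the analysis.
    fprof:profile(),
    fprof:analyse([{dest, "megaco_setup.analysis"}]).
```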
/BMK
Peter-Henry Mander writes:
> Hi Scott,
>
> Well, I did a bit of hacking and then realised I should use my brain too!
>
> First I replaced megaco_config:incr_trans_id_counter/1 with something I
> rolled myself, a very simple tail-recursive counter server that didn't
> use ets. Not because of any scheme or strategy I had, just because I
> can! And the result was a noticeable but minor improvement. The fprof
> trace now shows that blocking occurs while waiting for a reply, but the
> performance still sucks. The media gateway end now held up the gateway
> controller beyond 18 concurrent processes instead of 16.
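[A hand-rolled counter server of the kind described above might look like this sketch; the module and function names are illustrative, not what Peter actually wrote:]

```erlang
%% Hypothetical sketch of a simple tail-recursive counter server used
%% in place of ets:update_counter/3. One registered process holds the
%% counter in its loop argument.
-module(trans_id).
-export([start/0, next/0, loop/1]).

start() ->
    register(?MODULE, spawn(?MODULE, loop, [1])).

%% Synchronous request: ask the server for the next transaction id.
next() ->
    ?MODULE ! {next, self()},
    receive {trans_id, Id} -> Id end.

loop(N) ->
    receive
        {next, From} ->
            From ! {trans_id, N},
            loop(N + 1)
    end.
```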
>
> Since I haven't managed to conclusively fprof the media gateway
> (possibly due to lack of experience I'm sorry to say), I decided to see
> if things improved by adding a 20ms pause between launching each
> process. Maybe all 2000 processes were trying to access ets at once,
> effectively an avalanche of processes. The performance sucked a bit less
> beyond 18 processes, but the call rate was a constant 36 cps all the way
> up to 2000 concurrent processes maintaining 2000 open calls. So the
> avalanche hypothesis seemed correct.
>
> This figure was a lot less than the 400 cps I get doing a "churning"
> call cycle running on a dozen threads. Each thread repeatedly does a
> Megaco add followed by modify and subtract as fast as possible. So I
> know that this rate is achievable and will remain stable and constant
> for hours on end without anything going pop!
>
> I then tweaked the launching code to introduce a 20ms pause after
> starting seven processes at a time, seven being a trial-and-error
> figure. This backs off process launching just enough to prevent the
> avalanche effect, and now I can open up 2000 calls at a rate of 330 cps.
> Not quite 400 cps, but sufficient and an order of magnitude better!
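[The staggered launch described above can be sketched as follows; launch/2 and the StartCall fun are illustrative stand-ins for whatever spawns one call process:]

```erlang
%% Sketch of the batched launch: start call processes seven at a time
%% with a 20 ms pause between batches, to avoid the avalanche effect.
%% StartCall is a fun that sets up one call.
launch(0, _StartCall) ->
    ok;
launch(N, StartCall) ->
    Batch = if N < 7 -> N; true -> 7 end,
    [spawn(StartCall) || _ <- lists:seq(1, Batch)],
    timer:sleep(20),
    launch(N - Batch, StartCall).
```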
>
> So, not exactly an ets problem (I'm using Linux, not FreeBSD), but I
> haven't reversed my hacks to the megaco stack to see if there are any
> significant speed gains from avoiding ets in this situation. Probably
> not; I've been assured that ets is perfectly fast enough.
>
> I hope this little tale helps someone out there; it's not always
> obvious what's wrong with your code when a process avalanche
> occurs. Ah, the joys of massively concurrent systems (-:
>
> Pete.
>
> Scott Lystig Fritchie wrote:
> > "pm" == Peter-Henry Mander <erlang@REDACTED> writes:
> >
> > pm> The attached gives the output of fprof, and the last line
> > pm> indicates that all the time is spent waiting for
> > pm> ets:update_counter/3 to return a new transaction ID to
> > pm> megaco_config:incr_trans_id_counter/1.
> >
> > I've got two theories.
> >
> > 1. Has your Erlang VM's size grown so large that your OS has started
> > paging memory to disk to make room? Or has some other OS process
> > started hogging CPU cycles?
> >
> > Er, well, those are easy guesses, and surprisingly easy to forget
> > about if you're distracted by other things.
> >
> > 2. Is your OS platform FreeBSD (or perhaps one of the other *BSDs)?
> >
> > I've been doing some simple ETS benchmarks lately, and I've noticed
> > really weird behavior of ets:delete() (deleting lots of items in a
> > table or deleting an entire table at once) with FreeBSD 4.7-RELEASE
> > and 5.0-RELEASE and Erlang R9B-1. If the table is large (tens of
> > thousands to millions of items), the delete operation can take up
> > to 40 times (!) longer than running on the exact same hardware
> > under a "modern" Linux (Mandrake 9.1 distribution).
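[The kind of benchmark Scott describes could be sketched like this; bench_delete/1 and the table sizes are illustrative, not his actual test code:]

```erlang
%% Rough sketch of an ets delete benchmark: fill a table with N items
%% and time deleting the whole table at once.
bench_delete(N) ->
    T = ets:new(bench, [set, public]),
    [ets:insert(T, {I, I}) || I <- lists:seq(1, N)],
    {Micros, true} = timer:tc(ets, delete, [T]),
    io:format("deleting a ~p-item table took ~p us~n", [N, Micros]),
    Micros.
```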
> >
> > This was so surprising to me that I tried it on two different
> > machines, a Pentium III laptop and an AMD XP+ desktop. Same thing:
> > FreeBSD was horrible in the delete case, Linux was not.
> >
> > I haven't chased down the final answer (hopefully I'll get back to
> > finding the answer and proposing a fix) ... but "gprof" analysis
> > strongly suggests that libc's free() is the culprit. Bummer.
> >
> > -Scott
--
Micael Karlberg Ericsson AB, Älvsjö Sweden
Tel: +46 8 727 5668 EAB/UHK/KD - OTP Product Development
ECN: 851 5668 Mail: micael.karlberg@REDACTED
Fax: +46 8 727 5775