Megaco Avalanche! (was: ets:update_counter/3 taking a looooong time!)

Tue May 6 09:39:20 CEST 2003

Hi Micael,

Thanks for the three suggestions. Since I managed to hack the problem 
into submission and obtain a sufficiently fast result, I'm probably not 
going to be able to try them until after more urgent hacking is 
finished. But when I do I'll try to isolate and diagnose what was going on.

A curious symptom was that the two nodes at either end of the megaco 
signalling link were running idle, as if they were waiting for each 
other. Could it be that they were frantically garbage collecting or 
something? The CPU usage was practically zero.

Q1: no, I haven't changed any default values. I noticed the 
reset_trans_id_counter function but ignored it because it was never 
being called.

Q2: The results were the same whether "hot" or "cold." I assume you are 
hinting at the garbage collector kicking in?

S2: I have a different test that "churns" the calls very quickly, 
setting them up and tearing them down without holding them and without 
any pause in between. This one sustains almost 400 call setups and 
teardowns every second, so it is clear to me that ets:update_counter/3 
*must* be capable of at least this rate. When I attempted this new test, 
where calls are set up without being torn down, I was expecting at least 
400/second if not more. So I am very puzzled as to why I have to 
introduce "breathing time" when before it wasn't necessary. The only 
difference is that memory is being rapidly consumed by the call 
processes, which store the call contexts.
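One way to check the assumption that ets:update_counter/3 itself is not the bottleneck is to time it in isolation. The following is a minimal sketch, not part of the test code discussed here: the module name counter_bench, the table and the trans_id key are hypothetical, and it measures a private, uncontended ets table, so it gives an upper bound rather than the rate seen inside megaco.

```erlang
-module(counter_bench).
-export([rate/1]).

%% Hypothetical micro-benchmark: returns the approximate number of
%% ets:update_counter/3 calls per second over N increments.
rate(N) when is_integer(N), N > 0 ->
    Tab = ets:new(bench_counter, [set, private]),
    true = ets:insert(Tab, {trans_id, 0}),
    {Micros, ok} = timer:tc(fun() -> bump(Tab, N) end),
    %% Sanity check: the counter must have been incremented exactly N times.
    N = ets:lookup_element(Tab, trans_id, 2),
    true = ets:delete(Tab),
    %% Guard against a zero elapsed time on very small N.
    N * 1000000 / max(Micros, 1).

%% Increment the counter K times, discarding the returned value.
bump(_Tab, 0) ->
    ok;
bump(Tab, K) ->
    _ = ets:update_counter(Tab, trans_id, 1),
    bump(Tab, K - 1).
```

Calling counter_bench:rate(100000) from the shell returns the approximate increments per second as a float. If that figure is far above 400, the slowdown is more likely to lie elsewhere than in the counter update itself.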

If I manage to reproduce the symptoms with a subset of the code, would 
you be interested in reading it? I must warn you that it is *very* 
"experimental", i.e. hacked, code.

Pete.

Micael Karlberg wrote:

Hi Peter,

I have been trying to reproduce your problem without success.
I have some questions and suggestions.

   q1: Have you changed the max counter value (max_trans_id)?
       A small value would cause reset_trans_id_counter to be called
       more often (I admit that, judging from the fprof output, that
       does not seem to be the case).

   q2: Were the fprof results included here produced after the
       system had been "warmed up"?

   s1: When fprof'ing, use fprof:apply(Func, Args, OptionList)
       and include the megaco system processes megaco_config
       and megaco_monitor ({procs, [self(),megaco_config,megaco_monitor]}).

   s2: In order to get more reliable results, run more than one
       set of call setups: N*(add, modify & subtract).

   s3: Try setting the priority of the megaco_config process to high.
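Suggestions s1 and s2 can be combined into a single profiling run. This is a minimal sketch under assumptions: the module megaco_prof and its functions are hypothetical, DoCallSetup stands for a user-supplied fun performing one add, modify & subtract sequence, and the two-argument form requires megaco to be running so that megaco_config and megaco_monitor are registered names.

```erlang
-module(megaco_prof).
-export([profile_setups/2, profile_setups/3]).

%% Hypothetical helper. DoCallSetup is an assumed fun(I) that performs
%% one add, modify & subtract sequence for call number I.
profile_setups(N, DoCallSetup) ->
    %% s1: trace the caller plus the megaco system processes
    %% (assumes megaco is running, so both names are registered).
    profile_setups(N, DoCallSetup, [self(), megaco_config, megaco_monitor]).

profile_setups(N, DoCallSetup, Procs) when is_integer(N), N > 0 ->
    %% s2: run N call setups inside one traced apply, not just one.
    Loop = fun() -> lists:foreach(DoCallSetup, lists:seq(1, N)) end,
    %% fprof:apply/3 returns the result of Loop, i.e. ok.
    ok = fprof:apply(Loop, [], [{procs, Procs}]),
    %% Turn the default trace file into profile data.
    ok = fprof:profile(),
    %% Write the analysis to a file instead of the group leader.
    fprof:analyse([{dest, "megaco_setup.analysis"}]).
```

For example, megaco_prof:profile_setups(100, DoCallSetup) profiles 100 call setups and writes the analysis to megaco_setup.analysis; passing [self()] as the third argument profiles only the calling process.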

/BMK