Megaco simple Media Gateway bleeds memory under load.

Tue Aug 27 18:10:02 CEST 2002

Hi Håkan,

Thanks for the speedy reply, you're quickly becoming my mentor!

Hakan Mattsson wrote:

>There are two drawbacks with this:
>
>- GC. The system will need to perform more GC compared to
>  the current solution, where the heaps of short lived
>  processes cheaply can be removed instead of doing GC of long
>  lived processes. The initial size of the processes can be
>  regulated with options to megaco_messenger:receive_message/4.
>

I'm not sure I understand you here. I know what the maximum number 
of concurrent unacknowledged requests is going to be, as I have full 
control over the MGC, so I expect to need a similar number of 
receive_message processes and possibly a similar number of timeout 
processes too, which I hope to avoid having collected at all for the 
duration of the test.

I agree that in reality the MG ought to dynamically respond to surges of 
requests, and release resources when spent, and the current scheme is 
best for that, but I'm creating an MGC for use as a test tool, and the MG 
is only being used to simulate the product that will eventually be 
tested. I would like to push the MGC to the maximum rate to find where 
its limits are. At the moment I can only speculate that it may achieve 
500+ setup/teardowns a second on a single physical PC.
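
A minimal sketch of the kind of fixed pool described above, assuming a 
dispatcher process in front of N pre-spawned workers. It is not a patch 
against megaco_messenger.erl: the module name pool_sketch, the message 
tuples and the Handler fun are hypothetical placeholders for whatever 
receive_message/4 does per message today.

```erlang
-module(pool_sketch).
-export([start/2, submit/2, stop/1, demo/0]).

%% Start a dispatcher that owns N long-lived workers.
%% Handler is a fun/1 applied to every submitted message; it stands in
%% (hypothetically) for the per-message work of a spawned process.
start(N, Handler) when is_integer(N), N > 0, is_function(Handler, 1) ->
    spawn(fun() ->
        Workers = [spawn_link(fun() -> worker(Handler) end)
                   || _ <- lists:seq(1, N)],
        dispatcher(Workers, queue:new())
    end).

%% Hand a message to the pool; returns immediately.
submit(Pool, Msg) ->
    Pool ! {work, Msg},
    ok.

%% Stop the dispatcher; the linked workers die with it.
stop(Pool) ->
    Pool ! stop,
    ok.

%% Idle is the list of free workers, Pending the queue of waiting messages.
dispatcher(Idle, Pending) ->
    receive
        {work, Msg} ->
            case Idle of
                [W | Rest] ->
                    W ! {run, self(), Msg},
                    dispatcher(Rest, Pending);
                [] ->
                    %% No free worker: queue the message (unbounded!).
                    dispatcher([], queue:in(Msg, Pending))
            end;
        {done, W} ->
            case queue:out(Pending) of
                {{value, Msg}, Q} ->
                    W ! {run, self(), Msg},
                    dispatcher(Idle, Q);
                {empty, _} ->
                    dispatcher([W | Idle], Pending)
            end;
        stop ->
            exit(shutdown)
    end.

%% A worker handles one message at a time and is then reused.
%% If Handler crashes, the link takes the whole pool down.
worker(Handler) ->
    receive
        {run, From, Msg} ->
            Handler(Msg),
            From ! {done, self()},
            worker(Handler)
    end.

%% Push five messages through a pool of two workers and collect results.
demo() ->
    Self = self(),
    Pool = start(2, fun(X) -> Self ! {result, X * 2} end),
    [submit(Pool, X) || X <- lists:seq(1, 5)],
    Results = [receive {result, R} -> R after 1000 -> timeout end
               || _ <- lists:seq(1, 5)],
    stop(Pool),
    lists:sort(Results).
```

Calling pool_sketch:demo() should return [2,4,6,8,10]. Note that the 
Pending queue is unbounded, which is exactly the congestion concern 
raised below: if the sender outpaces the workers, the queue grows until 
memory runs out.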

>- Non-safe congestion handling. The pool solution does not
>  really cope with the case when the MGC is able to outperform
>  the MG. You may possibly be able to raise the current limit,
>  but the memory of the MG would eventually be exhausted if the
>  MGC persists. 
>  
>

Well, at the moment the congestion handling may be safe, but I would 
like to remove all congestion at the MG end if this is possible. It 
seems to me that the solution may lie in distributing the MG over two or 
more physical nodes, but I'm unclear how I'm going to achieve this. The 
Megaco manuals hint at doing exactly this, but I haven't found an 
example of how to do it. I will need to use distributed nodes for the 
MGC anyway, to push performance into four-figure setup/teardown rates, 
so any information or instructions will be very welcome!

>I would try to push the congestion problem into the
>transport layer. By explicitly blocking the socket (use
>megaco_tcp:block/1 et al), the sender will back off and the
>receiver will not hog more resources until you unblock the
>socket again.
>
>It should be possible to keep track of the internal resources
>such as memory, number of currently handled requests
>(megaco:system_info(n_active_requests)) etc. and use that
>info to control the socket. 
>

Well, my intentions preclude any use of throttling, so I'll pass on this 
solution.
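
For anyone following the thread, a rough sketch of the scheme as I 
understand it: poll a load figure and block or unblock the socket around 
two watermarks. The module name throttle_sketch, the watermark 
parameters and the three funs are hypothetical. In a real MG, CountFun 
would presumably wrap megaco:system_info(n_active_requests) and BlockFun 
would wrap megaco_tcp:block/1; the matching unblock call and the return 
types are assumptions I have not checked.

```erlang
-module(throttle_sketch).
-export([next_state/4, loop/5]).

%% Pure decision step with hysteresis:
%% block at or above High, unblock at or below Low, otherwise keep state.
next_state(unblocked, Active, High, _Low) when Active >= High -> blocked;
next_state(blocked, Active, _High, Low) when Active =< Low -> unblocked;
next_state(State, _Active, _High, _Low) -> State.

%% Polling loop. The funs are placeholders (assumptions):
%%   CountFun/0   returns the current load as an integer
%%   BlockFun/0   blocks the transport socket
%%   UnblockFun/0 unblocks it again
loop(State, {CountFun, BlockFun, UnblockFun} = Funs, High, Low, IntervalMs) ->
    New = next_state(State, CountFun(), High, Low),
    case {State, New} of
        {unblocked, blocked} -> BlockFun();
        {blocked, unblocked} -> UnblockFun();
        _ -> ok
    end,
    timer:sleep(IntervalMs),
    loop(New, Funs, High, Low, IntervalMs).
```

The gap between High and Low keeps the socket from flapping when the 
load hovers around a single threshold.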

>If this is not precise enough, you could hack Megaco's
>transport modules or simply plug in a brand new one.
>A public and congestion proof megaco_sctp module would be
>nice. ;-)
>  
>

It certainly would! If I had the time...

On Tue, 27 Aug 2002, Peter-Henry Mander wrote:

Pete> In megaco_messenger.erl the receive_message/4 function spawns a process 
Pete> for each received message. The problem I have with this scheme is that 
Pete> memory is being consumed by processes spawned in receive_message/4 and 
Pete> garbage-collected at a crippling rate, leading to a bottleneck in the 
Pete> Media Gateway.
Pete> 
Pete> The MG Controller and MG run on separate machines. The MGC is only 
Pete> consuming 50%-60% CPU and has a small stable memory footprint while 
Pete> issuing over 300 add-modify-subtract request cycles each second, whereas 
Pete> the MG is struggling at 99% and has a huge and ever expanding memory 
Pete> footprint.
Pete> 
Pete> I managed to streamline the MGC by reusing processes instead of spawning 
Pete> new ones. This has made it efficient enough to potentially achieve over 
Pete> 500 call cycles a second, and I wonder if it were possible to use a 
Pete> similar scheme in receive_message/4 and use a pool of "process received 
Pete> message" processes instead of continually spawning new ones?
Pete> 
Pete> Are there any issues I must be aware of before I start "hacking" 
Pete> megaco_messenger.erl? Is there a better way than my (possibly naive) 
Pete> proposal?