Megaco simple Media Gateway bleeds memory under load.
Hakan Mattsson
hakan@REDACTED
Tue Aug 27 17:30:29 CEST 2002
On Tue, 27 Aug 2002, Peter-Henry Mander wrote:
Pete> In megaco_messenger.erl the receive_message/4 function spawns a process
Pete> for each received message. The problem I have with this scheme is that
Pete> memory is being consumed by processes spawned in receive_message/4 and
Pete> garbage-collected at a crippling rate, leading to a bottleneck in the
Pete> Media Gateway.
Pete>
Pete> The MG Controller and MG run on separate machines. The MGC is only
Pete> consuming 50%-60% CPU and has a small stable memory footprint while
Pete> issuing over 300 add-modify-subtract request cycles each second, whereas
Pete> the MG is struggling at 99% and has a huge and ever expanding memory
Pete> footprint.
Pete>
Pete> I managed to streamline the MGC by reusing processes instead of spawning
Pete> new ones. This has made it efficient enough to potentially achieve over
Pete> 500 call cycles a second, and I wonder whether it would be possible
Pete> to use a similar scheme in receive_message/4, with a pool of
Pete> "process received message" processes instead of continually spawning
Pete> new ones?
Pete>
Pete> Are there any issues I must be aware of before I start "hacking"
Pete> megaco_messenger.erl? Is there a better way than my (possibly naive)
Pete> proposal?
There are two drawbacks to this approach:

- GC. The system would need to perform more garbage collection
  than the current solution, where the heaps of short-lived
  processes can be discarded cheaply instead of repeatedly
  GC'ing long-lived pool processes. The initial size of the
  spawned processes can be regulated with options to
  megaco_messenger:receive_message/4 (see the sketch after this
  list).

- Unsafe congestion handling. The pool solution does not really
  cope with the case where the MGC is able to outperform the MG.
  You may possibly be able to raise the current limit, but the
  memory of the MG would eventually be exhausted if the MGC
  persists.
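To illustrate the heap-size point: a short-lived handler can be
given a pre-sized heap so it never needs a GC pass before it
dies. This is a minimal sketch of the underlying Erlang
mechanism only; process_message/1 and the heap size are made up,
and the real knob is the receive_message option mentioned above.

    %% Minimal sketch, not the megaco API: spawn one handler per
    %% message with a heap large enough to decode and process it
    %% without garbage collecting. When the process dies, its
    %% whole heap is reclaimed in one cheap operation.
    handle_message(Bin) when is_binary(Bin) ->
        spawn_opt(fun() -> process_message(Bin) end,
                  [{min_heap_size, 4000}]).  % size in words, example value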
I would try to push the congestion problem down into the
transport layer. If you explicitly block the socket (with
megaco_tcp:block/1 et al.), the sender will back off and the
receiver will not hog more resources until you unblock the
socket again.
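For example, assuming Socket is the handle you got back when
setting up the transport, and drain_backlog/0 is a hypothetical
stand-in for letting the MG catch up:

    %% Back-pressure sketch: while blocked, the receiver stops
    %% reading, the TCP window fills, and the MGC's sender stalls,
    %% so no new work piles up in the MG.
    throttle(Socket) ->
        ok = megaco_tcp:block(Socket),
        drain_backlog(),                 % hypothetical: let the MG catch up
        ok = megaco_tcp:unblock(Socket).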
It should be possible to keep track of internal resources such
as memory and the number of currently handled requests
(megaco:system_info(n_active_requests)), and to use that
information to control the socket.
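A rough sketch of such a watchdog; the thresholds and the poll
interval are made-up example values:

    %% Flow-control sketch: block the socket when the MG is loaded,
    %% unblock it once the MG has recovered.
    watchdog(Socket) ->
        watchdog(Socket, unblocked).

    watchdog(Socket, State) ->
        Loaded = megaco:system_info(n_active_requests) > 1000
                 orelse erlang:memory(total) > 512*1024*1024,
        NewState =
            case {State, Loaded} of
                {unblocked, true}  -> ok = megaco_tcp:block(Socket),
                                      blocked;
                {blocked,   false} -> ok = megaco_tcp:unblock(Socket),
                                      unblocked;
                _                  -> State
            end,
        timer:sleep(100),                % poll interval (ms), example
        watchdog(Socket, NewState).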
If this is not precise enough, you could hack Megaco's
transport modules or simply plug in a brand new one.
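The contract a transport module has to fulfil is small: outgoing
messages go through the send_message/2 callback of whatever
module you name in the send handle, and incoming bytes are
handed back to the stack via megaco:receive_message/4. A bare
skeleton (the module name and the use of a raw gen_tcp socket as
the send handle are assumptions) could start as:

    %% Skeleton of a pluggable transport; only the send_message/2
    %% export is required by megaco itself. Connection setup and
    %% congestion policy are entirely up to this module.
    -module(my_megaco_transport).
    -export([send_message/2]).

    send_message(Socket, Bin) ->
        gen_tcp:send(Socket, Bin).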
A public and congestion proof megaco_sctp module would be
nice. ;-)
/Håkan