Megaco simple Media Gateway bleeds memory under load.

Wed Aug 28 18:39:43 CEST 2002

Thanks again Håkan,

I think I understand you now; let's see if I've got it. With a large enough
heap, a spawned process will not require further chunks from system
memory, and therefore will not trigger garbage collection sweeps, at least
for as long as the process is running. When the process itself terminates,
garbage collection reclaims the process heap and anything else allocated
from it in one sweep. Am I correct? If I am, I can understand why it
would be more efficient than allocating and freeing small fragments of
system memory, and why it would avoid memory fragmentation.
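The spawn-per-message pattern with a pre-sized heap can be sketched with erlang:spawn_opt/2 and its min_heap_size option. This is a minimal sketch under stated assumptions, not the Megaco code: the module heap_sketch, the functions decode_in_worker/1 and fake_decode/1, and the figure of 233000 words are all hypothetical, and the "decoding" only builds some temporary data.

```erlang
-module(heap_sketch).
-export([decode_in_worker/1]).

%% Hypothetical figure, in words (not bytes). The idea is that it is large
%% enough for the worker to finish before its heap fills up, so that no GC
%% should be needed; the real value depends on message size and must be
%% measured.
-define(MIN_HEAP_WORDS, 233000).

%% Spawn a short-lived worker with a pre-sized heap, hand it the binary
%% and wait for its reply. When the worker exits, its heap is freed in
%% one go together with everything it allocated.
decode_in_worker(Bin) when is_binary(Bin) ->
    Parent = self(),
    Ref = make_ref(),
    _Pid = spawn_opt(fun() -> Parent ! {Ref, fake_decode(Bin)} end,
                     [{min_heap_size, ?MIN_HEAP_WORDS}]),
    receive
        {Ref, Result} -> Result
    after 5000 ->
        {error, timeout}
    end.

%% Hypothetical stand-in for message decoding: it only builds temporary
%% list data from the binary and summarises it.
fake_decode(Bin) ->
    Bytes = binary_to_list(Bin),
    {length(Bytes), lists:sum(Bytes)}.
```

For example, heap_sketch:decode_in_worker(<<1,2,3>>) returns {3, 6}. The heap size option is the only tuning knob here; everything else is ordinary spawn-and-reply.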

But what if the megaco_messenger processes are spawned at a high rate, 
as they appear to be in the MG when receiving almost 1000 
transactions/second? I suspect that memory will get eaten up very 
quickly by spawned processes with large heaps. Is it possible that the 
garbage collector process is starved (since CPU usage is 99%) due to the 
rate at which megaco_messenger processes are being spawned? My idea of
maintaining a pool of megaco_messenger processes may not be an elegant
solution, and I may be accused of micro-managing memory as a C programmer
would! But I would like to pursue it, simply to convince myself one way or
the other whether this imperative paradigm may have value in a functional
language. I'll try to keep you updated on this, on the distributed MG you
describe below, and on the Megaco V2 work I mentioned earlier.
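The pool idea could look roughly like this. It is a hypothetical illustration, not part of the Megaco application: pool_sketch, start/1, pick/2, submit/2, stop/1 and fake_decode/1 are invented names, and the caller does its own round-robin through pick/2. Because BEAM collects each process heap separately, the temporary data a long-lived worker leaves behind after each job is reclaimed by that worker's own periodic garbage collection, rather than being released all at once on exit as in the spawn-per-message design.

```erlang
-module(pool_sketch).
-export([start/1, pick/2, submit/2, stop/1]).

%% Start N long-lived workers and return their pids.
start(N) when is_integer(N), N > 0 ->
    [spawn(fun worker_loop/0) || _ <- lists:seq(1, N)].

%% Simple round-robin: the I-th message goes to worker (I rem N) + 1.
pick(Workers, I) when is_integer(I), I >= 0 ->
    lists:nth((I rem length(Workers)) + 1, Workers).

%% Send one job to a worker and wait for its reply.
submit(Worker, Bin) when is_binary(Bin) ->
    Ref = make_ref(),
    Worker ! {job, self(), Ref, Bin},
    receive
        {Ref, Result} -> Result
    after 5000 ->
        {error, timeout}
    end.

%% Ask every worker to exit.
stop(Workers) ->
    lists:foreach(fun(W) -> W ! stop end, Workers).

%% Each worker stays alive between jobs, so its heap is reused
%% (and garbage collected) instead of being thrown away.
worker_loop() ->
    receive
        {job, From, Ref, Bin} ->
            From ! {Ref, fake_decode(Bin)},
            worker_loop();
        stop ->
            ok
    end.

%% Hypothetical stand-in for message decoding.
fake_decode(Bin) ->
    Bytes = binary_to_list(Bin),
    {length(Bytes), lists:sum(Bytes)}.
```

Typical use: Ws = pool_sketch:start(4), then pool_sketch:submit(pool_sketch:pick(Ws, I), Bin) for the I-th incoming message, and pool_sketch:stop(Ws) at the end of the test.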

Since you still seem to be developing the Megaco stack, I wonder whether
my feedback is useful to you. I am extremely interested in the
partial/distributed decoding you mentioned in the last paragraph. Is
there a complementary distributed encoding project in the pipeline?

Pete.

Hakan Mattsson wrote:

>On Tue, 27 Aug 2002, Peter-Henry Mander wrote:
>
>Pete> I'm not sure I understand you here. I know what the maximum number
>Pete> of concurrent unacknowledged requests is going to be, as I have full
>Pete> control over the MGC, so I expect to need a similar number of
>Pete> receive_message processes and possibly a similar number of timeout
>Pete> processes too, which I hope to avoid having garbage collected at all
>Pete> for the duration of the test.
>
>Sorry. What I was getting at was that it is not obvious that spawning
>new processes should be avoided for performance reasons. In the Megaco
>application one process (megaco_tcp/megaco_udp) reads the bytes off a
>socket and spawns a new process (megaco_messenger) with a binary as
>argument. The new process decodes the binary and creates lots of
>temporary data and eventually terminates. If the initial size of the
>process heap is set large enough (see spawn_opt), no GC is needed at
>all.
>
>Pete> Well, at the moment the congestion handling may be safe, but I would 
>Pete> like to remove all congestion at the MG end if this is possible. It 
>Pete> seems to me that the solution may lie in distributing the MG over two or 
>Pete> more physical nodes, but I'm unclear how I'm going to achieve this. The 
>Pete> Megaco manuals hint at doing exactly this, but I haven't found an 
>Pete> example of how to do it. I will need to use distributed nodes for the
>Pete> MGC anyway, to push performance into four-figure setup/teardown rates, 
>Pete> so any information or instructions will be very welcome!
>
>There is some (limited) documentation about distributed MGs/MGCs in
>the reference manual for megaco:connect/4.
>
>The basic idea is that you invoke megaco:connect/4 as usual on the
>Erlang node (A) that holds the connection to the MGC. Then you may
>invoke megaco:connect/4 on another node (B) using the SAME ControlPid.
>
>Now you have enabled usage of megaco:call/3 and megaco:cast/3 from the
>B node (as well as from node A). The encoding work is performed on the
>originating node (B) while the decoding work is performed on the node
>holding the connection (A).
>
>In order to off-load the A node as much as possible, we have been
>looking into a more sophisticated solution where the message is
>partially decoded on the A node. And then based on the extracted
>transaction id, the message is forwarded as a binary to node B where
>the complete decoding is performed. However, the implementation of
>this is not complete yet.
>
>/Håkan