[erlang-bugs] Memory leak in diameter_service module in diameter app (otp_R16B)

Mon May 27 10:40:27 CEST 2013

Hi Aleksander.

Yes, it is indeed a bug that was introduced in R16B. The fix was
merged into maint on April 12, in this commit:

  https://github.com/erlang/otp/commit/656b37f1b6fbc3611f5e0f8b8c0e4f61bef9092b

The commit for the fix itself points at the one that introduced the error:

  https://github.com/erlang/otp/commit/c609108ce017069a77708f80dae9e89c45ff222d

So, fetch maint and the problem should be solved.

Sorry for the slow reply: I've been on vacation.

Anders

On Mon, May 20, 2013 at 11:45 AM, Aleksander Nycz
<Aleksander.Nycz@REDACTED> wrote:
> Hello,
>
> I think there is a resource (memory) leak in the diameter_service
> module.
>
> This module is a gen_server whose state contains the field watchdogT ::
> ets:tid().
> This ets table holds information about watchdogs.
>
> The diameter service configuration is:
>
> [{'Origin-Host',  HostName},
>  {'Origin-Realm', Realm},
>  {'Vendor-Id',    ...},
>  {'Product-Name', ...},
>  {'Auth-Application-Id', [?DCCA_APP_ID]},
>  {'Supported-Vendor-Id', [...]},
>  {application, [{alias,      diameterNode},
>                 {dictionary, dictionaryDCCA},
>                 {module,     dccaCallback}]},
>  {restrict_connections, false}]
>
> After starting the diameter application and adding the service and
> transport, the diameter_service state is:
>
>> diameter_service:state(diameterNode).
> #state{id = {1369,41606,329900},
>        service_name = diameterNode,
>        service = #diameter_service{pid = <0.1011.0>,
>                                    capabilities = #diameter_caps{...},
>                                    applications = [#diameter_app{...}]},
>        watchdogT = 4194395,peerT = 4259932,shared_peers = 4325469,
>        local_peers = 4391006,monitor = false,
>        options = [{sequence,{0,32}},
>                   {share_peers,false},
>                   {use_shared_peers,false},
>                   {restrict_connections,false}]}
>
> and ets table 4194395 contains one record:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1013.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = initial,
>            started = {1369,41606,330086},
>            peer = false}]
>
> Next I ran a very simple test using the seagull simulator. The test
> scenario is as follows:
>
> 1. seagull: send CER
> 2. seagull: recv CEA
> 3. seagull: send CCR (init)
> 4. seagull: recv CCA (init)
> 5. seagull: send CCR (update)
> 6. seagull: recv CCA (update)
> 7. seagull: send CCR (terminate)
> 8. seagull: recv CCA (terminate)
>
> During the test there are two watchdogs in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = initial,
>            started = {1369,41823,711370},
>            peer = false},
>  #watchdog{pid = <0.1013.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = okay,
>            started = {1369,41606,330086},
>            peer = <0.1014.0>}]
>
> After the test, but before the Tw timer has elapsed, there are still two
> watchdogs, which is expected:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = initial,
>            started = {1369,41823,711370},
>            peer = false},
>  #watchdog{pid = <0.1013.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = down,
>            started = {1369,41606,330086},
>            peer = <0.1014.0>}]
>
> But once the Tw timer has elapsed, the transport and watchdog processes
> have terminated:
>
>> erlang:is_process_alive(list_to_pid("<0.1014.0>")).
> false
>> erlang:is_process_alive(list_to_pid("<0.1013.0>")).
> false
>
> yet both watchdogs are still in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = initial,
>            started = {1369,41823,711370},
>            peer = false},
>  #watchdog{pid = <0.1013.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = down,
>            started = {1369,41606,330086},
>            peer = <0.1014.0>}]
>
> I think the entry for watchdog <0.1013.0> should be removed when the
> watchdog process terminates.
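The usual Erlang way to guarantee that a table entry disappears together with its process is to monitor the process and delete the entry when the 'DOWN' message arrives. The sketch below illustrates that general pattern only; it is not diameter's actual code, and the module name wd_registry and its functions are hypothetical. The table owner (for example a gen_server in its handle_info/2) would call handle_down/2 for each 'DOWN' message it receives.

```erlang
-module(wd_registry).
-export([new/0, add/2, handle_down/2]).

%% Hypothetical registry: one {Pid, MonitorRef} object per watched process,
%% keyed on the pid.
new() ->
    ets:new(wd_registry, [set, public, {keypos, 1}]).

%% Record the process and monitor it. erlang:monitor/2 delivers a 'DOWN'
%% message even if Pid is already dead (reason noproc), so an entry cannot
%% be orphaned by the process dying before the monitor is set.
add(Tab, Pid) when is_pid(Pid) ->
    Ref = erlang:monitor(process, Pid),
    true = ets:insert(Tab, {Pid, Ref}),
    Ref.

%% Called by the owner for each 'DOWN' message: drop the dead pid's entry.
handle_down({'DOWN', _Ref, process, Pid, _Reason}, Tab) ->
    true = ets:delete(Tab, Pid),
    ok.
```

In a gen_server this would typically appear as a handle_info({'DOWN', _, process, _, _} = Msg, State) clause that calls handle_down(Msg, Tab) and returns {noreply, State}.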
>
> I ran the test again, and now there are 3 watchdogs in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = down,
>            started = {1369,41823,711370},
>            peer = <0.1817.0>},
>  #watchdog{pid = <0.1013.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = down,
>            started = {1369,41606,330086},
>            peer = <0.1014.0>},
>  #watchdog{pid = <0.3533.0>,type = accept,
>            ref = #Ref<0.0.0.1696>,
>            options = [{transport_module,diameter_tcp},
>                       {transport_config,[{reuseaddr,true},
>                                          {ip,{0,0,0,0}},
>                                          {port,4068}]},
>                       {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
>                       {watchdog_timer,30000},
>                       {reconnect_timer,60000}],
>            state = initial,
>            started = {1369,42342,845898},
>            peer = false}]
>
> The watchdog and transport processes are not alive:
>
>> erlang:is_process_alive(list_to_pid("<0.1816.0>")).
> false
>> erlang:is_process_alive(list_to_pid("<0.1817.0>")).
> false
>
>
> I suggest the following change to correct this problem (file
> diameter_service.erl):
>
> $ diff diameter_service.erl diameter_service.erl_ok
> 1006c1006
> < connection_down(#watchdog{state = WS,
> ---
>> connection_down(#watchdog{state = ?WD_OKAY,
> 1015,1017c1015,1021
> <     ?WD_OKAY == WS
> <         andalso
> <         connection_down(Wd, fetch(PeerT, TPid), S).
> ---
>>     connection_down(Wd, fetch(PeerT, TPid), S);
>>
>> connection_down(#watchdog{},
>>                 To,
>>                 #state{})
>>   when is_atom(To) ->
>>     ok.
>
> You can find this solution in the attachment.
>
> Regards
> Aleksander Nycz
>
>
> --
> Aleksander Nycz
> Senior Software Engineer
> Telco_021 BSS R&D
> Comarch SA
> Phone:  +48 12 646 1216
> Mobile: +48 691 464 275
> website: www.comarch.pl