[erlang-bugs] Memory leak in diameter_service module in diameter app (otp_R16B)
Anders Svensson
anders.otp@REDACTED
Mon May 27 10:40:27 CEST 2013
Hi Aleksander.
Yes, it is indeed a bug that was introduced in R16B. The fix was
merged into maint on April 12, in this commit:
https://github.com/erlang/otp/commit/656b37f1b6fbc3611f5e0f8b8c0e4f61bef9092b
The commit for the fix itself points at the one that introduced the error:
https://github.com/erlang/otp/commit/c609108ce017069a77708f80dae9e89c45ff222d
So, fetch maint and the problem should be solved.
Sorry for the slow reply: I've been on vacation.
Anders
On Mon, May 20, 2013 at 11:45 AM, Aleksander Nycz
<Aleksander.Nycz@REDACTED> wrote:
> Hello,
>
> I think there is a resource (memory) leak in the diameter_service
> module.
>
> This module is a gen_server whose state contains the field watchdogT ::
> ets:tid().
> This ets table contains info about watchdogs.
>
> The diameter service config is:
>
> [{'Origin-Host', HostName},
> {'Origin-Realm', Realm},
> {'Vendor-Id', ...},
> {'Product-Name', ...},
> {'Auth-Application-Id', [?DCCA_APP_ID]},
> {'Supported-Vendor-Id', [...]},
> {application, [{alias, diameterNode},
> {dictionary, dictionaryDCCA},
> {module, dccaCallback}]},
> {restrict_connections, false}]
>
> After starting the diameter application and adding the service and
> transport, the diameter_service state is:
>
>> diameter_service:state(diameterNode).
> #state{id = {1369,41606,329900},
> service_name = diameterNode,
> service = #diameter_service{pid = <0.1011.0>,
> capabilities = #diameter_caps{...},
> applications = [#diameter_app{...}]},
> watchdogT = 4194395,peerT = 4259932,shared_peers = 4325469,
> local_peers = 4391006,monitor = false,
> options = [{sequence,{0,32}},
> {share_peers,false},
> {use_shared_peers,false},
> {restrict_connections,false}]}
>
> and ets table 4194395 has one record:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1013.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = initial,
> started = {1369,41606,330086},
> peer = false}]
>
> Next I ran a very simple test using the seagull simulator. The test
> scenario is as follows:
>
> 1. seagull: send CER
> 2. seagull: recv CEA
> 3. seagull: send CCR (init)
> 4. seagull: recv CCA (init)
> 5. seagull: send CCR (update)
> 6. seagull: recv CCA (update)
> 7. seagull: send CCR (terminate)
> 8. seagull: recv CCA (terminate)
>
> During the test there are two watchdogs in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = initial,
> started = {1369,41823,711370},
> peer = false},
> #watchdog{pid = <0.1013.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = okay,
> started = {1369,41606,330086},
> peer = <0.1014.0>}]
>
> After the test, but before the Tw timer has elapsed, there are still two
> watchdogs, and this is OK:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = initial,
> started = {1369,41823,711370},
> peer = false},
> #watchdog{pid = <0.1013.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = down,
> started = {1369,41606,330086},
> peer = <0.1014.0>}]
>
> But when the Tw timer elapses, the transport and watchdog processes terminate:
>
>> erlang:is_process_alive(list_to_pid("<0.1014.0>")).
> false
>> erlang:is_process_alive(list_to_pid("<0.1013.0>")).
> false
>
> but both watchdogs are still in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = initial,
> started = {1369,41823,711370},
> peer = false},
> #watchdog{pid = <0.1013.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = down,
> started = {1369,41606,330086},
> peer = <0.1014.0>}]
>
> I think the entry for watchdog <0.1013.0> should be removed when the
> watchdog process terminates.
>
> I ran the test again, and now there are 3 watchdogs in the ets table:
>
>> ets:tab2list(4194395).
> [#watchdog{pid = <0.1816.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = down,
> started = {1369,41823,711370},
> peer = <0.1817.0>},
> #watchdog{pid = <0.1013.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = down,
> started = {1369,41606,330086},
> peer = <0.1014.0>},
> #watchdog{pid = <0.3533.0>,type = accept,
> ref = #Ref<0.0.0.1696>,
> options = [{transport_module,diameter_tcp},
> {transport_config,[{reuseaddr,true},
> {ip,{0,0,0,0}},
> {port,4068}]},
> {capabilities_cb,[#Fun<diameterNode.acceptCER.2>]},
> {watchdog_timer,30000},
> {reconnect_timer,60000}],
> state = initial,
> started = {1369,42342,845898},
> peer = false}]
>
> The watchdog and transport processes of the first test are not alive either:
>
>> erlang:is_process_alive(list_to_pid("<0.1816.0>")).
> false
>> erlang:is_process_alive(list_to_pid("<0.1817.0>")).
> false
>
>
> I suggest the following change to correct this problem (file
> diameter_service.erl):
>
> $ diff diameter_service.erl diameter_service.erl_ok
> 1006c1006
> < connection_down(#watchdog{state = WS,
> ---
>> connection_down(#watchdog{state = ?WD_OKAY,
> 1015,1017c1015,1021
> < ?WD_OKAY == WS
> < andalso
> < connection_down(Wd, fetch(PeerT, TPid), S).
> ---
>> connection_down(Wd, fetch(PeerT, TPid), S);
>>
>> connection_down(#watchdog{},
>> To,
>> #state{})
>> when is_atom(To) ->
>> ok.
>
> You can find this solution in the attachment.
>
> Regards
> Aleksander Nycz
>
>
> --
> Aleksander Nycz
> Senior Software Engineer
> Telco_021 BSS R&D
> Comarch SA
> Phone: +48 12 646 1216
> Mobile: +48 691 464 275
> website: www.comarch.pl
>
>
>
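The leak described above can be reproduced outside diameter with a small stand-alone sketch. This is not the diameter_service implementation and not the fix in the maint commit. The module wd_leak_sketch, the record #wd{} and dead_entries/2 are hypothetical names. The sketch assumes that entries are keyed by the watchdog pid, as the printed records suggest, and that the table owner monitors each watchdog. dead_entries/2 automates the ets:tab2list/1 plus erlang:is_process_alive/1 check used in the report, which works for local pids only. demo/0 shows the expected cleanup: on 'DOWN', the entry is deleted regardless of its state.

```erlang
%% Hypothetical sketch; not code from the diameter application.
-module(wd_leak_sketch).
-export([dead_entries/2, demo/0]).

%% Hypothetical stand-in for #watchdog{}, keeping only pid and state.
-record(wd, {pid, state = initial}).

%% Return every object in Tab whose pid (tuple position PidPos) is no
%% longer a live local process, i.e. the leaked entries.
dead_entries(Tab, PidPos) ->
    [Obj || Obj <- ets:tab2list(Tab),
            not erlang:is_process_alive(element(PidPos, Obj))].

demo() ->
    %% Key entries on the pid field, like the watchdog table in the report.
    Tab = ets:new(watchdogT, [set, {keypos, #wd.pid}]),
    Loop = fun() -> receive stop -> ok end end,
    Live = spawn(Loop),
    Gone = spawn(Loop),
    MRef = erlang:monitor(process, Gone),
    true = ets:insert(Tab, [#wd{pid = Live, state = okay},
                            #wd{pid = Gone, state = down}]),
    Gone ! stop,
    receive {'DOWN', MRef, process, Gone, _Reason} -> ok end,
    %% Gone has exited but its entry is still present: this is the leak.
    Leaked = dead_entries(Tab, #wd.pid),
    %% Expected cleanup: delete on 'DOWN' whatever the entry's state was.
    true = ets:delete(Tab, Gone),
    Remaining = [P || #wd{pid = P} <- ets:tab2list(Tab)],
    Live ! stop,
    true = ets:delete(Tab),
    {Leaked, Remaining}.
```

Calling wd_leak_sketch:demo() returns the one leaked entry (in state down) and a list that contains only the still-running pid.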