[erlang-questions] Deterministic destruction of NIF resources

Sat Jul 16 10:06:02 CEST 2016

My apologies, I've not used trace flags before.

Anyway, by tweaking the trace to [exiting, procs, {tracer, TPid}], I
*don't* see any 'in_exiting' or 'out_exiting' events; I *do* see
'out_exited' events, but they happen (more-or-less) immediately after
the 'exit' events, but the resource destructor is still delayed w.r.t.
the 'out_exited' events.

Note that I'm still using my slightly dodgy mechanism for filtering
the trace to only the processes I'm interested in. I tried to use the
'set_on_spawn' flag, and specify the process to be traced as
whereis(my_sup), but I couldn't get that to work.

In my testing, the destructor is delayed by a few seconds (when
repeatedly -- manually, so not particularly quickly -- restarting one
test process). In production, we're seeing the destructor being
delayed an arbitrarily long period, which is kinda important, because
the destructor is responsible for killing a background thread, and
we'd prefer it not to continue after the associated Erlang process has
died.

As far as I can tell from reading the OTP source, it looks like
resources are allocated from the binary heap, rather than any
particular process heap, which suggests to me that they'll be
collected when the binary heap undergoes a GC. What triggers that?

On 15 July 2016 at 22:29, Lukas Larsson <garazdawi@REDACTED> wrote:
> Apparently I wasn't clear enough. You want to look at the 'exiting' trace
> flag, not the 'procs' flags. i.e. see this code:
>
>
>> go() ->
>>     TFun = fun TFun() ->
>>                    receive
>>                        Other ->
>>                            io:format("~p~n",[Other]),
>>                            TFun()
>>                    end
>>            end,
>>     TPid = spawn(TFun),
>>     erlang:trace(self(), true, [set_on_spawn, procs, exiting, {tracer,
>> TPid}]),
>>     Pid = spawn(fun loop/0),
>>     timer:sleep(100),
>>     exit(Pid, die).
>> loop() ->
>>     [ets:new(hej,[]) || _ <- lists:seq(1,100)],
>>     receive after infinity -> ok end.
>
>
> when you run Mod:go() you will get a bunch of printouts:
>
>> 1> trace:go().
>> {trace,<0.33.0>,spawn,<0.36.0>,{erlang,apply,[#Fun<trace.1.87118278>,[]]}}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,exit,die}
>> true
>> {trace,<0.36.0>,out_exiting,0}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,out_exiting,0}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,out_exiting,0}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,out_exiting,0}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,out_exiting,0}
>> {trace,<0.36.0>,in_exiting,0}
>> {trace,<0.36.0>,out_exited,0}
>
>
> as you can see, the 'exit' trace message is sent relatively early, but the
> heap of the process is not cleaned up until when the 'out_exited' trace
> message is sent. In my example the process gets all of these extra schedules
> after it has decided to die in order to clean up the ets tables that is
> responsible for. There are a bunch of reasons why this may happen, not just
> ets table cleanup. What the reason is cannot be observed without
> instrumenting the vm.
>
> On Fri, Jul 15, 2016 at 9:05 PM, Roger Lipscombe <roger@REDACTED>
> wrote:
>>
>> Assuming I'm tracing correctly (see below), that's not the cause. I
>> see the 'exit' trace events fairly soon after the process dies, but I
>> don't see the resource destructor being called until later, and
>> usually in a clump:
>>
>> Alloc wrapper resource 0x2b038eb01620
>> sup <0.2167.0> spawned <0.3013.0>
>> Alloc wrapper resource 0x2b038ec80328
>> <0.3013.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.3049.0>
>> Alloc wrapper resource 0x2b038ec801e0
>> Free wrapper resource 0x2b038ec80328
>> Free wrapper resource 0x2b038eb01620
>> <0.3049.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.15256.0>
>> Alloc wrapper resource 0x2b038ea80c70
>> <0.15256.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.17734.0>
>> Alloc wrapper resource 0x2b038eac7de0
>> <0.17734.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.18825.0>
>> Alloc wrapper resource 0x2b038eb48098
>> <0.18825.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.19267.0>
>> Alloc wrapper resource 0x2b038eb00260
>> <0.19267.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.19616.0>
>> Alloc wrapper resource 0x2b038ea80b88
>> <0.19616.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.20068.0>
>> Alloc wrapper resource 0x2b038eac2930
>> <0.20068.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.20424.0>
>> Alloc wrapper resource 0x2b038eac1328
>> <0.20424.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.20649.0>
>> Alloc wrapper resource 0x2b038eac2aa0
>> <0.20649.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.20663.0>
>> Alloc wrapper resource 0x2b038ea84118
>> Free wrapper resource 0x2b038eac2aa0
>> Free wrapper resource 0x2b038eac1328
>> Free wrapper resource 0x2b038eac2930
>> Free wrapper resource 0x2b038ea80b88
>> Free wrapper resource 0x2b038eb00260
>> Free wrapper resource 0x2b038eb48098
>> Free wrapper resource 0x2b038eac7de0
>> Free wrapper resource 0x2b038ea80c70
>> Free wrapper resource 0x2b038ec801e0
>> <0.20663.0> exit {timeout,moo}
>> sup <0.2167.0> spawned <0.20679.0>
>> Alloc wrapper resource 0x2b038ea80b88
>>
>> The "... wrapper resource ..." lines are from the NIF; the other lines
>> are from tracing:
>>
>> SupPid = whereis(agents_sup).
>>
>> TFun = fun TFun(State) ->
>>         receive
>>             {trace, ByPid, spawn, Pid, _} when ByPid =:= SupPid ->
>>                 io:format("sup ~p spawned ~p~n", [ByPid, Pid]),
>>                 State2 = [Pid | State],
>>                 TFun(State2);
>>             {trace, Pid, exit, Reason} = M ->
>>                 case lists:member(Pid, State) of
>>                     true ->
>>                         io:format("~p exit ~p~n", [Pid, Reason]);
>>                     _ ->
>>                         ok
>>                 end,
>>                 TFun(State);
>>             Other ->
>>                 TFun(State)
>>         end
>>     end.
>> TPid = spawn(fun() -> TFun([]) end).
>> erlang:trace(all, true, [procs, {tracer, TPid}]).
>>
>>
>> On 15 July 2016 at 15:53, Lukas Larsson <garazdawi@REDACTED> wrote:
>> > How do you know when the process dies?
>> >
>> > If you aren't already you should use the trace flag 'exiting'
>> > (http://erlang.org/doc/man/erlang.html#trace-3) to figure out when a
>> > process
>> > actually has died. ERTS will delay process death after you send it a
>> > termination signal for various reasons.
>> >
>> > On Fri, Jul 15, 2016 at 4:39 PM, Roger Lipscombe
>> > <roger@REDACTED>
>> > wrote:
>> >>
>> >> Sorry, let me rephrase: they're not garbage collected *when* that
>> >> process dies. The destructor is sometimes called immediately, and
>> >> sometimes at some point in the future. Can I make this deterministic?
>> >>
>> >> And, no, the resource has not been sent to another process or stored in
>> >> ETS.
>> >>
>> >> On 15 July 2016 at 15:10, Sergej Jurečko <sergej.jurecko@REDACTED>
>> >> wrote:
>> >> >
>> >> > On 15 Jul 2016, at 15:57, Roger Lipscombe <roger@REDACTED>
>> >> > wrote:
>> >> >
>> >> > because they're not garbage collected when that
>> >> > process dies.
>> >> >
>> >> >
>> >> > Then there is an outstanding reference to the resource. Has it been
>> >> > sent
>> >> > to
>> >> > another process or stored in an ETS?
>> >> >
>> >> >
>> >> > Sergej
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >
>> >
>
>