[erlang-questions] Suspending Erlang Processes

Duncan Paul Attard duncan.attard.01@REDACTED
Mon Sep 30 12:46:17 CEST 2019

I am tracing an Erlang process, say `P`, by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.

Suppose now that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
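
A minimal sketch of this setup, in case it helps to reproduce it. The module name `trace_inherit` and the function `demo/0` are hypothetical, and the calling process plays the role of `Trc_P`; it uses `erlang:trace_info/2` to check that `Q` inherited both the flags and the tracer.

```erlang
-module(trace_inherit).
-export([demo/0]).

%% Hypothetical demo: the caller acts as Trc_P.
demo() ->
    Self = self(),
    %% P waits for a go-ahead, spawns Q, and reports Q's pid back.
    PidP = spawn(fun() ->
        receive go -> ok end,
        Child = spawn(fun() -> receive stop -> ok end end),
        Self ! {spawned, Child},
        receive stop -> ok end
    end),
    %% The caller becomes the tracer of P.
    1 = erlang:trace(PidP, true, [set_on_spawn, procs, send, 'receive']),
    PidP ! go,
    PidQ = receive {spawned, Pid} -> Pid after 1000 -> exit(timeout) end,
    %% Q should have inherited the trace flags and the tracer from P.
    {flags, Flags} = erlang:trace_info(PidQ, flags),
    {tracer, Tracer} = erlang:trace_info(PidQ, tracer),
    PidP ! stop,
    PidQ ! stop,
    {lists:member(set_on_spawn, Flags), Tracer =:= Self}.
```

Both elements of the returned tuple are expected to be `true` if `Q` is indeed traced by the caller.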


I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that in the resulting configuration `P` is traced by `Trc_P` and `Q` by `Trc_Q`.

However, Erlang permits **at most** one tracer per process, so I cannot achieve this configuration simply by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only possible way is to do it in two steps:

1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.

In the time span between steps **1.** and **2.** above, trace events from process `Q` might be **lost**, because at that moment there is no tracer attached. One way of mitigating this is to perform the following:

1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
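
The four steps above could be sketched roughly as follows. The names `trace_handover` and `handover/2` are hypothetical. The sketch assumes that it is called from `Trc_Q`, that `Flags` is the same flag list passed to the original `erlang:trace/3` call (the `..` in the steps), and that tracing can be switched off from a process other than the current tracer, as described above.

```erlang
-module(trace_handover).
-export([handover/2]).

%% Hypothetical helper, meant to be called from the new tracer (Trc_Q).
%% Flags is assumed to be the full flag list used when tracing was enabled,
%% so that clearing it leaves Q with no trace flags and hence no tracer.
handover(PidQ, Flags) ->
    %% Step 1: blocks the caller until Q is actually suspended.
    true = erlang:suspend_process(PidQ),
    try
        %% Step 2: detach the old tracer (Trc_P) from Q.
        erlang:trace(PidQ, false, Flags),
        %% Step 3: the caller becomes the new tracer of Q.
        erlang:trace(PidQ, true, Flags)
    after
        %% Step 4: always resume Q, even if a trace call above fails.
        erlang:resume_process(PidQ)
    end.
```

Wrapping steps 2 and 3 in `try ... after` is a design choice of this sketch, so that `Q` is not left suspended if either `erlang:trace/3` call raises `badarg`.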

From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and once `Q` is resumed, `Trc_Q` receives the corresponding `{trace, Pid_Q, 'receive', Msg}` trace events without any loss.

I have one constraint which led me to look at suspend/resume process: I cannot modify the code of `P` or `Q`, so inserting `receive` expressions to block said processes is out of the question.

However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*. Any idea as to why this is the case?


