[erlang-questions] Suspending Erlang Processes

Duncan Paul Attard duncan.attard.01@REDACTED
Wed Oct 2 08:41:39 CEST 2019


Thanks for the explanation and for pointing out the bug. So it seems to me that there is no way to stop ‘receive’ trace events from being generated, despite the use of suspend. I guess this stems from the asynchronous nature of the actor model.

> The language was designed with other communication primitives intended for use. Suspend/Resume was explicitly introduced for debugging purposes only, and not for usage by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other ordinary functionality with regards to compatibility, etc. We for example removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 due to performance reasons.


I understand, and agree, that synchronisation in Erlang (and in the actor model in general) is modelled via message exchange only, and that using other primitives such as suspend and resume does not adhere to this model.

Yet, in my particular use case (runtime verification), I am restricting myself to systems/processes that *cannot* be instrumented with additional instructions (in this case, receive clauses) that would block their execution at specific points. Thus, the only option left to me is to suspend a process whilst it is executing, *without* knowing which instruction the process in question is executing. To give you a bit of context, I am creating a monitoring system ‘M’ that is layered on top of a given system that one wishes to monitor, ‘S’. ‘M’ observes the execution of ‘S’ via EVM tracing and tries to detect infringements of certain logical properties specified over ‘S’.

The docs mention that suspend and resume are reserved for debugging purposes and, like you said in your reply, draw attention to the fact that careless use of these two functions can lead to inadvertent deadlocks. You also mentioned that automatic deadlock prevention has been removed, hinting that the implementation of suspend and resume might change in future releases of Erlang. I understand that. Besides this, however, is there any other reason that suspend and resume should *not* be used? For instance, would executing suspend at an arbitrary point mess up the internal state of the suspended process? This question is in light of what I said above, namely that I would suspend a process whilst it is executing without knowing which instruction the suspendee is executing. ‘suspend_process/1’ blocks the suspender until the suspendee is eventually suspended: does "eventually suspended" mean that it is safe to assume that the VM brings the suspendee to a state where it is OK to suspend it?

And out of sheer curiosity: is a suspendee suspended as soon as possible, or does the scheduler let it execute its remaining reductions before suspending it and returning control to the suspender?

Once again, thanks a lot for your kind help, Rickard.

Best regards,
Duncan.

> On 01 Oct 2019, at 22:06, Rickard Green <rickard@REDACTED> wrote:
> 
> 
> 
> On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <duncan.attard.01@REDACTED> wrote:
> >
> > I am tracing an Erlang process, say, `P` by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.
> >
> > Suppose now that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
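> >
> > A minimal sketch of this setup, assuming a hypothetical module `trace_setup` and throwaway process bodies for `P` and `Q`: the calling process becomes `Trc_P`, and with the `procs` flag it receives a `{trace, PidP, spawn, PidQ, MFA}` event when `P` spawns `Q`, which is one way for the tracer to learn `Q`'s pid.
> >
> > ```erlang
> > -module(trace_setup).
> > -export([start/0]).
> >
> > %% Hypothetical demo: trace P with set_on_spawn, then wait for the
> > %% trace event reporting that P spawned Q.
> > start() ->
> >     PidP = spawn(fun() ->
> >                          receive go -> ok end,
> >                          %% Q inherits P's trace flags and tracer.
> >                          spawn(fun() -> receive stop -> ok end end),
> >                          receive stop -> ok end
> >                  end),
> >     %% The calling process becomes the tracer of P (Trc_P).
> >     1 = erlang:trace(PidP, true, [set_on_spawn, procs, send, 'receive']),
> >     PidP ! go,
> >     %% Selectively pick the spawn event; other trace events stay queued.
> >     receive
> >         {trace, PidP, spawn, PidQ, _MFA} -> {PidP, PidQ}
> >     after 1000 ->
> >         timeout
> >     end.
> > ```
> >
> > Other trace events, such as `P`'s `'receive'` of `go`, remain in the tracer's mailbox; the selective receive only extracts the `spawn` event.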
> >
> > ---
> >
> > I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, and `Q` by `Trc_Q`.
> >
> 
> Unfortunately I do not have any ideas on how to accomplish this.
> 
> > However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:
> >
> > 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.
> >
> > In the time span between steps **1.** and **2.** above, trace events generated by process `Q` might be **lost**, because at that moment there is no tracer attached. One way of mitigating this is to perform the following:
> >
> > 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that, as per the Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> > 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
> > 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
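> >
> > The four steps above can be sketched as follows. This is a minimal sketch, assuming a hypothetical module `trace_handover` with a helper `take_over/2` that is called from the new tracer (`Trc_Q`). It clears every flag with `[all]`, on the assumption that a process left with no trace flags no longer has a tracer attached, so that the caller can then install itself as the tracer. It must not be called with the caller's own pid, since `erlang:suspend_process/1` raises `badarg` for self-suspension.
> >
> > ```erlang
> > -module(trace_handover).
> > -export([take_over/2]).
> >
> > %% Hypothetical helper: make the calling process (e.g. Trc_Q) the
> > %% tracer of Pid. Flags are the trace flags to re-enable.
> > take_over(Pid, Flags) ->
> >     %% Step 1: blocks until Pid has been suspended.
> >     true = erlang:suspend_process(Pid),
> >     try
> >         %% Step 2: clear all flags so the old tracer (Trc_P) is detached.
> >         _ = erlang:trace(Pid, false, [all]),
> >         %% Step 3: no tracer option given, so the caller becomes the tracer.
> >         1 = erlang:trace(Pid, true, Flags),
> >         ok
> >     after
> >         %% Step 4: always resume, even if one of the trace calls failed.
> >         true = erlang:resume_process(Pid)
> >     end.
> > ```
> >
> > Per Rickard's reply below, the observation that no `'receive'` events are lost while `Q` is suspended relies on a bug, so this sketch does not by itself guarantee a loss-free handover.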
> >
> > From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, 'receive', Msg}` trace events accordingly without any loss.
> >
> 
> This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0) that will be fixed. The trace message should have been delivered even though the receiver was suspended.
> 
> You cannot even rely on this behavior while this bug is present. If you (or any process in the system) send the suspended process a non-message signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug will be bypassed and the trace message will be delivered.
> 
> > However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*.
> 
> Mission accomplished! :-)
> 
> > Any idea as to why this is the case?
> >
> 
> The language was designed with other communication primitives intended for use. Suspend/Resume was explicitly introduced for debugging purposes only, and not for usage by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other ordinary functionality with regards to compatibility, etc. We for example removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 due to performance reasons.
> 
> Regards,
> Rickard
> --
> Rickard Green, Erlang/OTP, Ericsson AB
