[erlang-questions] Suspending Erlang Processes

Thu Oct 17 09:48:30 CEST 2019

Hi Duncan,

My initial thought was: why do you need many tracers to achieve what you
want?
What do you say to the approach of having one global tracer which acts as a
dispatcher to all your monitors?
As all trace messages carry information about the process from which the
trace event originates, you can map each message to its monitor and then
forward the message to that monitor, which could then act in the same way
as if it were the tracer for a specific process or process group.
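
A minimal sketch of that dispatcher idea follows. It is an illustration
only and has not been tried against your tool. The module name
`trace_dispatcher`, the `add_route`/`remove_route` messages and the
`Default` monitor are made up for the example; the only real API it relies
on is `erlang:trace/3` and the fact that the second element of every trace
tuple is the traced process.

```erlang
-module(trace_dispatcher).
-export([start/3, route/3]).

%% Spawn the one global tracer. RootPid is the system's root process,
%% Routes maps traced pids to monitor pids, and Default is the monitor
%% that receives events from pids without a dedicated route (all three
%% are assumptions of this sketch).
start(RootPid, Routes, Default) ->
    spawn(fun() ->
        %% The calling process becomes the tracer; children inherit it.
        erlang:trace(RootPid, true, [set_on_spawn, procs, send, 'receive']),
        loop(Routes, Default)
    end).

loop(Routes, Default) ->
    receive
        {add_route, Traced, Monitor} ->
            loop(Routes#{Traced => Monitor}, Default);
        {remove_route, Traced} ->
            loop(maps:remove(Traced, Routes), Default);
        Event when element(1, Event) =:= trace ->
            %% Forward the untouched trace tuple to the chosen monitor.
            route(Event, Routes, Default) ! Event,
            loop(Routes, Default)
    end.

%% Pure routing step: look up the traced pid, fall back to Default.
route(Event, Routes, Default) ->
    maps:get(element(2, Event), Routes, Default).
```

Since a single process forwards everything, each monitor should still see
the events of its own component in the order the tracer received them, and
adding or removing a monitor only changes the routing table, not the trace
flags of the traced process.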

/Regards Kenneth

On Thu, Oct 17, 2019 at 9:01 AM Duncan Paul Attard <
duncan.attard.01@REDACTED> wrote:

> Hi Kenneth, Rickard,
>
> I was wondering whether you have any suggestions regarding this please.
>
>
> All the best,
>
> Duncan
>
>
> On 03 Oct 2019, at 11:27, Duncan Paul Attard <duncan.attard.01@REDACTED>
> wrote:
>
> Kenneth, Rickard,
>
> Let me give you a bit of context.
>
> I’m working on a runtime verification (RV) tool that focusses on
> component-based systems in an asynchronous setting. I’ve chosen Erlang
> because it nicely models this setting and also facilitates certain aspects
> in the development of said tool. Very briefly, in RV, the concept is that of
> instrumenting the system with other processes (called monitors in the RV
> community, though they have nothing to do with Erlang monitors) that analyse
> parts of the system (e.g., one process or a group of them, which I will
> refer to as a "system component") to detect and flag the infringement of
> some property specified over the component.
>
> These properties (which are written using a high-level logic such as
> Linear Temporal Logic, Hennessy-Milner Logic, etc.), define things like
> “Process P cannot send message M to Q when such and such condition arises”
> or “Process P must exit when a particular message M is sent to it”, etc. A
> monitor, or rather, the monitor source code, is synthesised from a property
> and “attached” to the component to be monitored. The following is more or
> less the general workflow:
>
> 1. A property is written in a text file using one of the logics mentioned;
> 2. The property is parsed and compiled to generate the monitor (in Erlang
> source code, in my particular case);
> 3. The monitor is spawned as a process that analyses a system component of
> interest as this executes.
>
> The monitor needs to somehow acquire the runtime events emitted by
> processes, and this it does via the built-in Erlang tracing (i.e., the
> monitor is itself a tracer process). The important thing to note is that
> the monitors, despite being processes themselves, may be considered as a
> meta-layer over the system, and therefore, do not technically form part of
> the “ordinary” implementation of the system. This means that monitors can
> be introduced or removed from the system as needed, and merely function as
> a second layer that strives to observe the system with *minimal*
> interference.
>
> This brings me to Kenneth’s point, that tracing is a tool intended for
> debugging/profiling purposes. I agree, and in fact, RV might be considered
> as a flavour of debugging or profiling that is done at runtime. It differs
> (amongst other things) from debugging and profiling, in that monitors are
> the product of autogenerated code resulting from *formal* logical
> properties. From what I gather, debugging or profiling obtains trace events
> in a similar way to the one I’m using for monitoring. I also understand and
> agree with you Kenneth that, if a system process is being monitored by one
> of my monitors, then it cannot be profiled or debugged due to the
> one-tracer limit imposed by the EVM.
>
> Also, the reason I’m not using ‘dbg’ but ‘erlang:trace/3’ directly is that
> I want the full flexibility of tracing (I might require it in later stages
> of my research). Way back when I started the project I was not aware of the
> extent of the functionality ‘dbg’ offers, and so to play it safe (and after
> reading Francesco and Simon’s book), I decided to go for the tracing BIFs.
>
> Finally, the reason I require different tracers (in my case, monitors) for
> different system processes (or groups) is that it makes the specification
> of correctness properties much more manageable. The gist of the idea is
> that it is far easier to specify a property over a restricted set of
> processes (e.g., just one process which exhibits *sequential* execution)
> than it is for a large number of processes, as then the property needs to
> account for all the possible interleavings of trace events exhibited by
> different processes. So in a sense, different monitors over different
> system components allow me to partition and view the otherwise whole trace
> of the system as a collection of separate traces for different components.
> Naturally, the monitors generated from smaller properties tend to be small
> and lightweight themselves, and are easier to work with. Moreover, this
> allows me to switch off certain monitors dynamically at runtime for system
> components that might not require monitoring anymore, while leaving others
> on.
>
> Since a system can be viewed as always starting from one root process, I
> attach (i.e., start tracing) a special root monitor to this system root
> process. The root monitor creates new monitors on the fly for certain child
> processes that are spawned by the root system process. Now, to collect
> trace events without loss, the root monitor is configured with
> ‘set_on_spawn’, meaning that new children of the root system process are
> automatically traced by the root monitor at first. To spawn a dedicated
> monitor ‘Mon_C’ for some child process ‘C’, the following is executed:
>
> 1. Root monitor ‘Mon_R’ is currently tracing the new child process ‘C’
> (the ‘set_on_spawn’ flag was set on ‘Mon_R’);
> 2. The new monitor ‘Mon_C’ created for child process ‘C’ switches tracing
> *off* for ‘C’ (i.e., erlang:trace(Pid_C, false, ..)), so the (previous)
> monitor ‘Mon_R’ stops being the tracer of ‘C’;
> 3. New monitor ‘Mon_C’ switches tracing back *on* for child process ‘C’
> and becomes its new tracer.
>
> To minimise trace event loss between steps 2 and 3, I was thinking of
> suspending child process ‘C’ before step 2, and resuming it after step 3.
> This way, ‘C’ is at least blocked, and cannot spawn new processes itself or
> send messages. I cannot however prevent other processes from sending ‘C’
> messages, meaning that there might be a chance of ‘receive’ events being
> lost in the space of time between steps 2 and 3. Therefore, my suggestion
> still does not banish the problem but merely mitigates it, as steps 2 and 3
> do not happen atomically. I wonder whether such a BIF could be realisable,
> such that the ownership of tracing can be transferred atomically between
> tracers without incurring any loss of trace events (between monitors
> ‘Mon_R’ and ‘Mon_C’ in my case).
>
> FYI, much of the work I’ve discussed has already been published in a
> previous paper of ours, which can be found here:
> http://staff.um.edu.mt/afra1/papers/sefm17.pdf. If you’re interested
> please let me know.
>
> Many thanks for your help!
> Duncan
>
>
>
>
> On 02 Oct 2019, at 09:11, Kenneth Lundin <kenneth@REDACTED> wrote:
>
> As a follow-up on Rickard's answer, I think it would be interesting if you
> could explain why you want different tracers per process.
> If we know what problem you want to solve, we can most probably come up
> with better suggestions.
>
> I also recommend that you use tracing via the dbg module which is intended
> to be a more user friendly API towards tracing. The trace BIFs might give
> some more detailed control but dbg has support for most use cases and makes
> it easier to do the right thing, at least that is the intention.
>
> Also worth mentioning is that the tracing mechanisms are really not
> intended to be used to achieve functionality which is part of the
> application; they are intended to be used temporarily for
> debugging/profiling purposes. Since there is only one tracer at a time,
> using tracing as part of the "ordinary" implementation of an application
> means there will be conflicts as soon as any tracing or profiling is
> needed, and the intended functionality of the application will then
> probably be broken.
>
> /Kenneth, Erlang/OTP Ericsson
>
> On Tue, Oct 1, 2019 at 10:07 PM Rickard Green <rickard@REDACTED> wrote:
>
>>
>>
>> On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <
>> duncan.attard.01@REDACTED> wrote:
>> >
>> > I am tracing an Erlang process, say, `P` by invoking the BIF
>> `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from
>> some process. As per the Erlang docs, the latter process becomes the tracer
>> for `P`, which I shall call `Trc_P`.
>> >
>> > Suppose now, that process `P` spawns a new process `Q`. Since the flag
>> `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q`
>> will automatically be traced by `Trc_P` as well.
>> >
>> > ---
>> >
>> > I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership
>> of tracing `Q` to it, so that the resulting configuration will be that of
>> process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.
>> >
>>
>> Unfortunately I do not have any ideas on how to accomplish this.
>>
>> > However, Erlang permits **at most** one tracer per process, so I cannot
>> achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from
>> `Trc_Q`. The only way possible is to do it in two steps:
>> >
>> > 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop
>> `Trc_P` from tracing `Q`;
>> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing
>> `Q`.
>> >
>> > In the time span between steps **1.** and **2.** above, it might be
>> possible that trace events by process `Q` are **lost** because at that
>> moment, there is no tracer attached. One way of mitigating this is to
>> perform the following:
>> >
>> > 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from
>> `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is
>> eventually suspended by the VM);
>> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from
>> tracing `Q`;
>> > 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing
>> `Q`;
>> > 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q`
>> can continue executing.
>> >
>> > From what I was able to find out, while `Q` is suspended, messages sent
>> to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q,
>> 'receive', Msg}` trace events accordingly without any loss.
>> >
>>
>> This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0)
>> that will be fixed. The trace message should have been delivered even
>> though the receiver was suspended.
>>
>> You cannot even rely on this behavior while this bug is present. If you
>> (or any process in the system) send the suspended process a non-message
>> signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug
>> will be bypassed and the trace message will be delivered.
>>
>> > However, I am hesitant to use suspend/resume, since the Erlang docs
>> explicitly say that these are to be used for *debugging purposes only*.
>>
>> Mission accomplished! :-)
>>
>> > Any idea as to why this is the case?
>> >
>>
>> The language was designed with other communication primitives intended
>> for use. Suspend/Resume was explicitly introduced for debugging purposes
>> only, and not for use by ordinary Erlang programs. They will most likely
>> not disappear, but debug functionality in general is not treated as
>> carefully by us at OTP as other ordinary functionality with regard to
>> compatibility, etc. For example, we removed the automatic deadlock
>> prevention in suspend_process() that existed prior to erts 10.0 for
>> performance reasons.
>>
>> Regards,
>> Rickard
>> --
>> Rickard Green, Erlang/OTP, Ericsson AB
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>