[erlang-questions] Suspending Erlang Processes

Thu Oct 17 14:34:57 CEST 2019

Hi Kenneth,

Thanks for your reply. 

I want different tracers because I would like to target specific processes rather than all of them. I had initially considered your approach. However, I’m afraid that, in my case, a central tracer that distributes trace events to different monitors (according to the originating PID) would have to collect trace events even for processes that I don’t need to monitor (I would just filter these out, but this is extra processing). 

My idea was to keep tracing to a minimum for the sake of performance by tracing processes selectively (and dynamically at runtime). A second shortcoming I see with the central-tracer approach is that the central tracer might at times experience high load, since trace events can pile up in its mailbox while it is busy routing or filtering earlier ones. This in turn could keep recipient monitors waiting longer than necessary just to receive a handful of trace events. I suspect that this might also impact memory consumption. What is more, if the central tracer fails, all events would be lost. 

Separate tracers could mitigate these two issues, since they are less likely to create a hotspot, and independent tracers may fail without hampering the progress of other tracers/monitors.
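
To make the comparison concrete, here is a minimal sketch of what I understand the central dispatcher to be. The module name, the `Routes` map (traced PID to monitor PID) and the `add_route`/`remove_route` messages are hypothetical names of mine, not an existing API. I am assuming the dispatcher is made the tracer by passing `{tracer, Dispatcher}` in the flag list of `erlang:trace/3`.

```erlang
-module(trace_dispatcher).
-export([start/1, route/2]).

%% Starts the dispatcher. Routes is a map from traced pid to monitor pid
%% (both hypothetical; the monitors are ordinary processes here).
start(Routes) when is_map(Routes) ->
    spawn(fun() -> loop(Routes) end).

loop(Routes) ->
    receive
        {add_route, Traced, Monitor} ->
            loop(Routes#{Traced => Monitor});
        {remove_route, Traced} ->
            loop(maps:remove(Traced, Routes));
        %% Plain trace messages have the shape {trace, Pid, Tag, ...}.
        Event when is_tuple(Event), tuple_size(Event) >= 3,
                   element(1, Event) =:= trace ->
            _ = route(Event, Routes),
            loop(Routes);
        _Other ->
            %% Anything else (including timestamped trace_ts events,
            %% which this sketch does not handle) is discarded.
            loop(Routes)
    end.

%% Forwards Event to the monitor registered for its originating pid.
%% Returns forwarded, or dropped when no monitor is registered.
route(Event, Routes) ->
    case maps:find(element(2, Event), Routes) of
        {ok, Monitor} ->
            Monitor ! Event,
            forwarded;
        error ->
            dropped
    end.
```

Every event, wanted or not, passes through the single `loop/1` mailbox before it is forwarded or dropped, which is the hotspot and single point of failure I am worried about.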

With reference to my previous question, "I wonder whether such a BIF could be realisable, such that the ownership of tracing can be transferred atomically between tracers without incurring any loss of trace events", would such a BIF be possible to implement in the foreseeable future?
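
For reference, the non-atomic handover I described earlier could be sketched roughly as follows. The module and function names are hypothetical, and the sketch assumes (as in my earlier description) that switching tracing off from the new tracer detaches the previous one. It also relies on suspend/resume, which Rickard pointed out are intended for debugging only, so please read it as an illustration of the gap rather than a recommended technique.

```erlang
-module(trace_handover).
-export([take_over/2]).

%% Must be called from the process that is to become the new tracer.
%% Flags is the trace flag list, e.g. [set_on_spawn, procs, send, 'receive'].
take_over(Pid, Flags) ->
    %% Block the traced process so that it cannot send or spawn meanwhile.
    true = erlang:suspend_process(Pid),
    try
        %% Detach the current tracer. From here until the next call, Pid
        %% has no tracer, so events such as 'receive' may still be lost.
        _ = erlang:trace(Pid, false, Flags),
        %% The calling process becomes the tracer of Pid.
        _ = erlang:trace(Pid, true, Flags),
        ok
    after
        %% Always resume, even if one of the trace calls fails.
        true = erlang:resume_process(Pid)
    end.
```

The window between the two `erlang:trace/3` calls is exactly what an atomic transfer BIF would close.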

Best,
Duncan

> On 17 Oct 2019, at 09:48, Kenneth Lundin <kenneth@REDACTED> wrote:
> 
> Hi Duncan,
> 
> My initial thought was: why do you need many tracers to achieve what you want? 
> What do you say about the approach of having one global tracer which acts as a dispatcher to all your monitors?
> As all trace messages have information about the process from which the trace event originates you can do a mapping between
> the message and the monitor and then distribute the message to that monitor which could act in the same way as if it was the tracer for a specific process or process group.
> 
> /Regards Kenneth
> 
> On Thu, Oct 17, 2019 at 9:01 AM Duncan Paul Attard <duncan.attard.01@REDACTED> wrote:
> Hi Kenneth, Rickard,
> 
> I was wondering whether you have any suggestions regarding this please.
> 
> 
> All the best,
> 
> Duncan
> 
> 
>> On 03 Oct 2019, at 11:27, Duncan Paul Attard <duncan.attard.01@REDACTED> wrote:
>> 
>> Kenneth, Rickard,
>> 
>> Let me give you a bit of context. 
>> 
>> I’m working on a runtime verification (RV) tool that focusses on component systems in an asynchronous setting. I’ve chosen Erlang because it nicely models this setting and also facilitates certain aspects in the development of said tool. Very briefly, in RV, the concept is that of instrumenting the system with other processes (called monitors in the RV community, which have nothing to do with Erlang monitors) that analyse parts of the system (e.g., one process or a group of them, which I will refer to as a "system component") to detect and flag the infringement of some property specified over the component.
>> 
>> These properties (which are written using a high-level logic such as Linear Temporal Logic, Hennessy-Milner Logic, etc.), define things like “Process P cannot send message M to Q when such and such condition arises” or “Process P must exit when a particular message M is sent to it”, etc. A monitor, or rather, the monitor source code, is synthesised from a property and “attached” to the component to be monitored. The following is more or less the general workflow:
>> 
>> 1. A property is written in a text file using one of the logics mentioned;
>> 2. The property is parsed and compiled to generate the monitor (in Erlang source code, in my particular case);
>> 3. The monitor is spawned as a process that analyses a system component of interest as this executes.
>> 
>> The monitor needs to somehow acquire the runtime events emitted by processes, and this it does via the built-in Erlang tracing (i.e., the monitor is itself a tracer process). The important thing to note is that the monitors, despite being processes themselves, may be considered as a meta-layer over the system, and therefore, do not technically form part of the “ordinary” implementation of the system. This means that monitors can be introduced or removed from the system as needed, and merely function as a second layer that strives to observe the system with *minimal* interference. 
>> 
>> This brings me to Kenneth’s point, that tracing is a tool intended for debugging/profiling purposes. I agree, and in fact, RV might be considered as a flavour of debugging or profiling that is done at runtime. It differs (amongst other things) from debugging and profiling, in that monitors are the product of autogenerated code resulting from *formal* logical properties. From what I gather, debugging or profiling obtains trace events in a similar way to the one I’m using for monitoring. I also understand and agree with you Kenneth that, if a system process is being monitored by one of my monitors, then it cannot be profiled or debugged due to the one-tracer limit imposed by the EVM. 
>> 
>> Also, the reason I’m not using ‘dbg’ but ‘erlang:trace/3’ directly is that I want the full flexibility of tracing (I might require it in later stages of my research). Way back when I started the project I was not aware of the extent of the functionality ‘dbg’ offers, and so to play it safe (and after reading Francesco and Simon’s book), I decided to go for the tracing BIFs.
>> 
>> Finally, the reason I require different tracers (in my case, monitors) for different system processes (or groups) is that it makes the specification of correctness properties much more manageable. The gist of the idea is that it is far easier to specify a property over a restricted set of processes (e.g., just one process which exhibits *sequential* execution) than it is for a large number of processes, as then the property needs to account for all the possible interleavings of trace events exhibited by different processes. So in a sense, different monitors over different system components allow me to partition and view the otherwise whole trace of the system as a collection of separate traces for different components. Naturally, the monitors generated from smaller properties tend to be small and lightweight themselves, and are easier to work with. Moreover, this allows me to switch off certain monitors dynamically at runtime for system components that might not require monitoring anymore, while leaving others on. 
>> 
>> Since a system can be viewed as always starting from one root process, I attach (i.e., start tracing) a special root monitor to this system root process. The root monitor creates new monitors on the fly for certain child processes that are spawned by the root system process. Now, to collect trace events without loss, the root monitor is configured with ‘set_on_spawn’, meaning that new children of the root system process are automatically traced by the root monitor at first. To spawn a dedicated monitor ‘Mon_C’ for some child process ‘C’, the following is executed:
>> 
>> 1. Root monitor ‘Mon_R’ is currently tracing the new child process ‘C’ ('set_on_spawn' flag was set on 'Mon_R');
>> 2. The new monitor ‘Mon_C’ created for child process ‘C’ switches tracing *off* for ‘C’ (i.e., erlang:trace(Pid_C, false, ..)), so the (previous) monitor ‘Mon_R’ stops being the tracer of ‘C’;
>> 3. New monitor ‘Mon_C’ switches tracing back *on* for child process ‘C’ and becomes its new tracer.
>> 
>> To minimise trace event loss between steps 2 and 3, I was thinking of suspending child process ‘C’ before step 2, and resuming it after step 3. This way, ‘C’ is at least blocked, and cannot spawn new processes itself or send messages. I cannot however prevent other processes from sending ‘C’ messages, meaning that there might be a chance of ‘receive’ events being lost in the space of time between steps 2 and 3. Therefore, my suggestion still does not banish the problem but merely mitigates it, as steps 2 and 3 do not happen atomically. I wonder whether such a BIF could be realisable, such that the ownership of tracing can be transferred atomically between tracers without incurring any loss of trace events (between monitors ‘Mon_R’ and ‘Mon_C’ in my case).
>> 
>> FYI, much of the work I’ve discussed has already been published in a previous paper of ours. The paper can be found here: http://staff.um.edu.mt/afra1/papers/sefm17.pdf. If you’re interested please let me know.
>> 
>> Many thanks for your help!
>> Duncan
>> 
>> 
>> 
>> 
>>> On 02 Oct 2019, at 09:11, Kenneth Lundin <kenneth@REDACTED> wrote:
>>> 
>>> As a follow up on Rickards answer I think it would be interesting if you can explain why you want different tracers per process?
>>> If we know what problem you want to solve we can most probably come up with better suggestions.
>>> 
>>> I also recommend that you use tracing via the dbg module which is intended to be a more user friendly API towards tracing. The trace BIFs might give some more detailed control but dbg has support for most use cases and makes it easier to do the right thing, at least that is the intention.
>>> 
>>> Also worth mentioning is that the tracing mechanisms are really not intended to be used to achieve functionality that is part of the application; they are intended to be used temporarily for debugging/profiling purposes. Since there is only one tracer at a time, if tracing is used as part of the "ordinary" implementation of an application, there will be conflicts as soon as any tracing or profiling is needed, and the intended functionality of the application will then probably be broken.
>>> 
>>> /Kenneth, Erlang/OTP Ericsson
>>> 
>>> On Tue, Oct 1, 2019 at 10:07 PM Rickard Green <rickard@REDACTED> wrote:
>>> 
>>> 
>>> On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <duncan.attard.01@REDACTED> wrote:
>>> >
>>> > I am tracing an Erlang process, say, `P` by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.
>>> >
>>> > Suppose now, that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
>>> >
>>> > ---
>>> >
>>> > I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.
>>> >
>>> 
>>> Unfortunately I do not have any ideas on how to accomplish this.
>>> 
>>> > However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:
>>> >
>>> > 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
>>> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.
>>> >
>>> > In the time span between steps **1.** and **2.** above, it might be possible that trace events by process `Q` are **lost** because at that moment, there is no tracer attached. One way of mitigating this is to perform the following:
>>> >
>>> > 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
>>> > 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
>>> > 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
>>> > 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
>>> >
>>> > From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, 'receive', Msg}` trace events accordingly without any loss.
>>> >
>>> 
>>> This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0) that will be fixed. The trace message should have been delivered even though the receiver was suspended.
>>> 
>>> You cannot even rely on this behavior while this bug is present. If you (or any process in the system) send the suspended process a non-message signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug will be bypassed and the trace message will be delivered.
>>> 
>>> > However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*.
>>> 
>>> Mission accomplished! :-)
>>> 
>>> > Any idea as to why this is the case?
>>> >
>>> 
>>> The language was designed with other communication primitives intended for use. Suspend/Resume was explicitly introduced for debugging purposes only, and not for usage by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other ordinary functionality with regard to compatibility, etc. We for example removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 due to performance reasons.
>>> 
>>> Regards,
>>> Rickard
>>> --
>>> Rickard Green, Erlang/OTP, Ericsson AB
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>> 
> 
