inviso: (Latin) to go to see, visit, inspect, look at.
The Inviso trace system consists of one or several runtime components supposed to run on each Erlang node doing tracing and one control component which can run on any node with available processor power. Inviso may also be part of a higher layer trace tool. See the inviso-tool as an example. The implementation is spread out over the Runtime_tools and the Inviso Erlang/OTP applications. Erlang modules necessary to run the runtime component are located in Runtime_tools and therefore assumed to be available on any node. Even though Inviso is introduced with Erlang/OTP R11B the runtime component implementation is done with backward compatibility in mind. Meaning that it is possible to compile and run it on older Erlang/OTP releases.
This document describes the control and runtime components of the Inviso trace system.
Inviso is built on Erlang trace BIFs and standard linked in trace-port drivers for efficient trace message logging. This means that Inviso can not co-exist in runtime with any other trace tool using the trace BIFs.
This is a short step-by-step description of how tracing using Inviso can be done.
This "recepie" is valid also when tracing in a non-distributed environment. The only difference is that function calls not taking a node-name as argument are used. The runtime component will then of course run on the same node as the control component.
Simple example illustrating the above listed recipe. It traces on two nodes, node1 where the control component also runs. And node2 which is a remote node from the control components perspective. The example uses a mixture of API-calls specifying what nodes to trace on and API functions working on all added nodes. This is in this example interchangable since all to the control component known nodes are participating in the same way.
Eshell V5.5 (abort with ^G) (node1@hurin)1>application:start(runtime_tools). ok (node1@hurin)2> inviso:start(). {ok,<0.56.0>} (node1@hurin)3> inviso:add_nodes([node(),node2@hurin],mytag). {ok,[{'node1@hurin',{ok,new}}, {'node2@hurin',{ok,new}}]} (node1@hurin)4> inviso:init_tracing( [{node(),[{trace,{file,"tracefile_node1.log"}},{ti,{file,"trace_node1.ti"}}]}, {node2@hurin,[{trace,{file,"tracefile_node2.log"}},{ti,{file,"trace_node2.ti"}}]}]). {ok,[{'node1@hurin',{ok,[{trace_log,ok},{ti_log,ok}]}}, {'node2@hurin',{ok,[{trace_log,ok},{ti_log,ok}]}}]} (node1@hurin)5> inviso:tpm_localnames([node(),node2@hurin]). {ok,[{'node1@hurin',{{ok,1},{ok,1}}}, {'node2@hurin',{{ok,1},{ok,1}}}]} (node1@hurin)6> inviso:tpl([node(),node2@hurin],code,which,'_',[]). {ok,[{'node1@hurin',{ok,[2]}}, {'node2@hurin',{ok,[2]}}]} (node1@hurin)7> inviso:tf(all,[call,timestamp]). {ok,[{'node1@hurin',{ok,"/"}}, {'node2@hurin',{ok,"-"}}]} (node1@hurin)8> code:which(ordset). non_existing (node1@hurin)9> inviso:stop_tracing(). {ok,[{'node1@hurin',{ok,idle}}, {'node2@hurin',{ok,idle}}]} (node1@hurin)10> inviso:fetch_log([node2@hurin],".","aprefix_"). {ok,[{'node2@hurin', {complete,[{trace_log,[{ok,"aprefix_tracefile_node2.log"}]}, {ti_log,[{ok,"aprefix_trace_node2.ti"}]}]}}]} (node1@hurin)11> inviso:list_logs([node()]). {ok,[{'node1@hurin', {ok,[{trace_log,".",["tracefile_node1.log"]}, {ti_log,".",["trace_node1.ti"]}]}}]} (node1@hurin)12> inviso_lfm:merge( [{node(),[{trace_log,["tracefile_node1.log"]}, {ti_log,["trace_node1.ti"]}]}, {node2@hurin,[{trace_log,["aprefix_tracefile_node2.log"]}, {ti_log,["aprefix_trace_node2.ti"]}]}],"theoutfile.txt"). {ok,15} (node1@hurin)13> inviso:clear(). {ok,[{'node1@hurin',{ok,{new,running}}}, {'node2@hurin',{ok,{new,running}}}]} (node1@hurin)14> inviso:stop_nodes(). {ok,[{'node2@hurin',ok}, {'node1@hurin',ok}]} (node1@hurin)15>
Incarnation runtime tags are used to identify an incarnation of a runtime component. An incarnation is one "start-up" of a runtime component on a specific Erlang node. The reason why it can sometimes be necessary to examine the incarnation runtime tag is that a user wants to connect, adopt, an already running runtime component. This may be the case if the runtime component has autostarted or because the control component terminated without killing the runtime component. While the user has been out of control of the runtime component it may very well have terminated and been restarted. If it was restarted without the user's knowledge, its incarnation runtime tag has most likely changed. The user can therefore, if the current incarnation runtime tag is not what it is supposed to be, conclude that the runtime component is not "doing" what is expected.
The runtime tag is set at runtime component start-up. This is either done when it is started manually by a call to inviso:add_nodes/X, or according to a specification in one of the autostart configuration files.
A runtime component has a state and a status. The possible states are: new, tracing and idle. A runtime component that is tracing has (possibly) open log files. A new runtime component has no current tracer-data. That is it lacks any history of what it has done just recently. An idle runtime component is no longer tracing. It does therefore have current tracer-data that describes what it did do when it was tracing.
The status describes if the runtime component is running or suspended. A suspended runtime component may very well be in state tracing. However the point is that it shall not generate any processor load. It will therefore refrain from generating any trace messages.
Meta tracing is a trace mechanism separate from the regular tracing. It is normally used by a trace-tool to learn about function calls made anywhere in an Erlang node. A typical example is that there is a possibility in Inviso to get pids translated to registered name in the final formatted trace-log (for processes having registered names). This is done by meta-tracing on the BIF register/2 to learn about all name/pid associations made.
Meta tracing in Inviso is done by the inviso_rt_meta process, which is part of the runtime component if trace-information, ti, is initiated. See inviso:init_tracing/1 for details. The runtime meta tracer opens and controls the so called trace information file. Translations can then be done off-line using the associations logged in the trace information file. Currently the only type of trace information file available is a straight binary file. A wrap-file makes no sence since pid-to-name associations made in the beginning will most likely be lost.
The runtime meta tracer can also be used to translate pids to own identifiers. The only thing needed is one or several association points in the form of function calls which will only be made if an association is done in the system. The pid and own-identifier must be arguments and/or return values from the same function call.
The runtime meta tracer can further more be used to achieve side-effects during tracing, like turning tracing on or off.
It may sometimes be necessary to wait for a meta traced function to return before it can be decided what to do. This may be due to that one piece of information to make the decision is in the arguments to the function, the other in the return value. This kind of logic can be programmed to be executed by the inviso meta tracer. In order for the inviso meta tracer to "remember" function-call arguments until the function return trace message arrives, a public loop data structure is implemented. The public loop data structure is first created when tracing is initiated (of course only when trace information is specified in the init_tracing call). The public loop data can then later be further initiated each time meta tracing (tpm and tpm_ms) is activated for a certain function.
The default public loop data structure is a tuple of size two. The first element in that tuple is used by the predefined meta tracing for capturing locally registered names. The second element is free to use for any other purpose. The elements of the tuple must in the default implementation be lists of tuples. Where each sub-tuple shall represent one waiting call. The last element of that tuple must be a now-stamp (as returned by the BIF now/0). See below for an explanation of the now-stamp. The size of the outer most tuple may be increased as long as the term residing in the first element is left unchanged, and all other elements follow the above described rules.
The inviso meta tracer "cleans" the public loop data structure approximately once every minute. The reason for this is that entries in the public loop data structure may become abandoned. If for instance a process crashes while executing the body of a meta traced function, no return value will be generated. Or in other words, receiving the call meta trace-message can have caused information to have been written into the public loop data structure. That entry will be used and removed when the return_trace meta trace-message arrives. But if the meta traced function causes an exception, no return_trace message will come. The function which normally removes the entry is then therfore never called.
The default clean-function assumes that every item in the public loop data tuple is a list. Where each list contains tuples where the last element of those tuples are "now-stamps". The default clean-function considers an entry older than 30 seconds to be abandoned.
When activating meta tracing for a function for the purpose of writing pid-alias associations in the trace information file, a call-func and possibly also a return-func is specified. These functions will be called when a meta trace message arrives to the inviso meta tracer as a result of function calls or returns for this meta traced function. What exactly to write in the trace information file is dictated by the merge mechanism. This since pid-alias translations are done off line when merging log-files. See the chapter on merging and formatting log files for more details.
Simple example where the call to the function connection:assoc_id(Pid,Ref) will associated Pid with the id Ref. We will then in a merged log-file see a translation between Pid and Ref. Actually for all future since there is no unalias function meta traced in this example. The inviso meta tracer will receive a meta trace message every time connection:assoc_id/2 is called. When that message arrives the meta tracer will call mytrace:call_assoc_id/3 which must return {ok,NewPublicLoopData,OutPutBinary}.
-module(mytrace). call_assoc_id(_CallingPid,[Pid,Ref],PublLoopData) -> {ok,PublLoopData,term_to_binary({Pid,Ref,alias,now()})}.
(node1@hurin)21> inviso:tpm(connection,assoc_id,2,[], {mytrace,call_assoc_id}). {ok,[{'node1@hurin',{ok,1}}, {'node2@hurin',{ok,1}}]} (node1@hurin)22>
It is of course very likely that the public loop data structure must be extended to host all functions where the meta tracer must delay its action until the function in question returns. What is necessary is to create your own public loop data structure at trace initialization. This is done by using the TiSpec. TiSpec={InitMFA,RemoveMF,CleanMF}, where InitMFA creates the structure, RemoveMF removes it (must often not necessary unless a database, file or similar is used as storage instead of a tuple). CleanMF is the function which will be called each every 60 seconds to go over the public loop data structure. Following the below rules, not much programming will be needed, apart from the InitMFA:
Simple example where tracing is initiated with a public loop data structure having 10 places for nine (the locally registered names is mandatory) different functions to be meta traced. Note that the BIF list_to_tuple/1 is used as initialization function. And that the Stdlib function lists:duplicate/2 is used to create something for the initialization function to work on.
(node1@hurin)4> inviso:init_tracing( [{node2@hurin,[{trace,{file,"tracefile_node2.log"}}, {ti,{file,"trace_node2.ti",{{erlang,list_to_tuple,[lists:duplicate(10,[])]}, void,{inviso_rt_meta,clean_std_publld}}}}]}]). {ok,[{'node2@hurin',{ok,[{trace_log,ok},{ti_log,ok}]}}]} (node1@hurin)5>
Since meta tracing is independent of regular tracing and catches any function call to a particular function made in any process, it is well suited to be used to turn things on or off during execution. That trick is done by letting the CallFunc and (if used) ReturnFunc do these sideeffects. One must of course remember that the inviso meta tracer is a process amongst all other processes in the system. Meaning that the side effect is not necessarily done exactly when the meta traced function is called. Unless the side effect can be achieved using a match specification action.
In order to trace before any user interaction is possible, an autostart mechanism is implemented. The runtime component is started by the top supervisor of the Runtime_Tools application top supervisor. Hence the Runtime_Tools application must be part of the boot script for autostart tracing to work. The Runtime_Tools applications must of course be started before any application that is to be traced. Do note that application startup is not entirely synchronous. Meaning that just because the application controller has begun starting the next application, Runtime_Tools is not necessarily fully up and running.
The autostart mechanism is configurable. The runtime component comes with a standard autostart configuration, only missing two text-files to be completely operational.
The autostart is controlled by the Runtime_Tools application configuration parameter inviso_autostart_mod. It must be the name of a module exporting an autostart/1 function. The default value is inviso_autostart, a module which is provided with Runtime_Tools. See below for details.
An autostart/1 function must offer the following:
autostart(RuntimeToolsArg) = {MFA,Options,Tag} | any()
RuntimeToolsArgs is the argument Arg provided to the Runtime_Tools application through the application resource file {mod,{Module,Arg}} parameter.
MFA = {AutoMod,AutoFunc,AutoArgs} | any() controls how tracing will be initiated. Note that initiating tracing is not necessarily the same as starting a runtime component. It is possible to have a runtime component without doing any tracing. The runtime component is started as long as autostart/1 returns the proper tuple, and Options does not for instance require a certain non-existing control component. If it is not a proper tuple or there are other faults in the tuple items, the autostart will terminate. Typically will this happen if there is no autostart/1 function.
If MFA does not properly point out a function possible to call with spawn(AutoMod,AutoFunc,AutoArg), there will simply be no initialization. (Initialization is done by a separate process spawned by the runtime component during autostart.) It may be worth reminding that AutoMod must be present at the node where the runtime component is supposed to run. Not necessarily the node where the control component usually runs.
Options is the list of options given to the runtime component. See Options in inviso:add_nodes/2.
Tag is the runtime component incarnation tag. See Tag in inviso:add_nodes/2.
As mentioned above, Inviso comes with a complete implementation of autostart sufficient for most situations.
The default autostart module is inviso_autostart, provided as part of the Runtime_Tools application. Since that name is the default module name, it is not really necessary to set the inviso_autostart_mod configuration parameter for the Runtime_Tools application.
Its autostart/1 function reads a configuration file pointed out by the Runtime_Tools application configuration parameter inviso_autostart_conf. If the parameter is not present, a default file, inviso_autostart.config in the current working directory, will be consulted.
The config file must be an ascii text file with one or more tuples ended with a dot. The following parameters are recognized:
Example:
{repeat,1}. {mfa,{inviso_autostart_server, init, [[{tracerdata,{file,"mylogfile"}}, {cmdfiles,["a_trace_case.txt"]}, {bindings,[{'M',mymod},{'F','_'},{'Arity','_'}]}, {translations,[]}]]}}.
The example file results in the start of a runtime component given no specific options. There will only be one autostart since the repeat parameter is set to 1. Tracing will be initiated by the standard initiator (inviso_autostart_server). The initiator will initiate tracing opening a plain trace-port logfile ("mylogfile"). It will further read the "a_trace_case.txt" file to get instructions on what patterns and flags to set. If there are variables mentioned in the trace-case file "a_trace_case.txt", it is parameterized, the variables M, F and Arity will get the values according to bindings. There will be no translations done, hence the trace-case file must be written using inviso_rt function calls directly.
To further facilitate the standard autostart implementation a default initiator is implemented. To use it, simply specify it as mfa in the config file read by the standard autostart module.
Its init/1 function takes one argument on the form of a list of tuples. The following tuple-parameters are recognized:
Translations= [{{Mod1,Func1,Arity}, {Mod2,Func2,{TranslMod,TranslFunc}}},...] TranslMod:TranslFunc(ListOfOrigArgs)-> ListOfTransformedArgs
To facilitate creating the configuration file described above, there are functions in a module named inviso_as_lib which can both create new files according to supplied arguments and update existing configuration files.
The node(s) in question must be running since the functionality in the utility library uses distributed Erlang to access the file system.
In order to protect real "live" systems from getting a runtime component lingering around without a control component, a dependency property can be specified at runtime component start-up. The property specifies a dependency in milliseconds. Meaning that if the property is set to 0 (zero), the runtime component will terminate immediately if its current control component terminates.
If a control component tries to start a runtime component at an Erlang node where there already is a runtime component, the control component will adopt the already existing runtime component if it has no current control component. Otherwise the control component will experience an error, not being able to start a runtime component at that node.
It must also be noted that an autostart runtime component is running without control component, at least before any control component adopts it.
Since Inviso is intended to be used on real "live" systems, it is possible to protect the system against overload, having Inviso suspend tracing should an overload situation occur.
What indicates an overload situation must be programmed and configured outside of Inviso. Inviso can initiate an overload protection, call an overload function periodically and clean-up an overload mechanism should it decide to terminate.
Internally inside the runtime component, suspending tracing means removing all process trace flags and meta patterns. Reactivating tracing is outside the scoop of Inviso, but can be implemented in a tool using Inviso.
Simple example adding a runtime component and making it protect its Erlang node from overload.
inviso:add_node(my_rt_tag, [{overload,{{my_ovl,check}, 15000, {my_ovl,start,[my_port_pgm]}, {my_ovl,stop,[my_port_pgm]}}}]).
Immediately when the runtime component is started, it will initiate overload protection by calling my_ovl:start(my_port_pgm). When tracing (not when in state idle or new), the runtime component will every 15000 milliseconds call my_ovl:check/1. Depending on its return value, the runtime component will either do nothing or suspend tracing. When the runtime component is stopped, my_ovl:stop(my_port_pgm) will be called.
If logging trace messages to a logfile has been used (decided when tracing is initiated) the various log files will be located on the different Erlang nodes participating in the trace. The log files must be merged and formatted for the following reasons:
The first step before any merging can take place is of course to get all log files, including any trace information files to a location where the logfile merger can access them. This can either be done by simply copying the files. However if the file systems on the Erlang nodes are not that easily accessed, there is a fetch_log function implemented in the runtime component. It will transfer log files using distributed Erlang.
Inviso comes with two Erlang modules, inviso_lfm and inviso_lfm_tpfreader, implementing a standard log file merger and formatter. The log file merger (inviso_lfm) uses a file reader process (implemented in inviso_lfm_tpfreader) to access log entries in parallel. It is possible to write your own logfile reader. This is necessary since you may have your own trace-log format and/or own trace information log format. The logfile merger can further more be configured to use your own formatter, customizing what to do with a trace message.
Trace messages in the log files must of course be time-stamped for the logfile merger to be capable of correctly merging them. This means using the timestamp process trace flag.
The standard inviso log-file reader understands the following trace information file entries:
{Pid,Alias,alias,NowStamp}
{Pid,Alias,unalias,NowStamp}
The Pid in an alias entry must always be a proper pid. In an unalias entry it may also be the atom undefined. The latter means that all associations involving Alias shall stop to be valid. The standard inviso log file reader uses the now-stamp to make sure that associations are only used during time periods in the log-file when such are valid.
The idea behind trace cases is that someone knowledgeable of a certain system component can write a file specifying the trace-patterns and process trace flags necessary to trace on certain items once and for all. Hence a trace case will most likely be a series of calls to functions setting trace patterns and process trace flags.
However, the actual Erlang nodes and values of arguments given in the trace function calls can not be static in order for the trace cases to become useful and reusable. A trace case file must therefore be possible to parameterize. Introducing variables that will get their values at the time of trace case execution. It may also be the case that Inviso is used as a component in a higher layer trace tool. Trace cases may therefore be written calling more complex functions than the low level inviso_rt functions which are available to autostart mechanisms. In a matter of fact, the inviso API itself can be considered a higher layer. It addresses multiple nodes at once where the inviso_rt API can only address the local node.
This results in that for trace cases to be useful there must be a function call translation mechanism and an execution environment capable of handling variable bindings.
A trace-case is a text ascii file consisting of function calls written as they could have been done in the Erlang shell:
modulename:functionname(arg1,arg3,...).
A trace-case may contain any valid function call, including binding new variables which are used later in the trace-case, but:
Example: Trace cases are expected to be written to be executed directly in an Erlang shell (by some utility reading a text file on trace case format) calling inviso functions. The translations must then translate inviso function calls to inviso_rt function calls, since inviso is not available in the Runtime_Tools applications. It can not be assumed that any trace tools outside the Runtime_Tools application is available on the nodes. Luckily (!) the inviso_rt API resembles the inviso API very much, apart from that the inviso_rt API does not take a list of nodes as an argument. Therefore in most situations the only transformation necessary is to change from inviso to inviso_rt and remove the first argument to the function call.
Assume that we have the following trace-case file:
inviso:tpl(Nodes,mymod,'_','_',MS). inviso:tf(Nodes,all,[call,timestamp]).
For this to work in an autostart the following translation is needed:
[{{inviso,tpl,5},{inviso_rt,tpl,{erlang,tl}}}, {{inviso,tf,3},{inviso_rt,tf,{erlang,tl}}}]
Since transforming the arguments from inviso calls to inviso_rt calls is simply removing the first argument, there is no need to program any function to do this. The BIF tl/1 can be used directly.
Further there must be a variable binding for MS when executing the trace-case. It is not necessary to have one for Nodes since that argument is removed from all function calls by the translation.