This module is used to profile a program to find out how the execution time is used. Trace to file is used to minimize runtime performance impact.
The fprof
module uses tracing to collect profiling data,
hence there is no need for special compilation of any module to
be profiled. When it starts tracing, fprof
will erase all
previous tracing in the node and set the necessary trace flags
on the profiling target processes as well as local call trace on
all functions in all loaded modules and all modules to be loaded.
fprof
erases all tracing in the node when it stops tracing.
fprof
presents both own time, i.e. how much time a
function has used for its own execution, and
accumulated time, i.e. including called functions.
All presented times are
collected using trace timestamps. Therefore the times are
wallclock times from the host machine OS, and OS scheduling will
randomly strike all called functions in a presumably fair way.
Profiling is essentially done in 3 steps:
1. Tracing: to file, as mentioned in the previous paragraph. The trace contains entries for function calls, returns to function, process scheduling, other process related events (spawn etc), and garbage collection. All trace entries are timestamped.
2. Profiling: the trace file is read, the execution call stack is simulated, and raw profile data is calculated from the simulated call stack and the trace timestamps. The profile data is stored in the fprof server state. During this step the trace data may be dumped in text format to file or console.
3. Analysing: the raw profile data is accumulated, sorted, filtered and dumped in text format either to file or console.
Since fprof
uses trace to file, the runtime performance degradation
is minimized, but still far from negligible, especially for
programs that themselves use the filesystem heavily. Where
you place the trace file is also important, e.g. on Solaris
/tmp
is usually a good choice, while any NFS mounted disk
is a bad idea.
fprof
can also skip the file step and trace to a tracer process
that does the profiling in runtime.
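The three steps above can be sketched as a shell session; the module mymod and the filenames are placeholders, not part of this module:

```erlang
%% Sketch of the three profiling steps (mymod and filenames are placeholders):
1> fprof:trace([start, {file, "my.trace"}]).   % step 1: trace to file
ok
2> mymod:run().                                % run the code to be profiled
3> fprof:trace(stop).
ok
4> fprof:profile([{file, "my.trace"}]).        % step 2: trace -> raw profile data
ok
5> fprof:analyse([{dest, "my.analysis"}]).     % step 3: dump analysis to file
ok
```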
start() -> {ok, Pid} | {error, {already_started, Pid}}
Pid = pid()
Starts the fprof
server.
Note that it seldom needs to be started explicitly since it is automatically started by the functions that need a running server.
stop() -> ok
Same as stop(normal)
.
stop(Reason) -> ok
Reason = term()
Stops the fprof
server.
The supplied Reason becomes the exit reason for the
server process. Default is normal. Any Reason
other than kill sends a request to the
server and waits for it to clean up, reply and exit. If
Reason is kill, the server is bluntly killed.
When the fprof server is stopped the collected raw profile
data is lost.
apply(Func, Args) -> term()
Func = function() | {Module, Function}
Args = [term()]
Module = atom()
Function = atom()
Same as apply(Func, Args, [])
.
apply(Module, Function, Args) -> term()
Args = [term()]
Module = atom()
Function = atom()
Same as apply({Module, Function}, Args, [])
.
apply(Func, Args, OptionList) -> term()
Func = function() | {Module, Function}
Args = [term()]
OptionList = [Option]
Module = atom()
Function = atom()
Option = continue | start | {procs, PidList} | TraceStartOption
Calls erlang:apply(Func, Args)
surrounded by
trace([start, ...])
and
trace(stop)
.
Some effort is made to keep the trace clean from unnecessary
trace messages; tracing is started and stopped from a spawned
process while the erlang:apply/2
call is made in the
current process, only surrounded by receive
and
send
statements towards the trace starting
process. The trace starting process exits when not needed
any more.
The TraceStartOption
is any option allowed for
trace/1
. The options
[start, {procs, [self() | PidList]} | OptList]
are given to trace/1
, where OptList
is
OptionList
with continue
, start
and {procs, _}
options removed.
The continue
option inhibits the call to
trace(stop)
and leaves it up to the caller to stop
tracing at a suitable time.
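As an illustration, a sketch of using the continue option; the server module, its functions, and the sleep are hypothetical:

```erlang
%% 'continue' leaves the trace running after apply/4 returns, so work
%% that the call triggered in another process is captured too
%% (my_server and its functions are hypothetical names).
Pid = spawn(my_server, loop, []),
fprof:apply(my_server, request, [Pid], [continue, {procs, [Pid]}]),
timer:sleep(1000),        % let the server finish the triggered work
ok = fprof:trace(stop).   % the caller must stop tracing explicitly
```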
apply(Module, Function, Args, OptionList) -> term()
Module = atom()
Function = atom()
Args = [term()]
Same as
apply({Module, Function}, Args, OptionList)
.
trace(start, Filename) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as trace([start, {file, Filename}])
.
trace(verbose, Filename) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as
trace([start, verbose, {file, Filename}])
.
trace(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
trace([{OptionName, OptionValue}])
.
trace(verbose) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as trace([start, verbose])
.
trace(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as trace([OptionName])
.
trace([Option]) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = start | stop | {procs, [PidSpec]} | verbose |
{file, Filename} | {tracer, Tracer}
PidSpec = pid() | atom()
Tracer = pid() | port()
Reason = term()
Starts or stops tracing.
PidSpec
and Tracer
are used in calls to
erlang:trace(PidSpec, true, [{tracer, Tracer} |
Flags])
, and Filename
is used to call
dbg:trace_port(file, Filename)
. Please see the
appropriate documentation.
Option description:
stop
Stops a running fprof trace and clears all tracing
from the node. Not allowed with the start
option.
start
Clears all tracing from the node and starts a new
fprof trace. Not allowed with the stop
option.
verbose
Adds some trace flags that are not needed for
fprof's
purposes, but that may be interesting for general
debugging.
This option is only allowed with the start
option.
{procs, [PidSpec]}
Specifies the processes to be traced. If this option is
not given, the calling process is traced. This option is
only allowed with the start
option.
{file, Filename}
Specifies the filename of the trace. If this option is
not given, the file "fprof.trace"
is used.
This option is only allowed with the start
option,
but not with the {tracer, Tracer}
option.
{tracer, Tracer}
Specifies that trace to process or port shall be done
instead of trace to file. This option is only allowed
with the start
option,
but not with the {file, Filename}
option.
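To make the option constraints concrete, here is a sketch of a valid trace start; the registered name my_server and the path are placeholders, and note that {file, Filename} and {tracer, Tracer} cannot be combined:

```erlang
%% Trace the calling process and a registered server to a local file,
%% with verbose trace flags (my_server and the path are placeholders):
ok = fprof:trace([start, verbose,
                  {procs, [self(), my_server]},
                  {file, "/tmp/my.trace"}]),
%% ... run the code to be profiled ...
ok = fprof:trace(stop).
```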
profile() -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as profile([])
.
profile(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
profile([{OptionName, OptionValue}])
.
profile(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as profile([OptionName])
.
profile([Option]) -> ok | {ok, Tracer} | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = file | {file, Filename} | dump | {dump, Dump} |
append | start | stop
Dump = pid() | Dumpfile | []
Tracer = pid()
Reason = term()
Compiles a trace into raw profile data held by the
fprof
server.
Dumpfile
is used to call file:open/2
,
and Filename
is used to call
dbg:trace_port(file, Filename)
. Please see the
appropriate documentation.
Option description:
{file, Filename}
Reads the trace file Filename
and creates raw
profile data that is stored in RAM by the fprof
server. If
this option is not given, file "fprof.trace"
is
read. The call will return when the whole trace has been
read with the return value ok
if successful.
This option is not allowed with the start
or
stop
options.
dump
| {dump, Dump}
Specifies the destination for the trace text dump. If
this option is not given, no dump is generated; if it is
dump
the destination will be the
caller's group leader, otherwise the destination
Dump
is either the pid of an I/O device or
a filename. And, finally, if the filename is []
-
"fprof.dump"
is used instead.
This option is not allowed with the stop
option.
append
Causes the trace text dump to be appended to the
destination file. This option is only allowed with the
{dump, Dumpfile}
option.
start
Starts a tracer process that profiles trace data in
runtime. The call will return immediately with the value
{ok, Tracer}
if successful.
This option is not allowed with the stop
or
{file, Filename}
options.
stop
Stops the tracer process that profiles trace data in
runtime. The return value will be ok
if successful.
This option is not allowed with the start
or
{file, Filename}
options.
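A sketch of both ways of producing raw profile data; the filenames are placeholders:

```erlang
%% Offline: read a trace file into raw profile data, and also dump the
%% trace in text form to "fprof.dump" ({dump, []} selects that name):
ok = fprof:profile([{file, "/tmp/my.trace"}, {dump, []}]).

%% Runtime: start a tracer process, trace to it, then stop profiling;
%% this skips the trace file entirely.
{ok, Tracer} = fprof:profile(start),
ok = fprof:trace([start, {tracer, Tracer}]),
%% ... run the code to be profiled ...
ok = fprof:trace(stop),
ok = fprof:profile(stop).
```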
analyse() -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as analyse([])
.
analyse(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
analyse([{OptionName, OptionValue}])
.
analyse(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as analyse([OptionName])
.
analyse([Option]) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = dest | {dest, Dest} | append | {cols, Cols}
Dest = pid() | Destfile
Cols = integer() >= 80
Reason = term()
Analyses raw profile data in the
fprof
server. If called while there is no raw
profile data available, {error, no_profile}
is
returned.
Destfile
is used to call file:open/2
.
Please see the appropriate documentation.
Option description:
dest
| {dest, Dest}
Specifies the destination for the analysis. If this
option is not given or it is dest
,
the destination will be the caller's group leader,
otherwise the destination Dest
is either
the pid()
of an I/O device or a filename.
And, finally, if the filename is []
-
"fprof.analysis"
is used instead.
append
Causes the analysis to be appended to the destination
file. This option is only allowed with the
{dest, Destfile}
option.
{cols, Cols}
Specifies the number of columns in the analysis text. If
this option is not given, 80 columns are used.
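For example, a sketch that writes a wide analysis to a file and later appends a second analysis to the same file; the filename is a placeholder:

```erlang
%% Dump the analysis with 120 columns to a file, then append a later
%% analysis to the same file (append requires {dest, Destfile}):
ok = fprof:analyse([{dest, "my.analysis"}, {cols, 120}]),
%% ... profile more data ...
ok = fprof:analyse([append, {dest, "my.analysis"}, {cols, 120}]).
```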
This section describes the output format of the analyse command. See analyse/0.
The following example was run on OTP/R8 on Solaris 8; all OTP internals in this example are very version dependent.
As an example, we will use the following function, which you may recognise as a slightly modified benchmark function from the manpage file(3):
-module(foo).
-export([create_file_slow/2]).

create_file_slow(Name, N) when integer(N), N >= 0 ->
    {ok, FD} =
        file:open(Name, [raw, write, delayed_write, binary]),
    if N > 256 ->
            ok = file:write(FD,
                            lists:map(fun (X) -> <<X:32/unsigned>> end,
                                      lists:seq(0, 255))),
            ok = create_file_slow(FD, 256, N);
       true ->
            ok = create_file_slow(FD, 0, N)
    end,
    ok = file:close(FD).

create_file_slow(FD, M, M) ->
    ok;
create_file_slow(FD, M, N) ->
    ok = file:write(FD, <<M:32/unsigned>>),
    create_file_slow(FD, M+1, N).
Let us have a look at the printout after running:
1> fprof:apply(foo, create_file_slow, [junk, 1024]).
2> fprof:profile().
3> fprof:analyse().
The printout starts with:
Detailed statistics:
                                      CALLS       ACC       OWN
** Totals                              8124  1534.916  1482.850
The CALLS column shows the total number of function calls that were found in the trace. The ACC column shows the total time of the trace, from first timestamp to last. The OWN column shows the sum of the execution time in functions found in the trace, not including called functions. In this case it is very close to the ACC time since the emulator had practically nothing else to do than to execute our test program.
All time values in the printout are in milliseconds.
The printout continues:
                                      CALLS       ACC       OWN
** Process <0.25.0>                    8124            1482.850
This is the printout header of one process. The printout
contains only this one process since we did fprof:apply/3
which traces only the current process. Therefore the CALLS and
OWN columns perfectly match the totals above. There is no ACC
column since summing the ACC times of all calls in the process
makes no sense.
Now to something more interesting:
  undefined                               1  1484.554     0.094
* foo:create_file_slow/2                  1  1484.554     0.094  *
  file:close/1                            1  1133.846     0.012
  foo:create_file_slow/3                  1   243.124    19.316
  file:open/2                             1    63.869    17.662
  lists:map/2                             1    22.290     9.178
  lists:seq/2                             1    21.065     0.010
  file:write/2                            1     0.266     0.017
The printout consists of one section per called function. The function marked with * is the one the section concerns. Above the marked function are the calling functions - those that have called the marked function - and below are those called by the marked function.
The sections are sorted in decreasing order of the ACC column for the marked function.
The columns for the marked function are: CALLS - the number of times this function has been called in this process, ACC - the wallclock time spent in this function including called functions, and OWN - the wallclock time spent in this function not including called functions.
The columns for each called function have the same contents as for the marked function, with the constraint that only the occasions when it was called from the marked function are accounted for.
The columns for each calling function also have the same contents as for the marked function, but with the constraint that only the occasions when it called the marked function are accounted for.
So, we see that foo:create_file_slow/2
used very little
time for its own execution. It spent most of its time in
file:close/1
. The function foo:create_file_slow/3
that writes 3/4 of the file contents is the second biggest time
thief.
We also see that the call to file:write/2
that writes
1/4 of the file contents takes very little time in itself, what
takes time is to build the data (lists:seq/2
and
lists:map/2
).
The function 'undefined' that has called
foo:create_file_slow/2
is an unknown function because that
call was not recorded in the trace. It was only recorded
that the execution returned from foo:create_file_slow/2
.
Let us continue down the printout and we find:
  foo:create_file_slow/2                  1   243.124    19.316
* foo:create_file_slow/3                  1   243.124    19.316  *
  file:write/2                          768   223.610    10.057
  suspend                                14     0.198     0.000
If you compare with the code you will also see that
foo:create_file_slow/3
was called only from
foo:create_file_slow/2
and called only
file:write/2
. Please note the number of calls to
file:write/2
. But here we see that 'suspend' was called a
few times. This is a pseudo function that indicates that the
process was suspended while executing in
foo:create_file_slow/3
, and since there is no
receive
or erlang:yield/0
in the code, it must be
scheduling suspensions.
Let us find the 'suspend' entry:
  prim_file:drv_command/4                15     0.471     0.000
  prim_file:drv_get_response/1           17     0.269     0.000
  file:write/2                           14     0.212     0.000
  foo:create_file_slow/3                 14     0.198     0.000
  prim_file:write/2                       7     0.156     0.000
  prim_file:drv_command/2                 9     0.139     0.000
  prim_file:translate_response/2          5     0.120     0.000
  lists:map/2                             1     0.101     0.000
  prim_file:'-drv_command/2-fun-0-'/1     3     0.068     0.000
  fprof:apply_start_stop/4                1     0.000     0.000
* suspend                                86     1.734     0.000  *
We find no particularly long suspend times, so no function seems
to have waited in a receive statement. Actually,
prim_file:drv_command/4
contains a receive statement, but
in this test program, the message lies in the process receive
buffer when the receive statement is entered. We also see that
the total suspend time for the test run is very small.
Now we look at another interesting pseudo function, 'garbage_collect':
  lists:seq/4                             1    18.356    18.356
  prim_file:drv_command/4                24     0.999     0.999
  prim_file:write/2                      18     0.714     0.714
  prim_file:drv_command/2                 1     0.063     0.063
* garbage_collect                        44    20.132    20.132  *
Here we can see that one function distinguishes itself since
most garbage collections occurred while in lists:seq/4
. It
is rare that one function alone causes most garbage collections,
but in this case it is practically the only function that
increases the heap size for the process.
Let us now get back to the test code:
  foo:create_file_slow/3                768   223.610    10.057
  foo:create_file_slow/2                  1     0.266     0.017
* file:write/2                          769   223.876    10.074  *
  prim_file:write/2                     769   213.590    23.581
  suspend                                14     0.212     0.000
Not unexpectedly, we see that file:write/2
was called
from foo:create_file_slow/3
and
foo:create_file_slow/2
. The number of calls in each case, as well as the times
used, also just confirm the previous results.
We see that file:write/2
only calls
prim_file:write/2
, but let us refrain from digging into the
internals of the kernel application.
Ahh, what the heck, if we nevertheless do dig down we find the call to the linked-in driver that does the file operations towards the host operating system:
  prim_file:drv_command/4               772  1238.026  1238.026
* erlang:port_command/2                 772  1238.026  1238.026  *
This is some 80 % of the total run time, and, as we saw before, the close operation is by far the biggest contributor. We find a comparison ratio a little bit up in the call stack:
  prim_file:close/1                       1  1122.811     0.011
  prim_file:write/2                     769   189.139     9.684
  prim_file:open_int/4                    1    28.682     0.013
  prim_file:open_int_setopts/3            1     0.100     0.010
* prim_file:drv_command/2               772  1340.732     9.718  *
  prim_file:drv_command/4               772  1330.812    22.344
  suspend                                 9     0.139     0.000
  garbage_collect                         1     0.063     0.063
The wallclock time in the linked-in driver distributes itself as 2 % for open, 14 % for write and 84 % for close. All data is probably buffered in the operating system until the close.
The actual supervision of execution times is in itself a CPU intensive activity. A message is written on the trace file for every function call that is made by the profiled code.
The information in the trace does not, unfortunately, give an
unambiguous picture of what happens on the execution call stack.
fprof
tries its best to find out which functions are
called within which, but sometimes it is not possible. This
happens when functions call themselves through other functions in
complicated ways, and shows up in the analysis as functions that
have used impossibly long accumulated times.
dbg(3), eprof(3), erlang(3), io(3), Tools User's Guide