This module is used to profile a program to find out how the execution time is used. Trace to file is used to minimize runtime performance impact.
The fprof
module uses tracing to collect profiling data,
hence there is no need for special compilation of any module to
be profiled. When it starts tracing, fprof
will erase all
previous tracing in the node and set the necessary trace flags
on the profiling target processes as well as local call trace on
all functions in all loaded modules and all modules to be loaded.
fprof
erases all tracing in the node when it stops tracing.
fprof
presents both own time, i.e. how much time a
function has used for its own execution, and
accumulated time, i.e. including called functions.
All presented times are
collected using trace timestamps. Therefore the times are
wallclock times from the host machine OS, and OS scheduling will
randomly strike all called functions in a presumably fair way.
Profiling is essentially done in 3 steps:
1. Tracing: to file, as mentioned in the previous paragraph. The trace contains entries for function calls, returns to function, process scheduling, other process related events (spawn etc), and garbage collection. All trace entries are timestamped.
2. Profiling: the trace file is read, the execution call stack is simulated, and raw profile data is calculated from the simulated call stack and the trace timestamps. The profile data is stored in the fprof server state. During this step the trace data may be dumped in text format to file or console.
3. Analysing: the raw profile data is accumulated, sorted, filtered and dumped in text format either to file or console.
Since fprof
uses trace to file, the runtime performance degradation
is minimized, but still far from negligible, especially for
programs that themselves use the filesystem heavily. Where
you place the trace file is also important, e.g. on Solaris
/tmp
is usually a good choice, while any NFS mounted disk
is a bad idea.
fprof
can also skip the file step and trace to a tracer process
that does the profiling in runtime.
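The three steps above can be sketched as a shell session; the module mymod and the filenames are placeholders, not part of this module:

```erlang
%% Sketch of the three profiling steps (mymod and filenames are placeholders):
1> fprof:trace([start, {file, "my.trace"}]).   % step 1: trace to file
ok
2> mymod:run().                                % run the code to be profiled
3> fprof:trace(stop).
ok
4> fprof:profile([{file, "my.trace"}]).        % step 2: trace -> raw profile data
ok
5> fprof:analyse([{dest, "my.analysis"}]).     % step 3: dump analysis to file
ok
```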
start() -> {ok, Pid} | {error, {already_started, Pid}}
Pid = pid()
Starts the fprof
server.
Note that it seldom needs to be started explicitly since it is automatically started by the functions that need a running server.
stop() -> ok
Same as stop(normal)
.
stop(Reason) -> ok
Reason = term()
Stops the fprof
server.
The supplied Reason becomes the exit reason for the
server process. Default is normal. Any Reason
other than kill sends a request to the
server and waits for it to clean up, reply and exit. If
Reason is kill, the server is bluntly killed.
When the fprof server is stopped the collected raw profile
data is lost.
apply(Func, Args) -> term()
Func = function() | {Module, Function}
Args = [term()]
Module = atom()
Function = atom()
Same as apply(Func, Args, [])
.
apply(Module, Function, Args) -> term()
Args = [term()]
Module = atom()
Function = atom()
Same as apply({Module, Function}, Args, [])
.
apply(Func, Args, OptionList) -> term()
Func = function() | {Module, Function}
Args = [term()]
OptionList = [Option]
Module = atom()
Function = atom()
Option = continue | start | {procs, PidList} | TraceStartOption
Calls erlang:apply(Func, Args)
surrounded by
trace([start, ...])
and
trace(stop)
.
Some effort is made to keep the trace clean from unnecessary
trace messages; tracing is started and stopped from a spawned
process while the erlang:apply/2
call is made in the
current process, only surrounded by receive
and
send
statements towards the trace starting
process. The trace starting process exits when not needed
any more.
The TraceStartOption
is any option allowed for
trace/1
. The options
[start, {procs, [self() | PidList]} | OptList]
are given to trace/1
, where OptList
is
OptionList
with continue
, start
and {procs, _}
options removed.
The continue
option inhibits the call to
trace(stop)
and leaves it up to the caller to stop
tracing at a suitable time.
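As an illustration, a sketch of using the continue option; the server module, its functions, and the sleep are hypothetical:

```erlang
%% 'continue' leaves the trace running after apply/4 returns, so work
%% that the call triggered in another process is captured too
%% (my_server and its functions are hypothetical names).
Pid = spawn(my_server, loop, []),
fprof:apply(my_server, request, [Pid], [continue, {procs, [Pid]}]),
timer:sleep(1000),        % let the server finish the triggered work
ok = fprof:trace(stop).   % the caller must stop tracing explicitly
```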
apply(Module, Function, Args, OptionList) -> term()
Module = atom()
Function = atom()
Args = [term()]
Same as
apply({Module, Function}, Args, OptionList)
.
trace(start, Filename) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as trace([start, {file, Filename}])
.
trace(verbose, Filename) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as
trace([start, verbose, {file, Filename}])
.
trace(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
trace([{OptionName, OptionValue}])
.
trace(verbose) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as trace([start, verbose])
.
trace(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as trace([OptionName])
.
trace([Option]) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = start | stop | {procs, [PidSpec]} | verbose |
{file, Filename} | {tracer, Tracer}
PidSpec = pid() | atom()
Tracer = pid() | port()
Reason = term()
Starts or stops tracing.
PidSpec
and Tracer
are used in calls to
erlang:trace(PidSpec, true, [{tracer, Tracer} |
Flags])
, and Filename
is used to call
dbg:trace_port(file, Filename)
. Please see the
appropriate documentation.
Option description:
stop
Stops a running fprof trace and clears all tracing
from the node. Not allowed with the start
option.
start
Clears all tracing from the node and starts a new
fprof trace. Not allowed with the stop
option.
verbose
Adds some trace flags that are not needed for
fprof's
purposes, but that may be interesting for general
debugging.
This option is only allowed with the start
option.
{procs, [PidSpec]}
Specifies the processes to be traced. If this option is
not given, the calling process is traced. This option is
only allowed with the start
option.
{file, Filename}
Specifies the filename of the trace. If this option is
not given, the file "fprof.trace"
is used.
This option is only allowed with the start
option,
but not with the {tracer, Tracer}
option.
{tracer, Tracer}
Specifies that trace to process or port shall be done
instead of trace to file. This option is only allowed
with the start
option,
but not with the {file, Filename}
option.
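To make the option constraints concrete, here is a sketch of a valid trace start; the registered name my_server and the path are placeholders, and note that {file, Filename} and {tracer, Tracer} cannot be combined:

```erlang
%% Trace the calling process and a registered server to a local file,
%% with verbose trace flags (my_server and the path are placeholders):
ok = fprof:trace([start, verbose,
                  {procs, [self(), my_server]},
                  {file, "/tmp/my.trace"}]),
%% ... run the code to be profiled ...
ok = fprof:trace(stop).
```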
profile() -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as profile([])
.
profile(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
profile([{OptionName, OptionValue}])
.
profile(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as profile([OptionName])
.
profile([Option]) -> ok | {ok, Tracer} | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = file | {file, Filename} | dump | {dump, Dump} |
append | start | stop
Dump = pid() | Dumpfile | []
Tracer = pid()
Reason = term()
Compiles a trace into raw profile data held by the
fprof
server.
Dumpfile
is used to call file:open/2
,
and Filename
is used to call
dbg:trace_port(file, Filename)
. Please see the
appropriate documentation.
Option description:
{file, Filename}
Reads the trace file Filename
and creates raw
profile data that is stored in RAM by the fprof
server. If
this option is not given, file "fprof.trace"
is
read. The call will return when the whole trace has been
read with the return value ok
if successful.
This option is not allowed with the start
or
stop
options.
dump
| {dump, Dump}
Specifies the destination for the trace text dump. If
this option is not given, no dump is generated; if it is
dump
the destination will be the
caller's group leader, otherwise the destination
Dump
is either the pid of an I/O device or
a filename. And, finally, if the filename is []
-
"fprof.dump"
is used instead.
This option is not allowed with the stop
option.
append
Causes the trace text dump to be appended to the
destination file. This option is only allowed with the
{dump, Dumpfile}
option.
start
Starts a tracer process that profiles trace data in
runtime. The call will return immediately with the value
{ok, Tracer}
if successful.
This option is not allowed with the stop
or
{file, Filename}
options.
stop
Stops the tracer process that profiles trace data in
runtime. The return value will be ok
if successful.
This option is not allowed with the start
or
{file, Filename}
options.
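A sketch of both ways of producing raw profile data; the filenames are placeholders:

```erlang
%% Offline: read a trace file into raw profile data, and also dump the
%% trace in text form to "fprof.dump" ({dump, []} selects that name):
ok = fprof:profile([{file, "/tmp/my.trace"}, {dump, []}]).

%% Runtime: start a tracer process, trace to it, then stop profiling;
%% this skips the trace file entirely.
{ok, Tracer} = fprof:profile(start),
ok = fprof:trace([start, {tracer, Tracer}]),
%% ... run the code to be profiled ...
ok = fprof:trace(stop),
ok = fprof:profile(stop).
```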
analyse() -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Reason = term()
Same as analyse([])
.
analyse(OptionName, OptionValue) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
OptionValue = term()
Reason = term()
Same as
analyse([{OptionName, OptionValue}])
.
analyse(OptionName) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
OptionName = atom()
Reason = term()
Same as analyse([OptionName])
.
analyse([Option]) -> ok | {error, Reason} |
{'EXIT', ServerPid, Reason}
Option = dest | {dest, Dest} | append | {cols, Cols}
Dest = pid() | Destfile
Cols = integer() >= 80
Reason = term()
Analyses raw profile data in the
fprof
server. If called while there is no raw
profile data available, {error, no_profile}
is
returned.
Destfile
is used to call file:open/2
.
Please see the appropriate documentation.
Option description:
dest
| {dest, Dest}
Specifies the destination for the analysis. If this
option is not given or it is dest
,
the destination will be the caller's group leader,
otherwise the destination Dest
is either
the pid()
of an I/O device or a filename.
And, finally, if the filename is []
-
"fprof.analysis"
is used instead.
append
Causes the analysis to be appended to the destination
file. This option is only allowed with the
{dest, Destfile}
option.
{cols, Cols}
Specifies the number of columns in the analysis text. If
this option is not given, 80 columns are used.
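For example, a sketch that writes a wide analysis to a file and later appends a second analysis to the same file; the filename is a placeholder:

```erlang
%% Dump the analysis with 120 columns to a file, then append a later
%% analysis to the same file (append requires {dest, Destfile}):
ok = fprof:analyse([{dest, "my.analysis"}, {cols, 120}]),
%% ... profile more data ...
ok = fprof:analyse([append, {dest, "my.analysis"}, {cols, 120}]).
```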
This section describes the output format of the analyse command. See analyse/0.
The following example was run on OTP/R8 on Solaris 8; all OTP internals in this example are very version dependent.
As an example, we will use the following function, which you may recognise as a slightly modified benchmark function from the manpage file(3):
-module(foo).
-export([create_file_slow/2]).

create_file_slow(Name, N) when integer(N), N >= 0 ->
    {ok, FD} =
        file:open(Name, [raw, write, delayed_write, binary]),
    if N > 256 ->
            ok = file:write(FD,
                            lists:map(fun (X) -> <<X:32/unsigned>> end,
                                      lists:seq(0, 255))),
            ok = create_file_slow(FD, 256, N);
       true ->
            ok = create_file_slow(FD, 0, N)
    end,
    ok = file:close(FD).

create_file_slow(FD, M, M) ->
    ok;
create_file_slow(FD, M, N) ->
    ok = file:write(FD, <<M:32/unsigned>>),
    create_file_slow(FD, M+1, N).
Let us have a look at the printout after running:
1> fprof:apply(foo, create_file_slow, [junk, 1024]).
2> fprof:profile().
3> fprof:analyse().
The printout starts with:
Detailed statistics:
                                      CALLS       ACC       OWN
** Totals                              8124  1534.916  1482.850
The CALLS column shows the total number of function calls that were found in the trace. The ACC column shows the total time of the trace, from first timestamp to last. The OWN column shows the sum of the execution time in functions found in the trace, not including called functions. In this case it is very close to the ACC time since the emulator had practically nothing else to do than to execute our test program.
All time values in the printout are in milliseconds.
The printout continues:
                                      CALLS       ACC       OWN
** Process <0.25.0>                    8124            1482.850
This is the printout header of one process. The printout
contains only this one process since we did fprof:apply/3
which traces only the current process. Therefore the CALLS and
OWN columns perfectly match the totals above. There is no ACC
column since summing the ACC times of all calls in the process
makes no sense.
Now to something more interesting:
  undefined                               1  1484.554     0.094
* foo:create_file_slow/2                  1  1484.554     0.094  *
  file:close/1                            1  1133.846     0.012
  foo:create_file_slow/3                  1   243.124    19.316
  file:open/2                             1    63.869    17.662
  lists:map/2                             1    22.290     9.178
  lists:seq/2                             1    21.065     0.010
  file:write/2                            1     0.266     0.017
The printout consists of one section per called function. The function marked with * is the one the section concerns. Above the marked function are the calling functions - those that have called the marked function - and below are those called by the marked function.
The sections are sorted in decreasing order of the ACC column for the marked function.
The columns for the marked function are: CALLS - the number of times this function has been called in this process, ACC - the wallclock time spent in this function including called functions, and OWN - the wallclock time spent in this function not including called functions.
The columns for each called function have the same contents as for the marked function, with the constraint that only the occasions when it was called from the marked function are accounted for.
The columns for each calling function also have the same contents as for the marked function, but with the constraint that only the occasions when it called the marked function are accounted for.
So, we see that foo:create_file_slow/2
used very little
time for its own execution. It spent most of its time in
file:close/1
. The function foo:create_file_slow/3
that writes 3/4 of the file contents is the second biggest time
thief.
We also see that the call to file:write/2
that writes
1/4 of the file contents takes very little time in itself, what
takes time is to build the data (lists:seq/2
and
lists:map/2
).
The function 'undefined' that has called
foo:create_file_slow/2
is an unknown function because that
call was not recorded in the trace. It was only recorded
that the execution returned from foo:create_file_slow/2
.
Let us continue down the printout and we find:
  foo:create_file_slow/2                  1   243.124    19.316
* foo:create_file_slow/3                  1   243.124    19.316  *
  file:write/2                          768   223.610    10.057
  suspend                                14     0.198     0.000
If you compare with the code you will also see that
foo:create_file_slow/3
was called only from
foo:create_file_slow/2
and called only
file:write/2
. Please note the number of calls to
file:write/2
. But here we see that 'suspend' was called a
few times. This is a pseudo function that indicates that the
process was suspended while executing in
foo:create_file_slow/3
, and since there is no
receive
or erlang:yield/0
in the code, it must be
scheduling suspensions.
Let us find the 'suspend' entry:
  prim_file:drv_command/4                15     0.471     0.000
  prim_file:drv_get_response/1           17     0.269     0.000
  file:write/2                           14     0.212     0.000
  foo:create_file_slow/3                 14     0.198     0.000
  prim_file:write/2                       7     0.156     0.000
  prim_file:drv_command/2                 9     0.139     0.000
  prim_file:translate_response/2          5     0.120     0.000
  lists:map/2                             1     0.101     0.000
  prim_file:'-drv_command/2-fun-0-'/1     3     0.068     0.000
  fprof:apply_start_stop/4                1     0.000     0.000
* suspend                                86     1.734     0.000  *
We find no particularly long suspend times, so no function seems
to have waited in a receive statement. Actually,
prim_file:drv_command/4
contains a receive statement, but
in this test program, the message lies in the process receive
buffer when the receive statement is entered. We also see that
the total suspend time for the test run is very small.
Now we look at another interesting pseudo function, 'garbage_collect':
  lists:seq/4                             1    18.356    18.356
  prim_file:drv_command/4                24     0.999     0.999
  prim_file:write/2                      18     0.714     0.714
  prim_file:drv_command/2                 1     0.063     0.063
* garbage_collect                        44    20.132    20.132  *
Here we can see that one function distinguishes itself since
most garbage collections occurred while in lists:seq/4
. It
is rare that one function alone causes most garbage collections,
but in this case it is practically the only function that
increases the heap size for the process.
Let us now get back to the test code:
  foo:create_file_slow/3                768   223.610    10.057
  foo:create_file_slow/2                  1     0.266     0.017
* file:write/2                          769   223.876    10.074  *
  prim_file:write/2                     769   213.590    23.581
  suspend                                14     0.212     0.000
Not unexpectedly, we see that file:write/2
was called
from foo:create_file_slow/3
and
foo:create_file_slow/2
. The number of calls in each case, as well as the times
used, also just confirm the previous results.
We see that file:write/2
only calls
prim_file:write/2
, but let us refrain from digging into the
internals of the kernel application.
Ahh, what the heck, if we nevertheless do dig down we find the call to the linked-in driver that does the file operations towards the host operating system:
  prim_file:drv_command/4               772  1238.026  1238.026
* erlang:port_command/2                 772  1238.026  1238.026  *
This is some 80 % of the total run time, and, as we saw before, the close operation is by far the biggest contributor. We find a comparison ratio a little bit up in the call stack:
  prim_file:close/1                       1  1122.811     0.011
  prim_file:write/2                     769   189.139     9.684
  prim_file:open_int/4                    1    28.682     0.013
  prim_file:open_int_setopts/3            1     0.100     0.010
* prim_file:drv_command/2               772  1340.732     9.718  *
  prim_file:drv_command/4               772  1330.812    22.344
  suspend                                 9     0.139     0.000
  garbage_collect                         1     0.063     0.063
The wallclock time in the linked-in driver distributes itself as 2 % for open, 14 % for write and 84 % for close. All data is probably buffered in the operating system until the close.
The actual supervision of execution times is in itself a CPU intensive activity. A message is written on the trace file for every function call that is made by the profiled code.
The information in the trace does not, unfortunately, give an
unambiguous picture of what happens on the execution call stack.
fprof
tries its best to find out which functions are
called within which, but sometimes it is not possible. This
happens when functions call themselves through other functions in
complicated ways, and shows up in the analysis as functions that
have used impossibly long accumulated times.
dbg(3), eprof(3), erlang(3), io(3), Tools User's Guide