The mysteries of eprof

Thu Feb 24 10:06:02 CET 2000

Good morning.  I hope this message is coherent: it's been a long
day.(*)

I've been fiddling a lot with eprof, trying to plumb its secrets.
It's difficult to get used to, because it definitely isn't "gprof".
Dealing strictly with wall-clock times is, well, difficult.  I suppose
the ideal tool would be able to count the CPU time consumed by the
reductions made by a function *and* the CPU time consumed by BIFs
*and* somehow account for CPU time consumed by scheduling, driver
overhead, etc.  Oh, and while I'm dreaming, we might as well sum all
the time consumed by function F plus all the functions F calls.  :-)

Anyhow, there have been a few things I've discovered this week that
don't make a lot of sense.  One is actually eprof-specific.  The other
is more scheduling-oriented.

The eprof question is: if I'm profiling an already-running process,
and I let it go for N seconds, why doesn't total_analyse() report N
seconds (or "close to N") worth of activity?

I've been able to answer part of the question: if the process was
asleep waiting for a message, the time from when eprof:profile() was
called and when the message arrives.

If that process never receives a message, then eprof thinks it did
nothing.  It *did* do nothing, but it would be less misleading if
eprof could say it spent 100% of its time within the blocking
function.  Giving my understanding of how eprof & process tracing
works (wall-clock timestamped messages are sent to the eprof process
whenever a function is entered, exited, or calls/returns from another
function), eprof's silence makes sense.

When I point eprof at processes that are fairly busy,
eprof:total_analyse() sums almost the entire amount of wall-clock time
spent during the profiling.  (This and all other examples use 20
seconds between eprof:profile() and eprof:stop_profiling().)  For
example:

    FUNCTION                                       CALLS      TIME    SECONDS
    gen_server:loop_wait/1                         489        98. %   18.2782
    callback_server_hack:get_key/2                 735        0.3 %    0.0586
    callback_server_hack:handle_call/3             489        0.3 %    0.0571
    gen_server:loop/6                              489        0.1 %    0.0340
    gen_server:reply/2                             489        0.1 %    0.0253
    gen_server:handle_msg/6                        489        0.1 %    0.0188
    [...]

(These reports have been hacked to show the amount of wall-clock time
 in the rightmost column.  If I remember, I'll include a patch below.
 The relative time percentage formatting is ugly, but it's good enough
 for me....)

That's pretty close to 20 seconds.  It'd be nice to know how eprof
missed the other ~1.5 seconds, but I'm not that curious...

... unless it's the same reason that causes this:

    FUNCTION                                       CALLS      TIME    SECONDS
    gen_tcp:call/2                                 928        58. %    3.6845
    gen:wait_resp_mon/3                            1009       15. %    0.9692
    fnstuff:svc_tcp_main/5                         906        8.0 %    0.5042
    [...]

It's still a 20 second sample.  If I included all the report, the
wall-clock time only adds up to 6.6 seconds.  Since I'm reallyreally
interested in profiling what this process is doing, it would be
wonderful to know where the phantom time is.

That second example leads me to my other question: why would
gen_tcp:call/2 suck up so much time?  (In my application,
gen_udp:call/2 does the same thing.)  To be honest, I believe I've
answered that question, but the answer doesn't help (much) to fix the
underlying problem.

gen_tcp:call/2 looks like a classic example of what make_ref() is used
for: uniquely tagging messages so that you can grab only the response
to that packet.  Process-wise, things look like:

  my process  <--> inet_tcp:socket_loop/3 <--> Port handling the <-> The
                   and friends                 socket                Network


As far as I can tell, the reason gen_tcp:call/2 takes so much time is
that my system is actually busy enough that the message sent from my
process -> inet_tcp:socket_loop/3 (via gen_tcp:send()) takes about
2.3ms (per call!) to be "receive"d by the recipient process.  Almost
immediately, the recipient sends back an acknowledgement, but it takes
that long again before the ack is "receive"d by my process.

Ideally, that data should be *on*the*wire* as soon as possible.  I'd
love to get rid of most of that 2.3ms latency, but I'm at a loss how.
Well, short of playing games with the BEAM process scheduler.(**) Or
reinventing the gen_tcp and gen_udp wheels by having my process talk
directly to the port.  :-(

Any ideas from enlightened subscribers?

-Scott

(*) Good thing I started it shortly before noon!

(**) I'm sure my colleages will be delighted to know I've been
contemplating such things ... like a "!!" or some other hack to put
the sending process to sleep ASAP and wake up the recipient ASAP.
{sigh}

---
Scott Lystig Fritchie, <scott@REDACTED>
5401 - 10th Ave S, Minneapolis, MN 55417 USA
office: 612.827.2835, cell: 612.805.1383
Professional Governing: Is It Faked?

--- snip --- snip --- snip --- snip --- snip --- snip --- snip --- snip ---

--- eprof.erl.orig      Fri Feb 18 19:21:03 2000
+++ eprof.erl   Fri Feb 18 19:17:34 2000
@@ -302,7 +302,7 @@
 total_analyse(T) ->
     Pcalls = reverse(keysort(2, replicas(ets:tab2list(T)))),
     Time = collect_times(Pcalls),
-    format("FUNCTION~44s      TIME ~n", ["CALLS"]),   
+    format("FUNCTION~44s      TIME    SECONDS~n", ["CALLS"]),   
     printit(Pcalls, Time).
 
 analyse(notable) -> 
@@ -327,12 +327,12 @@
 
 printit([],_) -> ok;
 printit([{{Mod,Fun,Arity}, Time, Calls} |Tail], ProcTime)  ->
-    format("~s  ~s ~s % ~n", [ff(Mod,Fun,Arity), fint(Calls),
-                             fpf(100*(divide(Time,ProcTime)))]),
+    format("~s  ~s ~3s % ~9.4f~n", [ff(Mod,Fun,Arity), fint(Calls),
+                             fpf(100*(divide(Time,ProcTime))), Time/1000000]),
     printit(Tail, ProcTime);
 printit([{{_,{Mod,Fun,Arity}}, Time, Calls} |Tail], ProcTime)  ->
-    format("~s  ~s ~s % ~n", [ff(Mod,Fun,Arity), fint(Calls),
-                             fpf(100*(divide(Time,ProcTime)))]),
+    format("~s  ~s ~3s % ~9.4f~n", [ff(Mod,Fun,Arity), fint(Calls),
+                             fpf(100*(divide(Time,ProcTime))), Time/1000000]),
     printit(Tail, ProcTime); 
 printit([_|T], Time) ->
     printit(T, Time).
@@ -351,7 +351,7 @@
 strip_tail([H|T], I) -> [H|strip_tail(T, I-1)];
 strip_tail([], I) -> [].
 
-fpf(F) -> strip_tail(flatten(io_lib:format("~w", [round(F)])), 5).
+fpf(F) -> strip_tail(flatten(io_lib:format("~.2f", [F])), 5).
 fint(Int) -> pad(flatten(io_lib:format("~w",[Int])), 10).
 
 mklist(0) -> [];