[erlang-questions] DTrace for Erlang

G Bulmer gbulmer@REDACTED
Tue Nov 13 19:09:23 CET 2007


Is anyone working on building an Erlang DTrace provider?

I realise DTrace isn't available on all Erlang platforms, but now
that DTrace is available on Mac OS X (Leopard), as well as Solaris (and
FreeBSD), I feel it might be worth doing.

About DTrace
-------------------
For those of you unfamiliar with DTrace, it is, basically, magic.

DTrace provides facilities to trace any program, and many aspects of  
the OS kernel. It has several critical properties:
1. A program does not need any changes (no need to compile for debug,  
or anything like that). All existing programs work (to some extent).
2. When not DTrace'ing, the cost of being DTrace-able is almost zero  
(claimed < 0.5%) and this cost is already included (on Solaris & OS X).
3. It is 'secure': it honours the access control mechanisms of the
host OS.
4. DTrace can cross process boundaries, and trace the kernel itself
(if you have the appropriate security privileges).
These features allow DTrace to be used in *PRODUCTION*.

Put another way: it is straightforward to trace through a program in  
a user process, back out through the kernel, then into other  
processes. Tracing can be activated *after* a program has been  
deployed and started *without* changing the program or restarting it.  
DTrace overhead, when not tracing a program, is (claimed to be)
essentially zero, and DTrace costs when activated are (claimed to be)
low.

DTrace is programmable, with a scripting language a bit like awk but  
without loops; instead of awk text patterns, DTrace probes are  
pattern matched to trigger script actions. Scripting lets you  
correlate events across processes and the kernel, and scripts can  
process the data so that 'noise' can be filtered out. For example, it  
is feasible to trace from an incoming HTTP request through a web  
server, through the kernel, to an application server, and back again  
and correlate many concurrent flows. You might want to time that
end-to-end flow, or gather intermediate timing, or watch for a particular
access to a specific file, or ..., and the scripting language and  
functions are powerful enough to do that.
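
To make the probe-matching idea a little more concrete, here is a toy
model in Python. It is not D and not how DTrace is implemented; it only
mimics the shape of a script: clauses made of a probe pattern, an
optional predicate, and an action. It assumes the four-part probe name
provider:module:function:name, with shell-style wildcards and empty
fields matching anything. The `erlang` provider, its probe names, and
the `Script` helper are made up for illustration.

```python
from fnmatch import fnmatchcase


def probe_matches(pattern, probe):
    """Match a 'provider:module:function:name' pattern against a probe name.

    Empty pattern fields match anything; other fields use shell-style globs.
    """
    pat_fields = pattern.split(":")
    probe_fields = probe.split(":")
    if len(pat_fields) != 4 or len(probe_fields) != 4:
        raise ValueError("expected provider:module:function:name")
    return all(p == "" or fnmatchcase(f, p)
               for p, f in zip(pat_fields, probe_fields))


class Script:
    """A list of (pattern, predicate, action) clauses, loosely like a D script."""

    def __init__(self):
        self.clauses = []

    def clause(self, pattern, predicate=lambda args: True):
        """Decorator that registers an action for a pattern and predicate."""
        def register(action):
            self.clauses.append((pattern, predicate, action))
            return action
        return register

    def fire(self, probe, args):
        """Run every clause whose pattern and predicate accept this event."""
        for pattern, predicate, action in self.clauses:
            if probe_matches(pattern, probe) and predicate(args):
                action(probe, args)


# Hypothetical probe names and arguments, for illustration only.
script = Script()
calls = {}


@script.clause("erlang:::function-entry", lambda a: a["module"] == "lists")
def count_call(probe, args):
    # Count calls per function, keeping only the 'lists' module (the predicate).
    calls[args["function"]] = calls.get(args["function"], 0) + 1


script.fire("erlang:beam:call:function-entry", {"module": "lists", "function": "map"})
script.fire("erlang:beam:call:function-entry", {"module": "ets", "function": "lookup"})
script.fire("erlang:beam:gc:gc-start", {"module": "lists", "function": "map"})
print(calls)
```

The predicate is what filters out the 'noise': only the first event
both matches the pattern and passes the predicate, so only it is counted.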

If you are interested, look at http://www.sun.com/bigadmin/content/dtrace/
There is an outline of how to add probes to an application here:
http://docs.sun.com/app/docs/doc/817-6223/chp-usdt
Here is a little bit about providers and naming the probes:
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Providers
Here's an article: http://www.devx.com/Java/Article/33943 (or try
googling "DTrace Java examples")
Here's more DTrace bloggyness at the 'DTrace Three':
http://blogs.sun.com/ahl/  http://blogs.sun.com/bmc/  http://blogs.sun.com/mws/

I believe there is work on Ruby, Perl and Python DTrace providers too.


Why an Erlang DTrace provider?
---------------------------------------
So I hear you ask, why build a DTrace provider for erl when DTrace  
already works (on Solaris and Leopard) with erl?

Well, DTrace will show which functions are called in the erl program  
(and which file descriptors and sockets are used, etc.), but will  
*not* show you the Erlang program events directly.

Java 6 implements a DTrace provider, which surfaces Java method calls,
class loading/unloading, the garbage collector, concurrency monitors  
etc. (rather than the details of the JVM, which are available anyway).

So, I am suggesting an erl DTrace provider would show similar things,  
specifically:
- module loading/unloading,
- function calls (and maybe exits),
- garbage collection events,
- Erlang-process state/context-switching,
- message send/receive.
This would likely be sufficient for most purposes when combined with  
the existing DTrace support for tracing TCP and UDP sockets, file  
access etc from the erl process. Mac OS X 'Leopard' comes with a  
fancy GUI to show DTrace events, so that would give something like  
the new Percept but for many different types of event, not just  
process state.
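
As a rough picture of what a script against such a provider could
compute, here is a small Python sketch (again, not D) that aggregates a
made-up stream of events of the kinds listed above. The event names,
fields, and sample values are all invented; a real provider would
define its own.

```python
from collections import Counter, defaultdict

# Invented sample events: (timestamp_us, probe_name, erlang_pid, detail).
EVENTS = [
    (100, "message-send", "<0.31.0>", "<0.42.0>"),
    (120, "gc-start", "<0.42.0>", None),
    (180, "gc-end", "<0.42.0>", None),
    (200, "message-send", "<0.31.0>", "<0.42.0>"),
    (210, "function-entry", "<0.42.0>", "lists:map/2"),
    (300, "gc-start", "<0.31.0>", None),
    (330, "gc-end", "<0.31.0>", None),
]


def summarise(events):
    """Count messages per sender and total GC time per Erlang process."""
    sends = Counter()
    gc_time = defaultdict(int)
    gc_started = {}
    for ts, probe, pid, _detail in events:
        if probe == "message-send":
            sends[pid] += 1
        elif probe == "gc-start":
            gc_started[pid] = ts
        elif probe == "gc-end" and pid in gc_started:
            # Pair each gc-end with the most recent gc-start of the same process.
            gc_time[pid] += ts - gc_started.pop(pid)
    return dict(sends), dict(gc_time)


sends, gc_time = summarise(EVENTS)
print(sends)
print(gc_time)
```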

If I were building a large, complex system, availability of DTrace  
would be an important consideration. I believe several companies  
claim they moved to Solaris to get DTrace, and having used it for  
simple debugging and tuning, I can believe that.

DTrace is so useful that the erl developers and maintainers may well
want it for debugging, testing, and tuning Erlang itself.


Could I raise it as an EEP, or is it already on the TODO list?

Garry



