[erlang-questions] Trace-Driven Development

Wed Jun 6 08:55:55 CEST 2012

On 6 Jun 2012, at 08:02, Michael Turner wrote:

> I'd like to know. I'm guessing that one of the big problems I've had
> with understanding Ulf here is that, for him, tracing is, by
> definition, a way to collect data about *anomalous* behavior. To me,
> tracing is "selectively (and only on occasion) collecting data about
> behavior." Period. You can do whatever you want with that data. The
> behavior doesn't have to be pathological. In fact, you can use the
> data as some assurance of correctness - the "occasion" can be running
> a test suite. Which is to say, the "trace-driven development" of this
> thread.

No, I don't think that.

For sure, tracing is indispensable for debugging - not least since 
you can trace on exceptions - but it is equally important for
profiling, for example.

There are several uses of tracing that one could imagine for
permanent service in a live system: event triggers, memory monitoring,
etc. Unfortunately, turning on such tracers would mean that 
the processes being thus monitored couldn't be traced for
purposes of debugging.

This makes the one-tracer-per-process limitation more
limiting than it may at first seem, and forces most people to
reserve tracing for debugging and profiling purposes during
testing.
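
To make the limitation concrete, here is a minimal sketch using the
legacy erlang:trace/3 BIF. The module name and the helper names are
made up for illustration. Two separate processes try to become the
tracer of the same target. On the releases I know, I would expect the
first call to return 1 and the second to fail with badarg (plus a
"can only have one tracer per process" note in the log), but treat
that as an assumption rather than a guarantee.

```erlang
-module(one_tracer).
-export([demo/0]).

%% Returns {First, Second}: what two different would-be tracers got
%% back when each tried to trace the same target process.
demo() ->
    Target = spawn(fun idle/0),
    {T1, First}  = try_trace(Target),  % e.g. a hypothetical permanent monitor
    {T2, Second} = try_trace(Target),  % e.g. a debugging session started later
    [P ! stop || P <- [T1, T2, Target]],
    {First, Second}.

idle() ->
    receive stop -> ok end.

%% Spawn a process that asks to trace Target and then stays alive.
%% (If the tracer died, the trace flags on Target would be cleared.)
try_trace(Target) ->
    Parent = self(),
    Tracer = spawn(fun() ->
        R = (catch erlang:trace(Target, true, [send])),
        Parent ! {self(), R},
        idle()
    end),
    receive {Tracer, Result} -> {Tracer, Result} end.
```

If the first tracer were a permanent monitor, the second (the
debugging session) is the one that loses out.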

That Lamport clocks are useful for other things was illustrated
even by Lamport in his original paper, as he used them to
solve the mutual exclusion problem. In his later musings,
he noted that some people thought the paper was *only*
about causality or about implementing mutexes:

> Many computer scientists claim to have read it.  But I have 
> rarely encountered anyone who was aware that the paper 
> said anything about state machines.  People seem to think 
> that it is about either the causality relation on events in a 
> distributed system, or the distributed mutual exclusion 
> problem.  People have insisted that there is nothing about 
> state machines in the paper.  I've even had to go back and 
> reread it to convince myself that I really did remember what
> I had written. (http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks)
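
For readers who haven't looked at the paper in a while, the clock
rules themselves are tiny. The following is only a sketch of the two
textbook rules, with module and function names of my own choosing;
it has nothing to do with how the VM implements anything.

```erlang
-module(lclock).
-export([new/0, tick/1, stamp/1, merge/2]).

%% A Lamport clock is just one counter per process.
new() -> 0.

%% Rule 1: increment the counter for every local event.
tick(Clock) -> Clock + 1.

%% Sending is an event: tick, then attach the new value to the message.
%% Returns {NewClock, Timestamp}.
stamp(Clock) ->
    New = tick(Clock),
    {New, New}.

%% Rule 2: on receipt, move past both the local clock and the
%% timestamp carried by the message.
merge(Clock, Timestamp) -> max(Clock, Timestamp) + 1.
```

A sender at 0 calls stamp/1 and gets {1, 1}; a receiver whose own
clock is at 5 calls merge(5, 1) and ends up at 6, so the receive is
ordered after the send and after everything the receiver had already
done.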

It would be absolutely brilliant to have native and *general* 
support for Lamport clocks in the Erlang VM. You think they
already exist, in seq_trace, whereas I think Lamport clocks
are only coincidentally exposed there as part of a solution
to selectively observe sequences of events in a running 
system, subject to the usual limitations of the tracing
subsystem - limitations that in practice render them near-useless
for other purposes, even if OTP were to approve of such
uses, which they don't.
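
As a rough illustration of where the counter surfaces, the sketch
below sets a token, sends one message, and lets the receiver report
what seq_trace:get_token/1 shows it. The module name and the label 17
are arbitrary choices of mine. I'd expect the receiver to see
something like {label,17} and a {serial,{Previous,Current}} pair, but
I haven't pinned down the exact serial values, and this is a peek at
the mechanism, not a recommended use.

```erlang
-module(seq_peek).
-export([demo/0]).

%% Set a token in the caller, send one message, and return what the
%% receiver observed as {LabelInfo, SerialInfo}.
demo() ->
    Parent = self(),
    Child = spawn(fun() ->
        receive
            ping ->
                %% The token arrived with the message; just report it.
                Parent ! {self(),
                          seq_trace:get_token(label),
                          seq_trace:get_token(serial)}
        end
    end),
    seq_trace:set_token(label, 17),   % arbitrary label, creates the token
    Child ! ping,
    Seen = receive
               {Child, Label, Serial} -> {Label, Serial}
           end,
    seq_trace:set_token([]),          % drop the token from this process
    Seen.
```

Note that the reply from the child carries the token back, which is
why the caller clears it before returning.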

(In the email where Kenneth admitted that seq_trace 
implemented Lamport clocks, he also wrote that you shouldn't
use them for any other purpose than that described
in the seq_trace docs.)

Changing this would require a strategic decision and some
deep thinking from the OTP team.

Granted, for *your* intended purpose, they are absolutely 
fine. It's entirely in line with what they were first made for.

(In fact, Quviq's QuickCheck relies on Lamport's "happens
before" relation to reduce the state space during 
random testing of concurrent Erlang code. They don't
use seq_trace, though - nor, normally, the built-in tracing.)

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com