[erlang-questions] Trace-Driven Development

Ulf Wiger ulf@REDACTED
Tue Jun 5 13:56:53 CEST 2012

On 5 Jun 2012, at 12:03, Michael Turner wrote:

> I've tried to stay on the point: if
> you're using Lamport clocks, exposing them in an API, *admitting* (as
> Kenneth Lundin did, on this list, in 2007) that you're using them,
> then the documentation should say so.

I have agreed that it wouldn't hurt to add that to the existing 
documentation, but have also argued that one needs to 
remember the *purpose* of seq_trace, and discuss whether
the current API is the right one, and what changes to the 
documentation would best help users to make use of it.

It could well be that such a process would result in API 
documentation that *does not* expose Lamport clocks, or 
(as I suggested) in a separate component that exposes
Lamport clocks in a more obvious and generic way.

> To save people time, if nothing
> else. But especially so that people who are looking for a Lamport
> clock implementation in Erlang will be able to find it easily in
> searches.

If that is the purpose, then creating a separate 'lamport' module
would be a much better solution, obviously.
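
To make the idea concrete, here is a minimal sketch of what such a
module might look like. The module name and all four function names
are hypothetical; this is not the seq_trace implementation and not an
existing OTP module, only the textbook logical-clock rules.

```erlang
%% Hypothetical 'lamport' module: a minimal logical-clock sketch.
%% Assumption: a clock is represented as a plain integer counter.
-module(lamport).
-export([new/0, tick/1, send/1, recv/2]).

%% A fresh clock starts at zero.
new() -> 0.

%% Local event: advance the clock by one.
tick(Clock) -> Clock + 1.

%% Sending is an event too: advance the clock and return
%% {Timestamp, NewClock}; the timestamp travels with the message.
send(Clock) ->
    Ts = Clock + 1,
    {Ts, Ts}.

%% On receipt, move past both the local clock and the
%% timestamp carried by the incoming message.
recv(Clock, MsgTs) -> max(Clock, MsgTs) + 1.
```

A caller would thread the clock value through its own state, for
instance calling lamport:send/1 before each send and lamport:recv/2
in each receive clause. Nothing here depends on tracing at all.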

>> I claim that seq_trace implements Lamport clocks "by
>> accident", and that it was not the original purpose, nor a complete
>> solution to the problem.
> (a) You have no evidence of this,

I was there, remember?

With 'by accident' I mean that they could have solved it differently,
and then the API would not have exposed Lamport clocks, and 
they would still have fulfilled the requirement.

I'm not saying they didn't know they were using Lamport clocks.
I'm saying it was not what the customer asked for, and the man
page, such as it is, reflects what the customer had ordered.

I guess a better way of putting it is that it was coincidental.

> As to the "original purpose" of "accidentally" implementing Lamport
> clocks in seq_trace, what, pray tell, WAS the original purpose of an
> "accidental" implementation of them, if it wasn't basically the same
> as Lamport's purpose? Just to have some intriguing pairs of numbers to
> look at, in otherwise-boring traces?

What the original purpose was is exactly what I have tried to 
tell you. I won't repeat it here.

I was not, however, present when Leslie Lamport started thinking
about Lamport clocks, so I can't speak from my own experience.
But as you mentioned yourself, he traces it back to a different origin:

"The origin of this paper was a note titled The Maintenance of 
Duplicate Databases by Paul Johnson and Bob Thomas.  
I believe their note introduced the idea of using message 
timestamps in a distributed algorithm. […]
Because Thomas and Johnson didn't understand exactly what 
they were doing, they didn't get the algorithm quite right; their 
algorithm permitted anomalous behavior that essentially 
violated causality.  I quickly wrote a short note pointing this 
out and correcting the algorithm. 

"It didn't take me long to realize that an algorithm for totally ordering 
events could be used to implement any distributed system.  
A distributed system can be described as a particular sequential 
state machine that is implemented with a network of processors.  
The ability to totally order the input requests leads immediately 
to an algorithm to implement an arbitrary state machine by a 
network of processors, and hence to implement any distributed 
system.  So, I wrote this paper, which is about how to implement 
an arbitrary distributed state machine.  As an illustration, I used 
the simplest example of a distributed system I could think of--a 
distributed mutual exclusion algorithm."

My read on that: he didn't originally set out to solve the problem
of capturing sequence traces in a real-time system, but noted
after a while that his proposed solution was extremely general.

The OTP team could have set out to implement sequence 
tracing, decided to do it using Lamport clocks, then realized that
the implementation could easily be generalized, and changed
the API and documentation accordingly.

This is not what happened. It could still happen.

>> or figure out how to build a documentation patch yourself.
> If in fact you represent the Ericsson point of view on this issue, I'd
> be wasting my time: it'll be rejected for the reason that "we invented
> that independently." I'd like to know what the Ericsson point of view
> is, before I try something that might be futile.

I don't represent Ericsson, and that is not at all what I have been saying.

> So far, only Robert Virding has spoken up on this issue. I
> pointed out to him that he's coauthor on a 1993 publication that cites
> Lamport's paper -- long before the 1997-8 timeframe you give for
> seq_trace requirements acquisition and implementation. He hasn't
> responded since.

What Robert wrote was that he was not part of the team that 
implemented seq_trace. He also doesn't represent Ericsson.

>> your refusal to accept that the seq_trace API was not
>> meant to implement Lamport clocks, and might well depart from them
> If you mean "might well depart" in the *future*, how is that a "minor
> interface adjustment"? SerialInfo - the timestamp for Lamport logical
> clocks - is all over the API. No, that's a "major algorithm change."

The man page speaks of minor adjustments. I argue that one should
perhaps consider a major overhaul. For backward compatibility, it would
be better to introduce a new, better API.

From your other mail:

> But changing "seq_trace" to "lamport" is
> (a) semantically wrong, since seq_trace *implements* Lamport clocks
> but is not *simply* Lamport clocks,
> and
> (b) pragmatically wrong, since it breaks any existing code that
> depends on seq_trace, and also breaks anything out there that has
> implemented a module called "lamport" independently.

I didn't suggest making an incompatible change to seq_trace,
but basing the initial implementation of 'lamport' on the 
seq_trace implementation.

The unprefixed namespace belongs to OTP. This matter has
been debated many times. It's not a very good setup, but 
that is how it is.

> If you mean "might well depart" *now*, why did Kenneth Lundin say it
> implements them? If he's wrong, why did nobody in Ericsson correct
> him? Lamport clocks are Lamport clocks, regardless of "intent."

So they are, but that doesn't necessarily mean that their use should
be committed to the API and highlighted in the documentation.
Note - this is meant in general terms, as I didn't object to adding
a reference to Lamport in the current seq_trace documentation.

An example: the Erlang docs describe how the VM samples the 
length of the receiver's message queue when sending a message,
and penalizes the sender with extra reductions if the queue length
exceeds a certain threshold.

This was a cool way to implement poor-man's flow control in a 
single-core system, but in a many-core system, it's a pretty bad idea.
As it's been committed to the documentation, it is harder to change
now that it is arguably more of a burden than a feature.

This is an example of why it is so important to go back to the 
original purpose, and ask, as Bjarne was wont to say, "What's 
the bloody problem?" What problem did we originally set out to 
solve, and what changes might be needed to ensure that we 
solve that problem well - and keep solving it well based on where
we're heading?

Some things are better not added to the reference manual.

>> That seq_trace is completely independent of the built-in tracing is also
>> misleading.
> WTF? Where did I say it was "completely independent"? Where did anyone
> say it was?

It's in the seq_trace man page. First paragraph of Description,
second sentence. I didn't claim you said it, but could have 
been clearer about that - apologies.

>> ... one of the biggest drawbacks
>> of erlang's tracing support is that only one tracer per process at a time
>> can be supported.
> *Boink*. seq_trace is *part* of "erlang's tracing support". In what
> way is it limited to "one tracer *per* process at a time"?

Erlang's tracing support allows only one tracer (process) 
per process. This is a well-known and documented limitation.

seq_trace, in turn, allows only one system tracer per node.
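
The per-node limit is visible in the API itself. The following is a
rough sketch, assuming the documented seq_trace:set_system_tracer/1
call and the documented {seq_trace, Label, Info} message shapes; the
module name and the install/0 and collect/0 functions are made up for
illustration.

```erlang
%% Sketch only: module and function names are hypothetical.
-module(seq_tracer_sketch).
-export([install/0]).

%% Spawn a collector and make it the node's seq_trace system tracer.
%% set_system_tracer/1 returns the tracer it displaced (or false),
%% so installing a second collector replaces the first one.
install() ->
    Collector = spawn(fun collect/0),
    Previous = seq_trace:set_system_tracer(Collector),
    {Collector, Previous}.

%% Print each sequential trace event as it arrives.
collect() ->
    receive
        {seq_trace, Label, Info} ->
            io:format("~p: ~p~n", [Label, Info]);
        {seq_trace, Label, Info, Timestamp} ->
            io:format("~p: ~p at ~p~n", [Label, Info, Timestamp]);
        _Other ->
            ok
    end,
    collect().
```

Calling install/0 twice leaves only the second collector receiving
events, which is the kind of restriction a generic clock library
would not impose on its users.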

A generic Lamport clock implementation has no need for
such limitations. And while you could use seq_trace for 
other purposes, these issues become distracting.

As it is, seq_trace is caught somewhere in the middle. It is a 
fairly nice implementation of Lamport clocks, but not really 
intended for, or entirely fit for, use as a generic Lamport clock
implementation. As a solution for sequence/transaction/forlopp
tracing, it is half-baked and slightly confusing. It kind of works,
but few people understand it well enough to use it. Even OTP
doesn't necessarily use it in all places where it would fit.

This indicates that the seq_trace API and docs could evolve
in either of two different directions - or be reworked, split,
and made more intuitive, addressing both issues at the 
same time.

>> (And no, the failure to mention Erlang's support for real-time
>> tracing in that thesis is more likely to be due to internal rivalry,
>> or simply lack of interest in technologies that they can't use
>> anyway - Erlang due to past policy issues and AXE since it's
>> a legacy system using a weird programming language).
> *Double boink*. The thesis was submitted in late 2008. I see Kenneth
> Lundin affirming, in mid-2007, that seq_trace implements Lamport
> clocks, for an Erlang/OTP that had been open sourced for years by
> then. You're saying it's possible that the author of that thesis might
> not have been able to find out about seq_trace (or Lamport clocks), or
> was not able to use seq_trace (or its Lamport clocks), because of
> "internal rivalry" or "past policy issues"???

Not sure what you mean by "boink". I read it as an insult, but 
perhaps it's just your way of expressing surprise?

> Things must be a lot weirder in there than I ever suspected.

Indeed. But that's definitely a side track. I just mentioned in 
passing that you'd think a thesis about *real-time tracing*
(not Lamport clocks, although it mentions them, and other
techniques) should mention the two best exponents of 
real-time tracing in the company. Again note the bigger 
picture here - *tracing*. Global ordering of events is one
challenge. There are others.

> And yet I'm supposed to be confident that if I submit a patch to
> the seq_trace documentation informing users that it implements
> Lamport clocks, it's very likely to be taken up?

If it improves the documentation, yes, of course.
This was seconded by Gustav Simonsson, who works in the OTP
team. He even suggested where to put it.

The OTP team is by no means hampered by any policy (esp. 
*past* policy) decisions not to mention Erlang.

Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
