[erlang-questions] Trace-Driven Development

Michael Turner michael.eugene.turner@REDACTED
Tue Jun 5 17:14:56 CEST 2012


On Tue, Jun 5, 2012 at 8:56 PM, Ulf Wiger <ulf@REDACTED> wrote:
>
> On 5 Jun 2012, at 12:03, Michael Turner wrote:
>
>> I've tried to stay on the point: if
>> you're using Lamport clocks, exposing them in an API, *admitting* (as
>> Kenneth Lundin did, on this list, in 2007) that you're using them,
>> then the documentation should say so.
>
> I have agreed that it wouldn't hurt to add that to the existing
> documentation, ...

Actually, you've said that it could hurt -- by committing to a
decision that you think might be premature. (Somehow. After 15 years.)

> but have also argued that one needs to
> remember the *purpose* of seq_trace, ...

I can't remember any purpose of seq_trace that's not described in the
documentation for it. And that documentation says, right there:

"Sequential tracing makes it possible to trace all messages resulting
from one initial message."

There's nothing in Lamport clocks that's in conflict with that purpose.

At one point it says:

"In the following sections Sequential Tracing and its most fundamental
concepts are described."

And in the first of those following sections, it says:

"The purpose [of the trace token component, Serial, with its Previous
and Current] is to uniquely identify each traced event within a trace
sequence and to order the messages chronologically and in the
different branches if any."

Then it goes on to describe "the algorithm" for updating those
counters. Not "an algorithm." The algorithm.
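
For readers who have not met them, the textbook Lamport rule can be sketched roughly as follows. This is only an illustration in Python, not seq_trace's code: the class and method names are hypothetical, and the manual's Previous/Current pair carries more information than the single counter used here.

```python
class LamportClock:
    """Textbook Lamport logical clock (illustrative, not seq_trace's code)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending counts as an event; the returned stamp travels with the message.
        return self.tick()

    def receive(self, stamp):
        # On receipt, move past both our own time and the sender's stamp.
        self.time = max(self.time, stamp) + 1
        return self.time


a, b = LamportClock(), LamportClock()
stamp = a.send()             # a.time becomes 1
b.tick()                     # unrelated local event, b.time becomes 1
b.tick()                     # b.time becomes 2
received = b.receive(stamp)  # max(2, 1) + 1
print(stamp, received)       # prints "1 3"
```

The point of the rule is that a receive is always stamped later than the send that caused it, which is the property that lets traced messages be ordered.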

If Lamport clocks are not part and parcel of the purpose of seq_trace
according to the reference manual, then I guess you're privy to some
*secret* purpose of seq_trace.

> ... and discuss whether
> the current API is the right one, and what changes to the
> documentation would best help users to make use of it.

That need has existed for a while. *Failing* to describe seq_trace as
implementing Lamport clocks can only have worked against that need. It
means lots of people who might otherwise have had intelligent things
to say about the current API and how the documentation could be
improved have not even had seq_trace come to their attention. And
that's a lot of people. It's potentially everyone interested in Erlang
who has also had the relevant education. Is there a textbook out there
on distributed systems and MIMD parallel processing that *doesn't*
bring up Lamport clocks?

> It could well be that such a process would result in an API
> documentation that *does not* expose Lamport clocks, or
> (as I suggested), creates a separate component that exposes
> Lamport clocks in a more obvious and generic way.

Hiding useful functionality? Changing the API to break existing code?
It may well be that the sky can be turned a nice shade of green, then
yellow.

You want an "obvious and generic way" to "expose" Lamport clocks?
seq_trace already does it (except in being so subtle as to not name
them as such in the documentation). You don't have to use those
Lamport clocks in seq_trace if you don't want to.

And I can't imagine a way to use Lamport clocks without also doing
tracing -- that's practically what they are for. Unless by "tracing"
you (narrowly) mean "written output to puzzle over for debugging
purposes."

MY purpose is testing (the "trace-driven development" of this thread,
if anything), and when my tests pass, after the tracer process has
seen messaging behavior that conforms to my spec, I don't want to see
*anything* on the output. seq_trace as it stands can do that for me.
Why should I complain?

>> To save people time, if nothing
>> else. But especially so that people who are looking for a Lamport
>> clock implementation in Erlang will be able to find it easily in
>> searches.
>
> If that is the purpose, then creating a separate 'lamport' module
> would be a much better solution, obviously.

Oh, yeah, obviously. Look, Lamport clocks exist to trace the behavior
of processes, and seq_trace can be used without using its Lamport
clocks. Why is separation better than just maintaining backward
compatibility, at this point?

>>
>>> I claim that seq_trace implements Lamport clocks "by
>>> accident", and that it was not the original purpose, nor a complete
>>> solution to the problem.
>>
>> (a) You have no evidence of this,
>
> I was there, remember?

No, I don't remember. How can I? I wasn't there. A claim that you
independently re-invented something by accident, after years of
exposure to co-workers who clearly know what that thing is (if their
book is any indication -- and you must have read that book) does not
qualify as evidence. And since below you contradict yourself, saying
first that Lamport clock behavior was a customer requirement, then
saying the OTP team had discretion over whether to implement them, I don't
have much reason to trust your memory.

> With 'by accident' I mean that they could have solved it differently,
> and then the API would not have exposed Lamport clocks, and
> they would still have fulfilled the requirement.

Without exposure of the Lamport clocks in the seq_trace API, there's
no reason to implement them in seq_trace in the first place. Don't you
understand what they do?

> I'm not saying they didn't know they were using Lamport clocks.
> I'm saying it was not what the customer asked for, and the man
> page, such as it is, reflects what the customer had ordered.

You can't have it both ways, Ulf. The man page "reflects" Lamport
clocks, so you're saying the customer was asking for Lamport clocks in
their tracer (whether they called them that or not). Lamport clocks
are cited in the first book on Erlang. Lamport clocks are part of the
implementation of Mnesia. So you're saying that the customer ordered a
certain behavior, and nobody in the Erlang group recognized that the
customer was asking for Lamport clocks? Making it an "accident"? What
sense does that make?

> I guess a better way of putting it is that it was coincidental.

Color me incredulous.

>> As to the "original purpose" of "accidentally" implementing Lamport
>> clocks in seq_trace, what, pray tell, WAS the original purpose of an
>> "accidental" implementation of them, if it wasn't basically the same
>> as Lamport's purpose? Just to have some intriguing pairs of numbers to
>> look at, in otherwise-boring traces?
>
> What the original purpose was is exactly what I have tried to
> tell you. I won't repeat it here.

You have not told me what the purpose of putting Lamport clocks in
seq_trace was.

The purpose? As far as any reasonable reader should be concerned, the
document *defines* seq_trace in terms of Lamport clocks -- see my
excerpts above. That makes Lamport clocks integral to its purpose. At
worst, the reader should be prepared for changes to the *interface*,
not the implementation. You seem to think that seq_trace could have
hidden Lamport clocks, when in fact hiding them would only have
defeated their purpose in a tracing package. This makes no sense at
all.

[snip long quote from Lamport]:
> My read on that: he didn't originally set out to solve the problem
> of capturing sequence traces in a real-time system, but noted
> after a while that his proposed solution was extremely general.

So what? I'm not crediting Lamport with seq_trace, much less with AXE
forlopp's. I'm only seeking credit for Lamport in the seq_trace
documentation. Which he deserves. And which we all deserve, since it
makes it easier to find his work in Erlang if you already know his
work, and easier (upon reading this fact in the seq_trace
documentation) to find other people's work where it uses Lamport
clocks for various practical purposes, results that might be
implemented in Erlang, redounding to the benefit and glory of
Erlang/OTP in the process. How does anybody lose? I don't get it.

> The OTP team could have set out to implement sequence
> tracing, decided to do it using Lamport clocks, then realizing that
> the implementation could easily be generalized, and changed
> the API and documentation accordingly.

First you say (above) that Lamport clocks were a customer requirement.
Now you're saying the OTP team had discretion in this matter. It can't
be both.

> This is not what happened. It could still happen.

And I could skate across hell -- when it freezes over.

[snip comments about Virding's contribution to this debate.]

> The man page speaks of minor adjustments. I argue that one should
> perhaps consider a major overhaul. For B/W compatibility, it would
> be better to introduce a new, better API.

Well, you're free to fork Erlang/OTP and try to sell people on the
result. As it is, not having Lamport clocks mentioned explicitly in
the seq_trace documentation means that there's basically no customer
base to address anyway, since hardly anybody ever found out they were
in there.

> I didn't suggest making an incompatible change to seq_trace,

You've repeatedly suggested it might be desirable. Even in this e-mail.

> but basing the initial implementation of 'lamport' on the
> seq_trace implementation.

But if Lamport clocks subsequently disappear from the seq_trace
implementation, as you seem to think should happen, you've created
backwards incompatibility. So what's the difference?

>> If you mean "might well depart" *now*, why did Kenneth Lundin say it
>> implements them? If he's wrong, why did nobody in Ericsson correct
>> him? Lamport clocks are Lamport clocks, regardless of "intent."
>
> So they are, but that doesn't necessarily mean that their use should
> be committed to the API and highlighted in the documentation.

As far as any reasonable reader should be concerned, the document
*defines* seq_trace in terms of Lamport clocks -- see my excerpts
above. At worst, the reader should be prepared for changes to the
*interface*, not the implementation.

> Note - this is meant in general terms, as I didn't object to adding
> a reference to Lamport in the current seq_trace documentation.

Yes, you did. You openly feared it would overcommit Erlang/OTP to
Lamport clocks in the implementation of seq_trace.

> An example: the Erlang docs describe how the VM samples the
> length of the receiver's message queue when sending a message,
> and penalizes the sender with extra reductions if the queue length
> exceeds a certain threshold.....
> As it's been committed to the documentation, it is harder to change
> now that it is arguably more of a burden than a feature.

If Erlang/OTP has overcommitted itself on one point, that still says
nothing about whether seq_trace also has. I don't see where it does,
and I've been using it for months. If you don't want to use
seq_trace's Lamport clocks, you don't have to. (I don't -- yet.) You
will pay for it only in a counter-increment and some copying of those
counters on each trace call -- computational costs that are completely
overwhelmed, I'm sure, by everything else required to do any tracing
at all. As for using Lamport clocks independent of seq_trace *as a
debugging tool*, I see no reason why people can't, nor much reason why
they should be bothered by the fact that they are using a package
originally intended for debug traces -- it won't pose any significant
added burden on them, either computationally or in coding keystrokes,
over having a separate implementation. (If you can even *have* a
separate implementation of Lamport clocks that doesn't basically
replicate almost everything seq_trace does.)

> This is an example of why it is so important to go back to the
> original purpose, and ask, as Bjarne was wont to say "What's
> the bloody problem?" What problem did we originally set out to
> solve, and what changes might be needed to ensure that we
> solve that problem well - and keep solving it well based on where
> we're heading?

MY problem was that I needed to record a message traffic pattern, and
needed a way to reasonably order those messages in order to establish
whether that pattern is canonical for my purposes. seq_trace does that
for me.
I bet it could also do that job for Riak's vector clocks (if it
doesn't already -- and if it does, that's yet another argument for "it
ain't broke, so don't fix it.")
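
As a rough sketch of that kind of check (in Python, purely illustrative): the event tuples, process names, and the `conforms` helper below are all hypothetical, and the sketch assumes a single causal chain so that every stamp is unique.

```python
# Hypothetical recorded events: (lamport_stamp, sender, receiver, message).
recorded = [
    (3, "worker", "client", "reply"),
    (1, "client", "server", "request"),
    (2, "server", "worker", "job"),
]

# The canonical pattern the test expects, in causal order.
expected = [
    ("client", "server", "request"),
    ("server", "worker", "job"),
    ("worker", "client", "reply"),
]


def conforms(events, pattern):
    # Sort by logical timestamp, then compare with the stamps stripped off.
    ordered = sorted(events, key=lambda e: e[0])
    return [e[1:] for e in ordered] == pattern


ok = conforms(recorded, expected)
if not ok:
    # Stay silent when the trace matches; only complain on a mismatch.
    print("trace does not match the expected pattern")
```

With concurrent branches the stamps give only a partial order, so a real check would have to compare per-branch sequences rather than one flat list.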

> Some things are better not added to the reference manual.

Give me an argument that this is such a case. A concrete argument, not
a handwaving one.

>> WTF? Where did I say it was "completely independent"? Where did anyone
>> say it was?
>
> It's in the seq_trace man page. First paragraph of Description,
> second sentence. I didn't claim you said it, but could have
> been clearer about that - apologies.

Yes, if you want to report a bug against the documentation, go ahead.
But it might actually be true enough -- i.e., that you could remove
other tracing APIs from Erlang and seq_trace would still work just
fine. "Completely independent" might be bad writing, but not
necessarily *technically* inaccurate.

>>> ... one of the biggest drawbacks
>>> of erlang's tracing support is that only one tracer per process at a time
>>> can be supported.
>>
>> *Boink*. seq_trace is *part* of "erlang's tracing support". In what
>> way is it limited to "one tracer *per* process at a time"?
>
> Erlang's tracing support allows only one tracer (process)
> per process. This is a well-known and documented limitation.

I wouldn't know, since the documentation scared me off. As already noted.

> The seq_trace system_tracer allows only one per node.

An unfortunate limitation.

> A generic lamport clock implementation has no need for
> such limitations.

Gosh, could it be that if the documentation for seq_trace had always
said it implemented Lamport clocks, this shortcoming would have come
to light much sooner and been remedied long ago?

> And while you could use seq_trace for
> other purposes, these issues become distracting.

I guess it depends on how distractable you are. I find the relative
simplicity of seq_trace a source of consolation: for now, it's keeping
me out of something Jason described as "a special kind of hell." And I
find the existence of 2000+ publications citing Lamport's paper
encouraging as well: I can probably use seq_trace to solve a wide
variety of testing problems.

> As it is, seq_trace is caught somewhere in the middle. It is a
> fairly nice implementation of Lamport clocks, but not really
> intended for, or entirely fit for, use as a generic Lamport clock
> implementation. As a solution for sequence/transaction/forlopp
> tracing, it is half-baked and slightly confusing. It kind of works,
> but few people understand it well enough to use it. Even OTP
> doesn't necessarily use it in all places where it would fit.

Yep. Could the fact that it was never openly identified as
implementing Lamport clocks explain, in large part, why it has
remained obscure? "Oh look, behind the shed: a wheel like the one
we're working on now. Huh. What was it doing behind the shed?"

> This indicates that the seq_trace API and docs could evolve
> in either of two different directions - or be reworked, split,
> and made more intuitive, addressing both issues at the
> same time.

Oh, whatever. Just don't break the current API, OK?

[snip]
> Not sure what you mean by "boink". I read it as an insult, but
> perhaps it's just your way of expressing surprise?

Yes.

[snip]
>> And yet I'm supposed to be confident that if I submit a patch to
>> the seq_trace documentation informing users that it implements
>> Lamport clocks, it's very likely to be taken up?
>
> If it improves the documentation, yes, of course.

The issues here go straight to the question of what constitutes an
"improvement" in this case.

> This was seconded by Gustav Simonsson, who works in the OTP
> team. He even suggested where to put it.

Great. But what's the hangup with just *doing* it? Do you have to
first circulate memos among Ericsson's lawyers or something?

> The OTP team is by no means hampered by any policy (esp.
> *past* policy) decisions not to mention Erlang.

Well, I sure hope people are allowed to mention Erlang, if they work
on OTP. It could get awkward always having to say "that language"
instead.


-michael turner


