[erlang-questions] FOP (was: Re: Trace-Driven Development)

Fri Jun 8 20:01:52 CEST 2012

Ulf, I've been using seq_trace for months. I know what it does.
It's pretty clear what it's *typically* for. That's not to say there
shouldn't be a tutorial.

If Lamport clocks are (part of) what the customer asked for, there's
no mystery hiding behind the NDA. What they asked for has already
been capital-D Disclosed -- in the seq_trace document. It just hasn't
been given its proper name there.

> Rather than 'independent' of the standard tracing, I would
> say that sequence tracing is 'orthogonal'.

I'm glad I wasn't drinking anything when I read that, because it would
have spurted out of my nose. But I guess I get it anyway.

[snip a long explication of what I already thought was pretty obvious]
> Is this all clear from the seq_trace manual? No.

If someone wants to write detailed use-cases for seq_trace, I'm all in
favor of that.

I just want to get Lamport the credit he's due, while also helping to
inform those who already know of his work that they can get Lamport
clocks out of Erlang/OTP, if they need them. That can be done in four
added words, words which nobody seems willing to explicitly commit
to, much less actually add themselves. No, I have to write a patch.
Otherwise it won't happen. Weird. As if the *ethical* onus here is not
on the provider of the documentation.

> That seq_trace makes use of Lamport clocks is an added
> bonus, and at least the tools in OTP, like e.g. ttb and et,
> should take advantage of this whenever possible. As it
> is now, they don't (or I missed it, which is also a possibility).

Or everybody missed it, because when they searched the documentation
on "Lamport clocks", under the very reasonable assumption that someone
writing them up would call them what they're actually called, nothing
turned up.

And I fear it's because of ego: people at Ericsson still think they
invented everything in seq_trace first, maybe somewhere in
forlopp tracing ....

> Is this all clear from the seq_trace manual? No.
> It's easy to get the idea that it was created for a different
> purpose entirely ...

I have no idea why you think that. Among my first thoughts, upon
encountering seq_trace, was, "hey, that's cool: if I've got zillions
of messages flying around, and I'm only interested in those that
originate from one process (or a few of them) and kick off yet others
in a kind of chain reaction, seq_trace helps me reduce overhead. Yes,
I have uses for that."

Then I discovered that it implements Lamport clocks. I was even
happier. In fact, it implements almost nothing *but* Lamport clocks --
which implicitly include tokens being passed between processes that
might be looking at skewed clocks - so almost everything you describe
falls out as the simplest default choice of what messages to record.
But so what?
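For reference, the textbook Lamport clock rules fit in a few lines. The sketch below is a minimal Erlang illustration, not OTP code: the module name lamport and its functions are hypothetical. Each process keeps an integer counter, bumps it on a local event or a send, and on receipt jumps past the stamp the message carries.

```erlang
-module(lamport).
-export([new/0, local/1, send/1, recv/2]).

%% Hypothetical helper module, not part of OTP: a Lamport clock is
%% just a non-negative integer.
new() -> 0.

%% Any local event advances the clock by one.
local(Clock) -> Clock + 1.

%% A send is a local event whose new clock value travels with the
%% message. Returns {NewClock, StampToAttach}.
send(Clock) ->
    Stamp = Clock + 1,
    {Stamp, Stamp}.

%% On receipt, move past both the local clock and the sender's stamp.
recv(Clock, Stamp) -> max(Clock, Stamp) + 1.
```

Worked through: A at 0 sends, giving clock 1 and stamp 1; B at 0 receives that and moves to 2, then sends with stamp 3; C at 0 receives it and moves to 4. Every event ends up stamped higher than anything known to have happened before it. The seq_trace manual describes its own bookkeeping as a {PreviousSerial, ThisSerial} pair carried in the token, which differs in detail from this textbook version.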

We agree that it has a bad name -- I think if you're going to call it
a trace package, call it "par_trace" or something. But so what?
Computer technology is full of misnomers.

> But it sounds as if you are putting seq_trace to good use,
> in a way that is different from what I describe above.
> Boiling this down to an example would be a great contribution.

Um, no. I'll be trying to use seq_trace to verify a partial ordering
of multiple default inheritance operations driven by spreading
activation in (natural) language network descriptions. Somewhat as
described by Richard Hudson in his last two books on Word Grammar.
Except that he's a little scared of doing the inheritance in a
spreading-activation style (which is already very important in his
theory), and I'm trying to find out whether he's actually right to be
scared. I don't think he is - I think the win is always much bigger
than any possible loss. I've convinced him of other (less ambitious)
operational simplifications for his theory, and maybe I can nail this
one, too.

It's kind of an annoying testing problem, because you don't care about
the order or precise timing for some events, but the order definitely
has to be right in other ways - i.e., your basic partial order, which
is something Lamport clocks ("happened before" relation) gives you.
It's not like, "I sent this message, I should see the following exact
sequence of resulting messages." Then it would be easy. Or easiER,
anyway.
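A check of that kind can be sketched as a small pure function. It assumes a tracer has already collected, for each event the test cares about, the ThisSerial part of its seq_trace serial into a map keyed by names the test chooses; the module po_check and that data format are hypothetical. Only the listed pairs are checked, so events with no required order between them are simply left out of the list.

```erlang
-module(po_check).
-export([check/2]).

%% Events: #{Name => ThisSerial}, one entry per event the test cares
%% about (hypothetical format; the names are chosen by the test).
%% Required: [{Earlier, Later}] pairs that the partial order demands.
%% Returns ok, or {violations, Pairs} for pairs whose serials are out
%% of order or whose events never showed up at all.
check(Events, Required) ->
    case [{E, L} || {E, L} <- Required, not before(E, L, Events)] of
        [] -> ok;
        Bad -> {violations, Bad}
    end.

%% True only if both events were seen and E carries the smaller serial.
before(E, L, Events) ->
    case {maps:find(E, Events), maps:find(L, Events)} of
        {{ok, Se}, {ok, Sl}} -> Se < Sl;
        _ -> false
    end.
```

One caveat: Lamport clocks only promise that if a happened before b, a's clock is smaller; the converse does not hold. So a passing check means the trace does not contradict the required partial order, not that each pair is causally linked. The sketch uses maps, so it needs OTP 17 or later.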

I *could* write these tests as simply a process that sends activation
messages to some language-network concept nodes (each of which is a
process), then send an "inherit" message to the concept node that's
supposed to inherit, and wait around to see what it inherits. The
problem with that: if it doesn't get the right traits, I'll have no
idea why. seq_trace is cool because it can tell me what happened. It's
also cool for regression tests because if the right traits are
inherited (and other things happen in the right order) the test
silently passes. It's also pretty scalable from what I can tell. I can
parallelize my tests. That's important because human languages are big
and will require a lot of test cases.

But I understand the need for a tutorial for more typical uses of
seq_trace. This whole thread is, amazingly, just a lot of topic
drift from somebody asking "What's the equivalent of Hello World for
Erlang tracing?" For seq_trace, at least in a testing role, the Hello
World should be a kind of unnaturally simple problem like this:

   Make sure that when process A sends a message X to process B,
   B sends a message Y to process C as a result,
   and C receives it.

So you provide a tracer that sends A an acknowledgement if it sees C
get Y after A sends X to B. Process A waits in a receive with an
"after" timeout, and the test fails if that timeout fires. And in fact a lot of my
tests so far are just a process sending "inherit" up the inheritance
chain(s) in the network of concept processes and then waiting for the
tracer to report that a certain message arrived at some concept node.
I'll need to go beyond this, soon enough. But it's good enough for
where I am now.
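A minimal sketch of that Hello World, assuming a single node and nothing else using the node's system tracer. The module name seq_hello, the label 17 and the one-second timeout are arbitrary choices. Rather than rely on the order in which trace messages reach the tracer (Erlang only orders messages per sender/receiver pair), the tracer compares the ThisSerial part of each event's {PreviousSerial, ThisSerial} serial, which is the Lamport clock doing its job.

```erlang
-module(seq_hello).
-export([run/0]).

%% Arbitrary label for this one traced "session". Older OTP releases
%% accept only integer labels, so an integer is the safe choice.
-define(LABEL, 17).
-define(TIMEOUT_MS, 1000).

%% Returns ok if C received y (sent by B because of A's x) in causal
%% order, or {error, timeout} otherwise.
run() ->
    A = self(),
    %% Spawn everything before A gets a token, so only the messages
    %% below carry it.
    C = spawn(fun() -> receive y -> ok end end),
    B = spawn(fun() -> receive x -> C ! y end end),  % token rides along on y
    Tracer = spawn(fun() -> tracer(A, B, C) end),
    OldTracer = seq_trace:set_system_tracer(Tracer),
    %% Give A a labelled token that reports every traced receive.
    seq_trace:set_token(label, ?LABEL),
    seq_trace:set_token('receive', true),
    B ! x,
    seq_trace:set_token([]),  % A is done; the token now lives in B and C
    Result = receive
                 {Tracer, c_got_y} -> ok
             after ?TIMEOUT_MS ->
                 {error, timeout}
             end,
    seq_trace:set_system_tracer(OldTracer),
    exit(Tracer, kill),
    Result.

%% Waits for both receive events (selective receive, so arrival order
%% does not matter), then acks A only if the serials agree with
%% "B got x" happening before "C got y".
tracer(A, B, C) ->
    SerialB = receive
                  {seq_trace, ?LABEL, {'receive', {_, Sb}, A, B, x}} -> Sb
              end,
    SerialC = receive
                  {seq_trace, ?LABEL, {'receive', {_, Sc}, B, C, y}} -> Sc
              end,
    case SerialB < SerialC of
        true -> A ! {self(), c_got_y};
        false -> ok
    end.
```

Only A ever sets a token, so other traffic on the node produces no trace output. Calling seq_hello:run() from a shell should return ok; {error, timeout} means the tracer never saw both events with serials in the expected order.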

-michael turner

On Sat, Jun 9, 2012 at 1:50 AM, Ulf Wiger <ulf@REDACTED> wrote:
>
> On 8 Jun 2012, at 17:51, Michael Turner wrote:
>
>> Perhaps they expressed them as, "we have clock skew problems in our
>> distributed system, and we need some way to correctly sequence our
>> traces in spite of that." Lamport clocks are a simple, classical
>> solution to that problem.
>
> No (or, if they asked for that, I'm not aware of it. The OTP team
> can correct me if I'm wrong, but that was not the impetus to seq_trace.)
>
> I will try one more time.
>
> Tracing a sequence, as in "our system handles a thousand call
> setups per second, and if we turn on a trace on all of them, we
> will not only learn nothing - we will kill the system. We need a way
> to trigger trace output on *one* session in the midst, and have
> that trigger 'contaminate' processes as the request is passed
> around in the system, and then, obviously, turn off, so we get
> only what we asked for, and nothing more".
>
> Effectively, automatically selecting trace output so that it looks
> as if we traced everything and ran only one single call through
> the system (which is what most people resort to).
>
> If we call it "session trace", does that make it clearer?
>
> Obviously ordering (sequencing) is *one* part of the problem,
> for which Lamport clocks are a great solution. But the part where
> tokens act as "probes" whizzing through the system activating
> trace output selectively, is part of trying to reduce the amount
> of trace data generated.
>
> A large part of the complexity of the tracing subsystem in Erlang
> comes from the need of the user to be able to define, ad-hoc,
> just the right filters so that one can get useful trace output without
> killing the system. While you can accomplish a "session-specific"
> trace just using pattern-matching on function calls, this quickly
> becomes unwieldy. Usually, you just want to enable a wide trace,
> to include all important calls, but *only* for the one session you
> decide to trace - not for the perhaps hundreds or thousands of
> other sessions that may touch the same process.
>
> For this, the trace patterns in the tracing subsystem can match
> on function call parameters and dynamically set and clear
> trace tokens.
>
> Rather than 'independent' of the standard tracing, I would
> say that sequence tracing is 'orthogonal'. The standard trace
> is great for tracing on a small set of functions or modules, or
> showing all activity in one or a few processes. But in large
> systems under commercial load, doing any kind of tracing is
> really scary. Some Erlang old-timers are known to explain
> how they took down entire mobile networks by carelessly
> setting up a wrong trace.
>
> This is my take on this area. In most practical uses, the
> ordering one gets from timestamps is perfectly fine (for
> tracing - *not* if one really wants to ensure that the trace
> reflects the exact causal order). The hardest problem in
> the scenario I describe is avoiding killing the node or
> at least getting so much trace data that any analysis
> of it becomes prohibitively hard.
>
> That seq_trace makes use of Lamport clocks is an added
> bonus, and at least the tools in OTP, like e.g. ttb and et,
> should take advantage of this whenever possible. As it
> is now, they don't (or I missed it, which is also a possibility).
>
> Is this all clear from the seq_trace manual? No.
> It's easy to get the idea that it was created for a different
> purpose entirely, and even people who seek it out wanting
> to do exactly what I describe above, tend to turn away
> frustrated. But it's a hard problem to solve, and I'm not
> saying that the current support is sub-par. If I had a clear
> idea of how to improve it, I would have submitted patches
> long ago. What I did to try to improve things was work to
> get ttb's support for saving useful trace patterns and
> replaying them later, more stable and better documented.
> This doesn't relate to seq_trace as much as to the ability
> to manage trace tokens through trace patterns.
>
> It's badly named. It should definitely not speak of
> "sequential tracing". I'm pretty sure "sequence"
> came out of "förlopp", which basically means "a
> sequence of events". The function of a "forlopp" in AXE
> is that it is the smallest single unit of failure. It can be aborted
> and re-run, like a transaction.
>
> Does all this preclude adding a reference to Lamport
> clocks in the seq_trace manual? Obviously not.
>
> If it bugs you so much, write a patch and submit it.
> I agree that it will cost OTP as much to vet your patch as
> it would for them to put the sentence in themselves, but
> if it's not a priority to them, and it is to you, you know what
> to do.
>
> But it sounds as if you are putting seq_trace to good use,
> in a way that is different from what I describe above.
> Boiling this down to an example would be a great contribution.
>
> BR,
> Ulf W
>
> Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
> http://feuerlabs.com