Line numbers in stack traces

Wed Aug 10 12:33:27 CEST 2005

On 10 Aug 2005, at 06:16, Richard A. O'Keefe wrote:

> You are confusing a *source position* (which makes sense even if,  
> as in
> Interlisp-D and Smalltalk, the code is technically held in a data  
> base)
> with *line numbers*.

OK. This is perhaps straying away from the point though. Whether I  
get a line number, or some other way of working out where my program  
died, that's got to be good news. I've just spent a good five minutes  
tracking down a badmatch inside a gen_server handle_call/3 function.  
There's a clause for each message the server handles, so saying the  
error is in handle_call/3 is effectively sending me off to look for a  
needle in a haystack - especially when the virtual machine probably  
knows (or could know) more information about where it went wrong and  
is only giving me cryptic clues so I can work it out for myself.

> Note that if a programming language is intended for use with tail
> recursion optimisation, as Prolog and Scheme and Erlang are, then
> a debugger *must* discard most dynamic line number information or
> else choke on its own stack.  (Prolog debuggers commonly choose the
> choke-on-your-own-stack option.)

Fine. I admit that optimising away tail-recursive function calls  
complicates matters somewhat, but I don't think it makes it  
impossible. "Programs are data", as you pointed out before, and Lisp is  
an example of a language with far more complex macro substitution  
than Erlang. If you imagine the compiler repeatedly transforming the  
S-expressions of your raw source code down into simpler and simpler  
forms which eventually converge into something which can be  
translated down into bytecode or machine instructions, then there  
*is* an inherent concept of source position which can be inferred  
from the address of the crash - you simply unravel the compilation  
process back to find out which expression in the user's code caused  
the problem. If you partially unravel the compilation process you get  
to see intermediate results of macro substitution.

Whether this gives a line number directly is irrelevant. In your Lisp  
environment where all code is pretty printed you can simply ask the  
pretty printer which line it's chosen to print that expression on  
today. Of course, if I can get a location for the error more detailed  
than just a line number, I'd take that too.

> to line numbers.  Basically, Smalltalk debuggers rely on finding  
> 'call'
> instructions in the byte code and matching them up to message sends in
> the source.  (In Smalltalk-80, Squeak, and Ambrai, stack traces do NOT
> include line numbers.)

So we're talking about an implementation issue here? Instead of, say,  
bloating up your bytecode by adding a (filename, linenumber) tuple to  
each instruction, can't we build up a mapping table (the one I talk  
about above) at compile time which maps the bytecode address back up  
into the original source code? From what I recall, this is how the  
debug information is stored in a C executable.
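
A minimal sketch of the kind of side table I mean, written in Python only to keep it short. The table layout, the file name and the function name are all made up for illustration; this is not how any real VM stores its debug information. Each entry says "from this bytecode address onwards, the code came from this file and line", and a lookup is a binary search for the last entry at or before the crash address.

```python
import bisect

# Hypothetical table emitted by the compiler alongside the bytecode.
# Each entry is (start_address, file, line), sorted by start_address,
# and covers every address up to the start of the next entry.
LINE_TABLE = [
    (0, "server.erl", 10),
    (12, "server.erl", 11),
    (30, "server.erl", 14),
    (48, "server.erl", 19),
]


def source_position(table, address):
    """Return (file, line) for a bytecode address, or None if unmapped."""
    starts = [start for start, _file, _line in table]
    # Index of the last entry whose start address is <= address.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    _start, file, line = table[index]
    return (file, line)
```

If the user disables debug information the table is simply empty, every lookup returns None, and the bytecode itself is unchanged.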

Does this address your concerns about bloating the size of the  
bytecode in a world where the VM can handle debugging information,  
but the user has chosen to disable it? In that case, we'd just  
eliminate the source<->bytecode mapping structure and the overhead of  
"potentially" having debug information would be pretty much zero?

> If the compiler recorded the source position of the first token
> of each clause in each function, then knowing which clause of which
> function you were in would get you very close in terms of line  
> numbers.

Yes. I think that's what I'm suggesting. Actually, if you're going  
for broke, why not record the position of each expression in the  
function in our separate (optional) mapping table.

> Like well-written Smalltalk, well-written Erlang is supposed to have
> lots of *small* clauses, no?

handle_call/3 is probably the worst case I've seen at the moment.  
Lots of little function clauses which make up one big function.  
Unfortunately the stack trace doesn't tell you which *clause* it  
failed in... only the function name and arity. That's decidedly  
annoying.

> It's sufficiently difficult that people have earned PhDs for doing it.
> I think there are much better things to spend the time and money on.

Well... if the alternative is spending my time and money trying to  
interpret somewhat vague stack traces, then I can see the  
attraction of investing some of my time sorting this out.

>     In any case, if the particular line that is reported contains a
>     macro application, then it is obvious that the problem might be
>     with the macro definition.  I don't see why that would be
>     surprising or difficult to deal with.
>
> Because it isn't "obvious" *where* the problem is within the  
> cascade of
> macro applications.

But this problem has been satisfactorily dealt with in Lisp, which  
has far more powerful and confusing macro support than Erlang.

> There is one important difference between Erlang and Java:
> the level of support.
> There is so much money and manpower behind Java that Sun can afford
> to do things (and so much muscle behind C# that Sun cannot afford NOT
> to do things) that are not necessarily a good use of resources for
> enhancing Erlang.

Well, I really don't think that expecting the stack trace to tell you  
precisely where the error occurred is something that Sun/Microsoft  
implement in their VMs purely for beauty parade purposes so that IT  
consultants can tick the appropriate boxes. It's highly  
useful, and it's really winding me up that I can't see this  
information at the moment.

My initial impressions of Erlang are that I'm far more efficient  
writing "control" applications in it than I am in Python/Ruby/Tcl  
etc, and that it seems to encourage me to think in a way where I  
produce fewer bugs. I'm generally much more confident that, once the  
code compiles, it'll work first time than I am in pretty much any  
other language. However, the really horrible part comes when I really  
do have to start interpreting stack traces. I feel like I'm playing a  
guessing game, and I'm just wasting my time.

It's possible I'm just not yet experienced enough in Erlang to think  
in ways which don't require me to know where the code has crashed, or  
to perhaps just grumpily accept the fact that the runtime doesn't  
tell me that. But, for now, I'd have to rank it as the most  
significant disadvantage of the language.

> So in order to support line number information (should that prove  
> to be
> usefully more precise than {module,function,arity,clause}) in a  
> debugging
> mode, it is NOT necessary to have any support for line numbers in  
> the VM.
> (As noted above, QP was able to provide source positions in  
> debugging mode
> without having any support for it in the WAM.)

I think this is exactly what I'm talking about. I just can't see why  
you don't want it.

> It is large, because the information required basically amounts to  
> undoing
> the transformations.  Inlining is just the beginning.

Doesn't this just result in a more complex source<->bytecode mapping  
table?

> My earlier mention of TRO doesn't seem to have sunk in.
> Iterative code in Erlang relies on turning the dynamically last
> call in a function into a jump.
>
> When an error (badmatch, badarith, badarg, &c) is reported, the line
> number of the actual error report doesn't tell you very much.  Quite
> often it's inside some system function.  *Your* function call which
> contains the error has very often disappeared completely from the  
> stack.

So your function call disappears from the reported stack trace too. I  
don't see how that makes the availability of more detailed code  
position information relatively less useful.

I can (just about) live with functions disappearing from the stack  
like that... and I really don't want to complicate things by throwing  
yet another idea in, but how about creating a ring-buffer in the  
virtual machine to keep track of the top n items of the "logical"  
stack. So, instead of simply producing a jump instruction at a  
tail-recursive call, why don't we write the stack frame which would have  
been generated if we didn't have this optimisation into the buffer?  
That might go some way towards solving the other (but less annoying)  
Erlang stack trace problem of it omitting function calls which have  
been optimised away - you'd generate the top n levels of the stack  
trace from the ring buffer, and then glue on whatever else is sitting  
below on the real stack?
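
Roughly, and again in Python purely as a sketch: the class and method names here are invented, and a real VM would have to decide what happens to buffer entries once calls return, which this ignores. An ordinary call pushes a frame as usual. A tail call still reuses the caller's slot, but first copies the frame it is about to overwrite into a fixed-size ring buffer. The trace is then the current frame, followed by the buffer (newest first), followed by the rest of the real stack.

```python
from collections import deque


class LogicalStack:
    """Toy model of a call stack with tail-call optimisation plus a ring buffer."""

    def __init__(self, ring_size):
        self.real = []  # frames the VM genuinely keeps
        # Last ring_size frames discarded by tail calls; older ones fall off.
        self.elided = deque(maxlen=ring_size)

    def call(self, frame):
        # Ordinary call: push a new frame.
        self.real.append(frame)

    def tail_call(self, frame):
        # Tail call: remember the frame being replaced, then reuse its slot.
        self.elided.append(self.real.pop())
        self.real.append(frame)

    def trace(self):
        # Most recent frame first.
        if not self.real:
            return []
        current, below = self.real[-1], self.real[:-1]
        return [current] + list(reversed(self.elided)) + list(reversed(below))
```

With a buffer of two entries, calling main/0, then loop/1 three times by tail call, then handle/2 by tail call would report handle/2, the last two loop/1 frames, and main/0. The real stack never grows beyond two frames.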

> The important question is not "is adding line numbers a good way to
> improve Erlang stack traces" but "what is the best use of Erlang
> development resources to help people get their Erlang programs right?"
>
> As I think I've said, I'd rather have QuickCheck.

Oh, I don't think so. Programs crash, and all I want to know is where  
they've crashed. This information is (almost) in the virtual machine  
already, and it really annoys me to have to play a guessing game  
every time.

Richard.