Line numbers in stack traces

Thu Aug 11 03:31:49 CEST 2005

Richard Cameron <camster@REDACTED> wrote:
	I've just spent a good five minutes  
	tracking down a badmatch inside a gen_server handle_call/3 function.  
	There's a clause for each message the server handles, so saying the  
	error is in handle_call/3 is effectively sending me off to look for a  
	needle in a haystack

This is a case where the lightweight approach I've repeatedly suggested,
namely reporting

    {Module,Function_Name,Arity,Clause_Number},
                                ^^^^^^^^^^^^^

would probably give you 90% of what line numbers might have, at a fraction
of the implementation effort.

I don't know what BEAM looks like.  Is there any documentation of it,
other than the source code of the emulator (and of course the compiler
sources, all roughly 20kSLOC of them)?  genop.tab tells me there are
instructions
	call_last			-- maybe call_only?
	call_ext_last
	return
and poking around in beam_*.erl files tells me that code after these
instructions is unreachable.  So in the absence of inlining, it would
suffice to plant an extra
	{clause_number,N}
instruction after each of these.  Then, to find out the module, function,
and arity, given an instruction address, do whatever is done now, and to
find out the clause number, just parse instructions forward until the
next {clause_number,N} instruction.
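
A minimal sketch of that forward scan, under loud assumptions: the
instruction list here is a made-up flat list of terms, not BEAM's real
loaded format, and {clause_number,N} is the proposed marker, not an
instruction that exists today.

```erlang
-module(clause_scan).
-export([clause_at/2]).

%% Code is a hypothetical flat list of instructions for one function.
%% {clause_number,N} is the proposed marker planted after the
%% call_last / call_ext_last / return that ends clause N.
%% Pc is the 0-based index of the instruction that failed.
clause_at(Code, Pc) when is_integer(Pc), Pc >= 0 ->
    scan(drop(Pc, Code)).

%% Skip the first N instructions (or all of them if the list is shorter).
drop(0, Code) -> Code;
drop(_, []) -> [];
drop(N, [_ | Rest]) -> drop(N - 1, Rest).

%% Parse forward until the next marker.
scan([{clause_number, N} | _]) -> {ok, N};
scan([_ | Rest]) -> scan(Rest);
scan([]) -> unknown.
```

The point of the sketch is only that the lookup needs no table at all:
the marker sits in unreachable code, so it costs nothing at run time.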

A key phrase there is "in the absence of inlining".  But the Erlang
compiler *does* support inlining, both explicit, via
	-compile({inline, [{Name,Arity},...]}).
and implicit.  This means that

	...
	f(...) -> % clause N
	    ...
	    g(...)
	    ...;
	...

either has to include code to push "I'm really in g" at the beginning
of the expansion of g and pop it at the end AND refrain from merging
code in g with code before or after the call to g, which is one of the
main reasons for inlining, OR the location reported will be clause N of
f instead of clause whatever of g, OR the compiler must refrain from
inlining in debugging mode.

The simplest thing to do would seem to be
    - if a module (or a function) is compiled in debugging mode,
      then function calls in that module (or in the body of that function)
      will not be expanded inline

    - if a module is not compiled in debugging mode, people must know
      to expect that "exception in {M,F,A,C}" *really* means "exception
      in {M,F,A,C} or some function called therein", which, thanks to
      tail recursion optimisation, it does anyway.

	especially when the virtual machine probably knows (or could
	know) more information about where it went wrong and is only
	giving me cryptic clues so I can work it out for myself.

I thought the point of this thread was that the virtual machine
almost certainly DOESN'T have any more information than it gives you.

	"Programs are data" as you point out before, and Lisp is  
	an example of a language with far more complex macro substitution  
	than Erlang.

Yes, and I have spent enough time inside Lisp debuggers to know what
a NIGHTMARE macros are for debugging Lisp.  (Especially the really nasty
thing where you have fixed a macro but the debugger is stepping you through,
but not showing you, the OLD expansion...)  It was precisely my bad
experiences with trying to debug macro-heavy Lisp code that led to my
opposition to macros in Quintus Prolog and my distress that Erlang
has succumbed to C envy.

	If you imagine the compiler repeatedly transforming the
	S-expressions of your raw source code down into simpler and
	simpler forms which eventually converge into something which can
	be translated down into bytecode or machine instructions, then
	there *is* an inherent concept of source position which can be
	inferred from the address of the crash

It is precisely because I have written complex macro-generating macros
(an OOP library obsoleted by CLOS, an SGML-hacking library, some old
long-discarded AI stuff) that I know that this is quite untrue.

In macroised code, each byte of the result owes its existence to the
*interaction* of many lines; you cannot point to ANY of them and say
"that is my source".

	- you simply unravel the compilation process back to find out
	which expression in the user's code caused the problem.

Hysterical laughter.  In a stateful language like Lisp, the translation
may have depended on state in such a way that it is no longer *possible* to redo
(let alone undo) the compilation process.  Guess what?  Erlang is
stateful enough for this to be a problem.  (module = global variable.)
Don't forget, an Erlang compile-time transformation can do ANYTHING it
wants to, including picking a transformation according to the current
price of Microsoft stock.

	So we're talking about an implementation issue here? Instead of, say,  
	bloating up your bytecode by adding a (filename, linenumber) tuple to  
	each instruction, can't we build up a mapping table (the one I talk  
	about above) at compile time which maps the bytecode address back up  
	into the original source code?

Basically a mapping table *does* add a (filename,linenumber) tuple to
each instruction, albeit with some compression.
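
To make "with some compression" concrete, here is a sketch of one
possible layout (an assumption for illustration, not how any real
debug format is specified): one entry per run of instructions that
share a source line, looked up by address.

```erlang
-module(line_table).
-export([lookup/2]).

%% Table is a hypothetical list of {StartAddress, File, Line}, sorted by
%% StartAddress.  Each entry covers every address from its own start up
%% to (but not including) the next entry's start.
lookup(Addr, Table) ->
    lookup(Addr, Table, unknown).

%% Keep the last entry whose start is at or below Addr.
lookup(Addr, [{Start, File, Line} | Rest], _Best) when Start =< Addr ->
    lookup(Addr, Rest, {File, Line});
lookup(_Addr, _Table, Best) ->
    Best.
```

The table is smaller than one tuple per instruction, but every
instruction is still covered by some entry, and every compiler pass
that moves code has to keep the entries in step.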

	From what I recall, this is how the  
	debug information is stored in a C executable.

And as I pointed out with numbers, this GREATLY INCREASES the size of
the C executable.

	Does this address your concerns about bloating the size of the  
	bytecode in a world where the VM can handle debugging information,  
	but the user has chosen to disable it?

How could it?  I've read the DWARF manual cover to cover; I can still
remember when the code reorganiser for a certain C compiler (shifting
procedures around to improve paging) managed to break C++ because C++
relied on the mapping tables for exception handling, and the code
reorganiser forgot to update them.  In short, you're not telling me
anything I wasn't already well aware of, so how could it address my
concerns?

My primary concern is the *human* time required to implement something
like this, and the loss of other good things we might have had if that
time were spent in more productive ways.

People seem to think it is a simple thing to get right.  I have learned
over the years *not* to trust C compiler line number information; the
debuggers will skip backwards and forwards in very strange ways, and too
often fail to stop at all at a line which is plainly there.  If I cannot
rely on C compilers with large amounts of money and time behind them, how
easy is it going to be for Erlang?

We are talking about a serious increase in the complexity of BEAM files.
We are talking about nearly every part of the compiler being affected in
trivial to enormously complex ways.
And for what looks to me to be very minor benefit.

{Module,Name,Arity,Clause} could give us _most_ of the benefit of line
number information for MUCH less cost.

	> If the compiler recorded the source position of the first token
	> of each clause in each function, then knowing which clause of which
	> function you were in would get you very close in terms of line  
	> numbers.

	Yes. I think that's what I'm suggesting.

No, it's very different.  I am talking about locating *only* the beginnings
of *clauses*.
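
A sketch of how little is involved, using the standard erl_scan,
erl_parse and erl_anno modules of a current OTP release.  The module
name clause_lines and the function from_string/1 are invented for the
example, and it handles exactly one function definition given as a
string.

```erlang
-module(clause_lines).
-export([from_string/1]).

%% Parse the source text of a single function definition and return
%% {Name, Arity, [{ClauseNumber, Line}]}, where Line is the line of the
%% first token of that clause.
from_string(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source),
    {ok, {function, _Anno, Name, Arity, Clauses}} =
        erl_parse:parse_form(Tokens),
    Lines = [erl_anno:line(Anno)
             || {clause, Anno, _Args, _Guards, _Body} <- Clauses],
    {Name, Arity, lists:zip(lists:seq(1, length(Lines)), Lines)}.
```

For example, from_string("f(a) -> 1;\nf(b) ->\n    2;\nf(_) -> 3.\n")
gives {f,1,[{1,1},{2,2},{3,4}]}: one small pair per clause, and
nothing at all per expression.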

	Actually, if you're going for broke, why not record the position
	of each expression in the function in our separate (optional)
	mapping table.

My whole point is that we SHOULDN'T go for broke; that clause level
information is cheap to obtain and store, cheap enough that we might as
well always do it, while the position of each "expression" isn't even
*defined* when you have, as Erlang does have, macros and inlining.

	> Because it isn't "obvious" *where* the problem is within the  
	> cascade of
	> macro applications.

	But this problem has been satisfactorily dealt with in Lisp, which  
	has far more powerful and confusing macro support than Erlang.

No, it has *not* been satisfactorily dealt with in Lisp, not at all.
Very far from it, in fact.

	My initial impressions of Erlang are that I'm far more efficient  
	writing "control" applications in it than I am in Python/Ruby/Tcl  
	etc, and that it seems to encourage me to think in way where I  
	produce fewer bugs. I'm generally much more confident that, once the  
	code compiles, it'll work first time than I am in pretty much any  
	other language. However, the really horrible part comes when I really  
	do have to start interpreting stack traces. I feel like I'm playing a  
	guessing game, and I'm just wasting my time.

Well yes, interpreting stack traces IS a waste of time for ANY programming
language.  I am very pleased to learn that there is a prototype version of
QuickCheck for Erlang:
	http://www.cs.chalmers.se/~rjmh/ErlangQC/
Now if someone will just tell me where the actual source code
	http://www.cs.chalmers.se/~rjmh/ErlangQC/qc.erl
went to (because I get a 404 when I try to download it),
I'll be a happy camper.  There's a wonderful binary search tool
that I'd like to see adapted to Erlang as well.

	> So in order to support line number information (should that prove  
	> to be
	> usefully more precise than {module,function,arity,clause}) in a  
	> debugging
	> mode, it is NOT necessary to have any support for line numbers in  
	> the VM.
	> (As noted above, QP was able to provide source positions in  
	> debugging mode
	> without having any support for it in the WAM.)

	I think this is exactly what I'm talking about. I just can't see why  
	you don't want it.

I didn't say I don't want it, I said I don't want it ***IN THE VM***.
This is what debugging interpreters are for.  (In Interlisp-D, for example,
debugging was based on interpreting trees, not byte codes, although
compiled code was byte codes.  The only VM support required was the
ability to trap function entry.)

	So your function call disappears from the reported stack trace too. I  
	don't see how that makes the availability of more detailed code  
	position information relatively less useful.

Surely it is obvious?  What good does a "precise" position give you
if that position isn't *THERE* in the stack trace?

	I can (just about) live with functions disappearing from the stack  
	like that... and I really don't want to complicate things by throwing  
	yet another idea in, but how about creating a ring-buffer in the  
	virtual machine to keep track of the top n items of the "logical"  
	stack.

How about NOT doing that?  THIS is my point:  DO NOT MESS AROUND WITH
THE VM!  Doing that has effects all OVER the place, costs that are grossly
out of proportion to the benefits.

All you really need is a source to source transformation:

    ...
    f(--args--) when --guard-- ->
	--body--;
    ...

rewrites to

    ...
    f(--args--) when --guard-- ->
	tracer!{self(), --module--,f,--arity--,--clause number--},
	X = (--body--),
	tracer!{self(),pop},
	X;
    ...

and now, without ANY changes to the VM or compiler AT ALL, you have
your "logical" stack.
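
A sketch of the receiving end, assuming a current OTP release (it uses
maps for brevity).  The module name logical_stack, the snapshot
protocol and demo/1 are all invented for the example; demo/1 is
written out by hand in the shape the rewrite above would produce.

```erlang
-module(logical_stack).
-export([start/0, stop/0, snapshot/1, demo/1]).

%% Start the tracer and register it under the name used in the rewrite.
start() ->
    register(tracer, spawn(fun() -> loop(#{}) end)),
    ok.

stop() ->
    tracer ! stop,
    ok.

%% Ask the tracer for the logical stack of Pid, innermost frame first.
snapshot(Pid) ->
    tracer ! {snapshot, self(), Pid},
    receive {stack, Pid, Stack} -> Stack end.

%% Stacks maps each traced pid to its list of {M,F,A,Clause} frames.
loop(Stacks) ->
    receive
        {Pid, pop} ->
            case maps:get(Pid, Stacks, []) of
                [_ | Rest] -> loop(Stacks#{Pid => Rest});
                [] -> loop(Stacks)
            end;
        {Pid, M, F, A, C} ->
            Old = maps:get(Pid, Stacks, []),
            loop(Stacks#{Pid => [{M, F, A, C} | Old]});
        {snapshot, From, Pid} ->
            From ! {stack, Pid, maps:get(Pid, Stacks, [])},
            loop(Stacks);
        stop ->
            ok
    end.

%% Hand-instrumented example: clause 1 reports the stack it sees,
%% clause 2 recurses.
demo(0) ->
    tracer ! {self(), ?MODULE, demo, 1, 1},
    X = snapshot(self()),
    tracer ! {self(), pop},
    X;
demo(N) when N > 0 ->
    tracer ! {self(), ?MODULE, demo, 1, 2},
    X = demo(N - 1),
    tracer ! {self(), pop},
    X.
```

If a body crashes, its pop is never sent, so the tracer is left
holding the logical stack at the point of failure.  Note that binding
X before returning turns every tail call into an ordinary call, which
is exactly the price of a complete stack.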

	So, instead of simply producing a jump instruction at a tail- 
	recursive call, why don't we write the stack frame which would have  
	been generated if we didn't have this optimisation into the buffer?  

You cannot be serious.  Heck, why don't we just go all the way and
run Erlang interpreted by a Tcl program interpreted by a Tcl interpreter
written in interpreted BASIC?  Then if that's still too fast we could
throw in a few delay loops.

	Oh, I don't think so. Programs crash, and all I want to know is where  
	they've crashed. This information is (almost) in the virtual machine  
	already, and it really annoys me to have to play a guessing game  
	every time.

No, it *ISN'T* in the virtual machine already, or even close to it.
Providing that information in the VM would cost a lot of development
time and would interfere with a lot of other things it would be nice to
have.  There is already a serious conflict between accurate (complete)
stack traces and usable performance (=TRO) which has been resolved in
favour of usable performance.

Me, I find that knowing *where* something crashed isn't terribly useful;
what I want to know is where the *error* is, and that's usually somewhere
else.  Just yesterday, for example, I had a program crash.  It turned out
that the thing I needed to know was not where the program crashed, but
what data it was looking at.  The bug was in another program entirely!
(The one that generated the bad data.)  In Erlang terms, I would normally
find more benefit in having the *arguments* of the functions in the stack
trace than in having the line numbers.  With the arguments of the top
function, since they cannot possibly have changed, and since function
headers and guards cannot have side effects and cannot depend on anything
that could be changed by a side effect, it is possible to "replay" clause
selection and discover which clause the crash was in (and, more importantly,
why that clause was selected).  So arguments can give you positions, but
positions cannot give you arguments.
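
A sketch of such a replay.  The module replay and both functions are
made up for the example: handle_call/3 stands in for a gen_server
callback with a data-dependent badmatch in its second clause, and
which_clause/3 is a shadow with the same heads and guards whose bodies
are just the clause numbers.

```erlang
-module(replay).
-export([handle_call/3, which_clause/3]).

%% A hypothetical callback.  Clause 2 crashes with badmatch whenever
%% Value is not of the form {ok, Stored}.
handle_call({get, Key}, _From, State) ->
    {reply, maps:get(Key, State, undefined), State};
handle_call({put, Key, Value}, _From, State) when is_atom(Key) ->
    {ok, Stored} = Value,
    {reply, ok, State#{Key => Stored}};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_call}, State}.

%% Shadow of handle_call/3: identical heads and guards, so feeding it
%% the saved arguments selects the same clause the crashed call did.
which_clause({get, _Key}, _From, _State) -> 1;
which_clause({put, Key, _Value}, _From, _State) when is_atom(Key) -> 2;
which_clause(_Other, _From, _State) -> 3.
```

Given the arguments of the crashed call, say {put, k, 42}, the shadow
answers 2, and the arguments themselves show why: 42 is not {ok, _}.
A clause number alone would have said where, but not why.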

Of course keeping arguments creates a conflict between accurate debugging
information and usable performance (garbage collection).  TANSTAAFL.