Erlang bytecodes and/or VM description?
Richard A. O'Keefe
Tue Jun 6 03:16:46 CEST 2006
David Hopwood asked:
> This might seem like a pointed question, but I'll ask it anyway:
> how does one maintain a VM without accurate documentation of the
> executable format it implements? I know I couldn't do it.
This year I have my third-year software engineering students
doing maintenance work on Mawk -- Mike Brennan's version of AWK.
We are in *exactly* this situation. Twice over, in fact.
Mawk includes two instruction sets, one generated from a Yacc grammar
for ordinary statements and expressions, and one generated from some
fairly impenetrable code for regular expressions. If you have read
"Software Tools" -- which I have and my students haven't -- the regexp
code is fairly obvious, although my considered opinion is that it is
going to be easier to rewrite than repair. If you've read Per Brinch
Hansen "On Pascal Compilers" -- which I have and my students haven't --
the main instruction set isn't _that_ hard to figure out, although
there are two quirks of the compiler that make things harder to
understand than they need be. (1) Instead of building ASTs and then
generating code from those, the compiler generates code as it goes,
sometimes bodily shifting large chunks of code from one place to
another in order to cope with generating them out of order. (2) The
AWK language allows you to pass scalars or entire arrays as function
parameters (what if you mix them up? the documentation is either
entirely silent or explicitly refuses to say, depending on which AWK
you are dealing with), and Mawk does a wee bit of type inference to
try and figure out what's what. If you've read anything on type
inference -- you know what goes here -- it's not terribly hard to
figure out what is going on, BUT by the time the type inference is
done the code has already been generated, and so the type inference
pass has to go back and patch some of it up, which is a little bit
> You're probably underestimating your abilities. It just takes
> imagination and dedication.
What it takes is *BACKGROUND*. If you have a reasonably good idea of
how something like this OUGHT to work, then you can probably figure
things out. But while you do so, you will curse the author for not
having the decency to *TELL* you what he knew and you have to rediscover.
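To make the second quirk above concrete, here is a minimal sketch in Python of a one-pass code generator that has to go back and patch instructions once type inference has finished. This is not Mawk's code: the opcode names (PUSH_UNKNOWN, PUSH_SCALAR, PUSH_ARRAY), the use kinds ("pass", "subscript", "arith"), and the function compile_body are all invented for illustration.

```python
# Illustrative sketch only: opcodes, use kinds, and function names are
# hypothetical and are not taken from Mawk's source.

def compile_body(uses):
    """Emit code for a function body in one pass, then patch it.

    `uses` is a list of (kind, name) pairs describing how each
    parameter is used, in source order:
      "subscript" -> the parameter is indexed, so it must be an array
      "arith"     -> the parameter is used in arithmetic, so a scalar
      "pass"      -> the parameter is handed to another function,
                     which tells us nothing about its type
    """
    code = []       # generated instructions as [opcode, operand]
    inferred = {}   # parameter name -> "array" or "scalar"
    to_patch = []   # indices of placeholder instructions

    for kind, name in uses:
        if kind == "pass":
            # Type not known here: emit a placeholder and remember it.
            to_patch.append(len(code))
            code.append(["PUSH_UNKNOWN", name])
            continue
        if kind not in ("subscript", "arith"):
            raise ValueError(f"unknown use kind: {kind}")
        ty = "array" if kind == "subscript" else "scalar"
        # Record the inferred type; complain about conflicting uses.
        if inferred.setdefault(name, ty) != ty:
            raise TypeError(f"{name} used as both scalar and array")
        code.append(["PUSH_ARRAY" if ty == "array" else "PUSH_SCALAR", name])

    # Patch pass: revisit code that has already been generated.
    for i in to_patch:
        ty = inferred.get(code[i][1])
        if ty is not None:
            code[i][0] = "PUSH_ARRAY" if ty == "array" else "PUSH_SCALAR"
    return code
```

A use that only passes a parameter on to another function gets a placeholder instruction; a later subscript or arithmetic use settles the type, and the placeholder is rewritten in place. A parameter that is never used in a revealing way keeps its placeholder, which a real implementation would have to resolve some other way.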
Why maintain Mawk? Because it needs maintenance (we have now fixed almost
all of the known Mawk bugs and several that weren't on the list) and because
it is worth maintaining: it is easily the fastest AWK around.