[erlang-questions] Is there a good source for documentation on BEAM?

Tue May 8 03:07:13 CEST 2012

On 8/05/2012, at 7:15 AM, Thomas Lindgren wrote:
> There has been a substantial number of non-BEAM Erlang implementations already, so I'm
> not convinced detailed BEAM docs is the key property* to spread Erlang.

And how many of those non-BEAM implementations still exist?
Does GERL?  Is E2S still maintained?  How much of OTP can it handle?

> Indeed, requiring detailed docs of every change of BEAM seems likely to slow innovation down instead.

I not only *don't* believe that, I *can't* believe that.
Joe has informed us that there are TWO levels of BEAM,
one of which has been very stable, and one of which has
changed many times.

I don't even believe your claim if it is made about the low-level,
much-changed "BEAM", but let's suppose it true for the sake
of argument.  If the high level of BEAM has remained pretty
stable for quite a while, how would documenting it have
slowed innovation down?

I can *prove* that the absence of documentation has definitely
slowed innovation down.  There are several of my EEPs where I
*would* have provided model implementations had I been able to.

Heck, the frames proposal was in more or less its present
shape *years* ago.  It was only last month when it finally
dawned on me that I could make measurements by constructing
my *own* micro-BEAM which I *could* understand.

> 
> If the motive is education, I think someone interested in compilers and virtual machine architectures
> would have little trouble with BEAM as such.

I have an interest in compilers and VMs.  I worked professionally on Quintus
Prolog and the real WAM (not the one in the papers or Aït-Kaci's book).  And
trying to figure out the BEAM was such a slog that to be honest, I said to
myself "the hell with it, if they don't *WANT* me to understand the BEAM,
I'm not going to waste any more of my time trying to penetrate the obscurity".

There are three key software engineering lessons:
 - if it isn't tested, it doesn't work;
 - if it isn't documented, it doesn't work;
 - tests are one kind of documentation but not enough.

For example, at Quintus, shortly after David Warren left, a compiler bug
was reported.  The whole compiler had two comments in it.  One was a
copyright notice, and the other was a commented out bit of code saying
"this doesn't work".  All variable names were either one letter or one
letter and one digit.

To fix the bug, I had to document the data structures used by the compiler,
including running lots of tests through it to make sure my documentation
was correct.  Whenever I figured out what a variable meant, I gave it a
longer name.  Once I understood the data structures, I was able to revise
them and make the compiler about 20% faster.  That was good, because the
bug turned out to be in the one part of the compiler I could never
understand.  Adding some extra code to check if the bug had happened and
deoptimise in that case increased compile time by 10%, but overall the
compiler was now faster.

I am still quite angry that the information I needed to fix that bug
properly was *KNOWN* to the author, who didn't bother writing it down.
I didn't even need to know how the thing worked; what I needed to know
was what exactly it was supposed to *do*.

> In a real sense, BEAM is just a vehicle to express compiler optimizations for a
> restricted part of ERTS (the sequential execution part, basically).

No, compiler optimisations are expressed in the executable code of the
compiler.  BEAM lets you express the *results* of such optimisations,
which is a different thing.  It's just like the Quintus compiler:  I could
figure out in that case what the *results* were, but the actual process
remained obscure.  (More precisely, what the 'invariants' were.)

The thing is, the compiler module in question was about 2 kSLOC, and it took
me two full *weeks* to figure out what the author already *knew*.  That was
not a good use of my time.

I've already spent the equivalent of about four full days trying to figure BEAM out,
and with nobody paying me to do that, it's just not worth it, especially
because somebody already KNOWS and just can't be bothered TELLING.

Yes, I'm shouting.  "We don't need it" and "you don't need it" are utterly
different propositions, and too many people in too many areas of life fail
to realise that.

> Another argument might be that BEAM should be specified in detail in order to be a suitable binary format for distribution, 
> which is essentially what the JVM instruction set has become.

I suggested many years ago that Erlang should take a leaf out of Kistler's
book (or PhD thesis).  The "Juice" system for Oberon compiled source files
to abstract syntax trees, then cleverly compressed the ASTs and used them
as the binary distribution form.  They came in smaller than .class files
and had no presuppositions about the target hardware (not even primitive
size and alignment if I recall correctly).  The cost of decompressing and
generating native code was low, to the point where it was faster to
dynamically load Juice files than their equivalent of .so/.dll files, and
the generated code actually ran faster because the code generator knew
more about the environment of the target, including existing code.  (I
don't know if the Juice runtime did cross-module inlining, but it would
have been possible.)