[erlang-questions] Is there a good source for documentation on BEAM?

Thu May 10 08:10:24 CEST 2012

(1) We were told that BEAM documentation isn't needed because
    there are other Erlang implementations.
(2) I ask whether any of those other implementations ever kept
    up with The Real Thing.  (By the way, as far as I know,
    none of them ever supported bit syntax, and my recent
    attempt to install GERL failed miserably.)
(3) Suddenly we are told that the abandoning of those other
    things just *proves* that we don't need BEAM documentation.

?
> 
> 
>>> Indeed, requiring detailed docs of every change of BEAM seems likely to
>>> slow innovation down instead.
>> 
[Me]

>> I not only *don't* believe that, I *can't* believe that.
>> Joe has informed us that there are TWO levels of BEAM,
>> one of which has been very stable, and one of which has
>> changed many times.
>> 
>> I don't even believe your claim if made about the low-level,
>> much-changed "BEAM", but let's suppose it true for the sake
>> of argument.  If the high level of BEAM has remained pretty
>> stable for quite a while, how would documenting it have
>> slowed innovation down?
> 
[Thomas Lindgren]

> Note that BEAM files are not guaranteed to be compatible across releases, and they do change incompatibly
> every now and then. (Not very often, to be sure. I recall it happening twice.) Check the mailing list for some discussions.

If BEAM were completely stable, it might be reasonable to expect
anyone who cares to figure out BEAM for themselves, once and for
all.  The more it changes, THE MORE IT NEEDS TO BE DOCUMENTED.

I repeat my claim about the fragmentary emulator I wrote:
the better the documentation was, the *FASTER* I could write
and rewrite it.  Having the tables needed for the disassembler
(which exists) and the assembler (which doesn't yet, but will)
automatically generated from the same file that the emulator
switch is generated from means they are consistent *all the time*
without me having to check.  Switching a numeric operand from
scaled (use @size) to unscaled (use @number) or back, with the
instruction description where the preprocessor can see it, means
that an incomplete edit (changing some occurrences but not
others) will be caught *before the C compiler is run* let alone
before run time.
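
A minimal sketch of that idea, in Python rather than the real
preprocessor.  The description format, the instruction names, and the
function names below are all made up for illustration; only the
@size/@number distinction comes from the text above.  One set of
descriptions feeds both a consistency check and a disassembler table,
so a half-finished edit is reported before any C is compiled.

```python
import re

# Hypothetical description format (NOT the author's actual syntax):
#   name  operand:kind ...  |  body using @size(x) or @number(x)
DESCRIPTIONS = """
get_field  obj:number idx:size | dst = heap[@number(obj) + @size(idx)]
jump       off:number          | pc += @number(off)
"""

# Matches one operand use in a body, e.g. "@size(idx)".
USE = re.compile(r"@(size|number)\((\w+)\)")

def parse(text):
    """Return a list of (name, {operand: kind}, body) tuples."""
    instructions = []
    for line in text.strip().splitlines():
        head, body = line.split("|", 1)
        name, *operands = head.split()
        kinds = dict(op.split(":") for op in operands)
        instructions.append((name, kinds, body.strip()))
    return instructions

def check(instructions):
    """Report every use whose kind disagrees with its declaration."""
    errors = []
    for name, kinds, body in instructions:
        for kind, operand in USE.findall(body):
            declared = kinds.get(operand)
            if declared != kind:
                errors.append(
                    f"{name}: {operand} declared {declared}, used as {kind}")
    return errors

def disassembler_table(instructions):
    """Opcode -> (mnemonic, operand kinds), derived from the same source."""
    return {opcode: (name, list(kinds.values()))
            for opcode, (name, kinds, _) in enumerate(instructions)}
```

Changing `off:number` to `off:size` in the declaration without touching
the body makes check() return a message naming the instruction and the
operand, which is the "incomplete edit" case described above.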

> 
> As for what I see would cause a slowdown: the attention of the key hackers would be spent on writing this
> documentation (and then maintaining it, I assume).

In my fragmentary emulator, there are roughly equal lines of
documentation and code, except that thanks to the preprocessor,
the lines of code are simpler and more often correct than they
would have been without.  *This* hacker, at least, found it
took *less* time to write documentation+code than to just write
code.

It's not hugely detailed documentation: just whether operands are
raw or tagged, whether a tag check is *intentionally* omitted,
whether some other instruction is expected to set up a context
in some register, the name of a related instruction where the
details can be found, and sometimes what possibly surprising source
forms an instruction was meant for.  It DOESN'T have to be huge
to be of benefit, and I would expect 'key hackers' to be doing it for
their *own* sake, never mind anyone else's.

I can understand the documentation being stripped out before
release; what I'm having trouble with is the idea of it never
having existed, and the idea that not having it makes life in
some unimaginable way easier for the developers.

> Perhaps people will start depending on documented details 
> of implementation, explicitly or implicitly. Major changes would also mean major internal docs rewrites.

Well, yes, BUT a preprocessor that helps you get the original code
right will also make it easier to make major changes right.
> 
> At that level of knowledge, I assume the BEAM instruction set in itself is no big hurdle.
> If you want to learn the internals beyond that, what level of detail are you looking for?

Getting a *rough* idea of the compiler's output is no big deal.
Understanding it well enough to generate correct code myself *is*.

For one example of the kind of understanding you can get from
documentation, I was puzzled because I couldn't see the instructions
I knew must be there to nil out X registers that ceased to be live.
Joe's little document made it clear that
(a) there are a lot more X registers allowed in Erlang than in
    Quintus Prolog;
(b) maintaining them is more expensive than in Quintus Prolog;
(c) the nilling instructions I expected don't exist;
(d) there is a (temporary) space leak: if register K is live
    at an allocation point, all registers <= K are assumed to
    be live.
With hindsight, I can now see (most of) that in the .S files.
But it really wasn't obvious.
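
Point (d) can be modelled in a few lines.  This is only a toy model of
the rule as stated above (if register K is live, everything <= K is
treated as live); the function names are invented and nothing here is
taken from the real emulator or garbage collector.

```python
def assumed_live(truly_live):
    """X registers treated as live at an allocation point.

    Assumption: if K is the highest register that is really live,
    registers 0..K are all treated as live.
    """
    if not truly_live:
        return set()
    return set(range(max(truly_live) + 1))

def leaked(truly_live):
    """Registers kept alive only because a higher register is live."""
    return assumed_live(truly_live) - set(truly_live)
```

With only X0 and X3 really live, leaked({0, 3}) gives {1, 2}: whatever
those two registers still point at is retained until they are
overwritten, which is the temporary space leak.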

One important detail is the layout of Erlang stack frames.