[erlang-questions] Is there a good source for documentation on BEAM?

Wed May 9 20:03:27 CEST 2012

----- Original Message -----
> From: Richard O'Keefe <ok@REDACTED>
> To: Thomas Lindgren <thomasl_erlang@REDACTED>
> Cc: Michael Turner <michael.eugene.turner@REDACTED>; "erlang-questions@REDACTED" <erlang-questions@REDACTED>
> Sent: Tuesday, May 8, 2012 3:07 AM
> Subject: Re: [erlang-questions] Is there a good source for documentation on BEAM?
> 
> 
> On 8/05/2012, at 7:15 AM, Thomas Lindgren wrote:
>>  There has been a substantial number of non-BEAM Erlang implementations already, so I'm
>>  not convinced detailed BEAM docs is the key property* to spread Erlang.
> 
> And how many of those non-BEAM implementations still exist?
> Does GERL?  Is E2S still maintained?  How much of OTP can it handle?

This, to my mind, says more about the (lack of) need for a second source implementation than any inherent
problems with learning BEAM. If you want to try your hand, quite a bit of the complexity is not in handling BEAM
as such but in reimplementing ERTS: writing the BIFs, SMP, memory management, etc.

>>  Indeed, requiring detailed docs of every change of BEAM seems likely to slow innovation down instead.
> 
> I not only *don't* believe that, I *can't* believe that.
> Joe has informed us that there are TWO levels of BEAM,
> one of which has been very stable, and one of which has
> changed many times.
> 
> I don't even believe your claim if made about the low level
> much changed "BEAM", but let's suppose it true for the sake
> of argument.  If the high level of BEAM has remained pretty
> stable for quite a while, how would documenting it have
> slowed innovation down?

Note that BEAM files are not guaranteed to be compatible across releases, and they do change incompatibly
every now and then. (Not very often, to be sure. I recall it happening twice.) Check the mailing list for some discussions.

The "sub-BEAM" implementation can change more rapidly, of course. I assume implementors there can do
platform-specific things like inline-expanding instructions into native code, mapping VM registers to native registers,
constructing superinstructions, etc. (I seem to recall all of these being tried at one time or another.)
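
To make the superinstruction idea concrete, here is a toy sketch in Python. The opcode names and the fusion table are hypothetical, made up for illustration, and are not taken from the real BEAM loader. It only shows the general shape: a peephole pass that replaces known adjacent instruction pairs with one fused instruction.

```python
# Hypothetical opcode names for illustration; not the real BEAM loader's table.
FUSIONS = {
    ("move", "call"): "move_call",
    ("move", "return"): "move_return",
}


def fuse_superinstructions(code):
    """Peephole pass: replace known adjacent instruction pairs with one fused op."""
    out = []
    i = 0
    while i < len(code):
        if i + 1 < len(code):
            first, second = code[i], code[i + 1]
            fused = FUSIONS.get((first[0], second[0]))
            if fused is not None:
                # Keep the operands of both original instructions, in order.
                out.append((fused, *first[1:], *second[1:]))
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out


# "Generic" code as a compiler might emit it (assumed format: opcode, operands...).
GENERIC = [
    ("move", "y0", "x0"),
    ("call", "foo/1"),
    ("move", "x0", "x1"),
    ("move", "y1", "x0"),
    ("return",),
]

# "Specialised" code as a loader might rewrite it for the running emulator.
SPECIALISED = fuse_superinstructions(GENERIC)
```

The point is that a rewrite like this can live entirely below the stable instruction set, so the fusion table can change between releases without the compiler output changing.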

As for what I see would cause a slowdown: the attention of the key hackers would be spent on writing this
documentation (and then maintaining it, I assume). Perhaps people will start depending on documented details 
of implementation, explicitly or implicitly. Major changes would also mean major internal docs rewrites.

See below for one option.

> ... [pace of innovation, see below on kickstarter for my comment]
>> 
>>  If the motive is education, I think someone interested in compilers and virtual machine architectures
>>  would have little trouble with BEAM as such.
> 
> I have an interest in compilers and VMs.  I worked professionally on Quintus
> Prolog and the real WAM (not the one in the papers or Aït-Kaci's book).  And
> trying to figure out the BEAM was such a slog that to be honest, I said to
> myself "the hell with it, if they don't *WANT* me to understand the BEAM,
> I'm not going to waste any more of my time trying to penetrate the obscurity".

At that level of knowledge, I assume the BEAM instruction set in itself is no big hurdle.
If you want to learn the internals beyond that, what level of detail are you looking for?

>>  In a real sense, BEAM is just a vehicle to express compiler optimizations for a
>>  restricted part of ERTS (the sequential execution part, basically).
> 
> No, compiler optimisations are expressed in the executable code of the
> compiler.  BEAM lets you express the *results* of such optimisations,
> which is a different thing.  It's just like the Quintus compiler:  I could
> figure out in that case what the *results* were, but the actual process
> remained obscure.  (More precisely, what the 'invariants' were.)

Here is how I see it: The instruction set of BEAM has been chosen for the purpose of expressing, and then used to express, various optimizations.
Consider a simple example: targeting BEAM vs JAM (a stack machine used previously to implement Erlang).
In order to optimize register use on JAM, you first have to translate the JAM code to a new intermediate language (and then probably never
try to translate it back to JAM), while BEAM (like its uncle WAM) expresses registers explicitly and so makes such optimizations straightforward.
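
A toy sketch of that difference, in Python. Both instruction sets here are hypothetical, invented for illustration, and are not real JAM or BEAM opcodes. The same expression, (a + b) * c, is written once as stack code and once as register code. In the register form every operand is named, so a simple analysis such as "where is each register last read?" is a direct scan over the code.

```python
# Hypothetical toy instruction sets; these are NOT real JAM or BEAM opcodes.

def run_stack(code, env):
    """Evaluate stack-machine code: operands are implicit (top of stack)."""
    stack = []
    for op, *args in code:
        if op == "push_var":
            stack.append(env[args[0]])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(op)
    return stack.pop()


def run_register(code, regs):
    """Evaluate register-machine code: every operand names a register."""
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    for op, dst, src1, src2 in code:
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
        elif op == "mul":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(op)
    return regs


def last_uses(code):
    """Map each register to the index of the last instruction that reads it."""
    last = {}
    for i, (_op, _dst, src1, src2) in enumerate(code):
        last[src1] = i
        last[src2] = i
    return last


# (a + b) * c as stack code: data flow is implicit in the push/pop order.
STACK_CODE = [
    ("push_var", "a"), ("push_var", "b"), ("add",),
    ("push_var", "c"), ("mul",),
]

# The same expression as register code, with a in x0, b in x1, c in x2.
# x0 is not read after instruction 0, so it is reused as the final destination.
REG_CODE = [
    ("add", "x3", "x0", "x1"),
    ("mul", "x0", "x3", "x2"),
]

stack_result = run_stack(STACK_CODE, {"a": 2, "b": 3, "c": 4})
reg_result = run_register(REG_CODE, {"x0": 2, "x1": 3, "x2": 4})["x0"]
```

To get the equivalent of `last_uses` for the stack code, you would first have to reconstruct which push feeds which operation, which is in effect translating to a register-like intermediate form.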

> ...
> Yes, I'm shouting.  "We don't need it" and "you don't need it" are utterly
> different propositions, and too many people in too many areas of life fail
> to realise that.

(To avoid any confusion, let me add that I last worked at Ericsson CSLAB in 1998. So I'm hardly an OTP insider.)

So perhaps the right approach is to do a Kickstarter to fund someone writing a deep-dive Erlang/OTP internals book?
Complexity: roughly the level of writing a Linux kernel book, at a quick guess. Perhaps a bit easier.

>>  Another argument might be that BEAM should be specified in detail in order to be a suitable binary format for distribution,
>>  which is essentially what the JVM instruction set has become.
> 
> I suggested many years ago that Erlang should take a leaf out of Kistler's
> book (or PhD thesis).  The "Juice" system for Oberon compiled source files
> to abstract syntax trees, then cleverly compressed the ASTs and used them
> as the binary distribution form.  They came in smaller than .class files
> and had no presuppositions about the target hardware (not even primitive
> size and alignment if I recall correctly).  The cost of decompressing and
> generating native code was low, to the point where it was faster to
> dynamically load Juice files than their equivalent of .so/.dll files, and
> the generated code actually ran faster because the code generator knew
> more about the environment of the target, including existing code.  (I
> don't know if the Juice runtime did cross-module inlining, but it would
> have been possible.)

Not a bad idea.

Best regards,
Thomas