[erlang-questions] Is there a good source for documentation on BEAM?

Thu May 10 09:59:04 CEST 2012

What a long discussion ...

I have a few comments.

I have started writing a 2nd edition of "Programming Erlang" - I have a
dumping ground for potential chapters in books that
one-day-some-time-if-i-get-time I might write. One of these is called "beam".

A question was asked about how the beam worked - so I thought it's a bit silly
of me to sit on this - I'll post it off since it might help.

I don't actually know how the Beam works - I have a vague idea - but
fortunately I can just wander down the corridor and ask Björn, who does
know how it works. I also have a strong aversion to reading code - I like
to know how stuff is supposed to work, rather than reading the code to
find out.

(I hate reading code - as soon as I read code I get sidetracked by
 wondering "why was it written this way" and often get a strong desire to
 rewrite it - I once wondered how PDF worked and that was a complete
 disaster - 3 months down the drain and ErlGuten was the result - and all
 I really wanted to do was figure out why the kerning in an OpenOffice
 slide was manifestly wrong. Any sensible person would have said "don't ask".)

I know of only two people who have figured out how the beam works
*without* asking Björn: Fredrik Svan and Kresten Krab Thorup, and I am
deeply impressed that they managed to do this. I asked both Fredrik and
Kresten how they did it - they both said "I reverse engineered the code" -
and they both had good reasons: Fredrik made a JavaScript
Erlang-in-the-browser thing, and Kresten made Erjang.

Now there are two levels at which one could describe the Beam - level one
is the relationship between Erlang code and the beam instructions -
this is what I described.
Actually running erlc -S and guesswork gets you pretty far - this is what
I did - I only had to play my "ask Björn card" a few times (there's some
stuff about marking which registers have to be garbaged, which is not
guessable).
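
A minimal sketch of that level-one workflow, for anyone who wants to try
it. The module name beam_demo and the function len are made up for
illustration and are not taken from the distributed chapter.

```erlang
-module(beam_demo).
-export([len/1]).

%% Length of a list, written with an accumulator so that the
%% recursive call is a tail call.
len(L) -> len(L, 0).

%% Non-empty list: drop the head and bump the counter.
len([_ | T], N) -> len(T, N + 1);
%% Empty list: the counter is the answer.
len([], N) -> N.
```

Running "erlc -S beam_demo.erl" (or compile:file(beam_demo, ['S']) from the
shell) writes an assembler listing to beam_demo.S instead of a .beam file.
Reading that listing next to the source is the "guesswork" part: each
function appears as a block of labels and instructions, though the exact
instructions differ between OTP releases.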

At this level of abstraction we can completely ignore memory management, most of
garbage collection, how process stacks and heaps are organised, how multicores
are locked etc.

To describe the next level we suddenly jump from a one-chapter
description to an entire book. This is a book that is tricky to write - I
guess no one person knows all the answers. It's also a book that few (I
suspect) would read. Fredrik and Kresten didn't have to understand much of
the beam memory management. Fredrik used whatever GC and object
representation JavaScript uses and Kresten used whatever the JVM did - so
it wasn't really relevant.

Realistically the only thing that might get written is a spiffed-up version of
what I've distributed - but I would be reluctant to include it in the next
edition of Programming Erlang, since I can use the space for content
with wider appeal.

I'll try and make a better version of what I've distributed and
put it up on the main web site... but this is a low priority task

Cheers

/Joe

On Thu, May 10, 2012 at 8:22 AM, Michael Turner
<michael.eugene.turner@REDACTED> wrote:
> "As for what I see would cause a slowdown: the attention of the key
> hackers would be spent on writing this
> documentation (and then maintaining it, I assume)."
>
> Perhaps better: volunteers could document it (on a relatively
> controlled wiki, for example). Then the "key hackers" could mention
> any needed corrections.
>
> As for maintenance, you say yourself (in a later e-mail) that you can
> only remember significant changes happening twice. Documenting such
> infrequent changes doesn't exactly sound like some grinding daily
> burden for already-overworked Ericsson programmers. If they have to
> propose these changes in writing anyway (at least in internal e-mail),
> sounds like most of the documentation work gets done before the
> changes are made.
>
> -michael turner
>
>
> On Thu, May 10, 2012 at 3:03 AM, Thomas Lindgren
> <thomasl_erlang@REDACTED> wrote:
>>
>>
>>
>>
>> ----- Original Message -----
>>> From: Richard O'Keefe <ok@REDACTED>
>>> To: Thomas Lindgren <thomasl_erlang@REDACTED>
>>> Cc: Michael Turner <michael.eugene.turner@REDACTED>; "erlang-questions@REDACTED" <erlang-questions@REDACTED>
>>> Sent: Tuesday, May 8, 2012 3:07 AM
>>> Subject: Re: [erlang-questions] Is there a good source for documentation on BEAM?
>>>
>>>
>>> On 8/05/2012, at 7:15 AM, Thomas Lindgren wrote:
>>>>  There has been a substantial number of non-BEAM Erlang implementations already, so I'm
>>>>  not convinced detailed BEAM docs is the key property* to spread Erlang.
>>>
>>> And how many of those non-BEAM implementations still exist?
>>> Does GERL?  Is E2S still maintained?  How much of OTP can it handle?
>>
>> This, to my mind, says more about the (lack of) need for a second source implementation than any inherent
>> problems with learning BEAM. If you want to try your hand, quite a bit of the complexity is not in handling BEAM
>> as such but in reimplementing ERTS: writing the BIFs, SMP, memory management, etc.
>>
>>>>  Indeed, requiring detailed docs of every change of BEAM seems likely to
>>>>  slow innovation down instead.
>>>
>>> I not only *don't* believe that, I *can't* believe that.
>>> Joe has informed us that there are TWO levels of BEAM,
>>> one of which has been very stable, and one of which has
>>> changed many times.
>>>
>>> I don't even believe your claim if made about the low level
>>> much changed "BEAM", but let's suppose it true for the sake
>>> of argument.  If the high level of BEAM has remained pretty
>>> stable for quite a while, how would documenting it have
>>> slowed innovation down?
>>
>> Note that BEAM files are not guaranteed to be compatible across releases, and they do change incompatibly
>> every now and then. (Not very often, to be sure. I recall it happening twice.) Check the mailing list for some discussions.
>>
>> The "sub-BEAM" implementation can change more rapidly, of course. I assume implementors there can do
>> platform specific things like inline expanding instructions into native, mapping VM registers to native registers,
>> constructing superinstructions, etc. (I seem to recall all of these being tried at one time or another.)
>>
>> As for what I see would cause a slowdown: the attention of the key hackers would be spent on writing this
>> documentation (and then maintaining it, I assume). Perhaps people will start depending on documented details
>> of implementation, explicitly or implicitly. Major changes would also mean major internal docs rewrites.
>>
>> See below for one option.
>>
>>> ... [pace of innovation, see below on kickstarter for my comment]
>>>>
>>
>>>>  If the motive is education, I think someone interested in compilers and
>>>>  virtual machine architectures would have little trouble with BEAM as such.
>>>
>>> I have an interest in compilers and VMs.  I worked professionally on Quintus
>>> Prolog and the real WAM (not the one in the papers or Aït-Kaci's book).  And
>>> trying to figure out the BEAM was such a slog that to be honest, I said to
>>> myself "the hell with it, if they don't *WANT* me to understand the BEAM,
>>> I'm not going to waste any more of my time trying to penetrate the
>>> obscurity".
>>
>>
>> At that level of knowledge, I assume the BEAM instruction set in itself is no big hurdle.
>> If you want to learn the internals beyond that, what level of detail are you looking for?
>>
>>>>  In a real sense, BEAM is just a vehicle to express compiler optimizations for a
>>>>  restricted part of ERTS (the sequential execution part, basically).
>>>
>>> No, compiler optimisations are expressed in the executable code of the
>>> compiler.  BEAM lets you express the *results* of such optimisations,
>>> which is a different thing.  It's just like the Quintus compiler:  I could
>>> figure out in that case what the *results* were, but the actual process
>>> remained obscure.  (More precisely, what the 'invariants' were.)
>>
>>
>> Here is how I see it: The instruction set of BEAM has been chosen for the purpose of expressing, and then used to express, various optimizations.
>> Consider a simple example: targeting BEAM vs JAM (a stack machine used previously to implement erlang).
>> In order to optimize register use on JAM, you first have to translate it to a new intermediate language (and then probably never
>> try to translate it back to JAM), while BEAM (like its uncle WAM) expresses registers explicitly and so makes such optimizations straightforward.
>>
>>> ...
>>> Yes, I'm shouting.  "We don't need it" and "you don't need it" are utterly
>>> different propositions, and too many people in too many areas of life fail
>>> to realise that.
>>
>> (To avoid any confusion, let me add that I last worked at Ericsson CSLAB in 1998. So I'm hardly an OTP insider.)
>>
>> So perhaps the right approach is to do a kickstarter to fund someone writing a deep dive Erlang/OTP internals book?
>> Complexity: roughly the level of writing a Linux kernel book, at a quick guess. Perhaps a bit easier.
>>
>>>>  Another argument might be that BEAM should be specified in detail in order
>>>>  to be a suitable binary format for distribution,
>>>>  which is essentially what the JVM instruction set has become.
>>>
>>> I suggested many years ago that Erlang should take a leaf out of Kistler's
>>> book (or PhD thesis).  The "Juice" system for Oberon compiled source files
>>> to abstract syntax trees, then cleverly compressed the ASTs and used them
>>> as the binary distribution form.  They came in smaller than .class files
>>> and had no presuppositions about the target hardware (not even primitive
>>> size and alignment if I recall correctly).  The cost of decompressing and
>>> generating native code was low, to the point where it was faster to
>>> dynamically load Juice files than their equivalent of .so/.dll files, and
>>> the generated code actually ran faster because the code generator knew
>>> more about the environment of the target, including existing code.  (I
>>> don't know if the Juice runtime did cross-module inlining, but it would
>>> have been possible.)
>>
>>
>> Not a bad idea.
>>
>> Best regards,
>> Thomas
>>