BEAM documentation (was Re: Packages in Erlang...)

Mon Sep 8 19:00:37 CEST 2003

On Mon, 8 Sep 2003, James Hague wrote:

> Vlad Dumitrescu wrote:
> >That would be interesting, but you still need
> >something running the Erlang code that runs
> >the new BEAM VM? Or does the bootstrapping
> >trick work in this case too? Doesn't one 
> >need machine code generation for that?
> 
> Yes, it would need to generate machine code as a final step.  The more I
> think about it, the more similarities I see between implementing a Forth
> compiler and the BEAM emulator.  I'm not talking about RPN and such--BEAM is
> register based--but how relatively easy it is to generate code when you are
> specializing it for a particular abstract machine and not a full-fledged
> C-like language.
> 
> An equally interesting project would be to write a BEAM emulator using
> one of the modern, machine code generating Forths that have become standard
> (here's an example: http://www.mpeltd.demon.co.uk/pfwvfx.htm).  Essentially
> the main loop of the emulator, the one that's reliant on the gcc "address of
> label" extension, is how a Forth with tail-call support works.
> 
> For these paths to be opened up, I'd like to see two things happen:
> 
> 1. Decent documentation of the BEAM architecture and instruction set.
> 2. Move most of the code from the "beam_load.c" module to the Erlang
> compiler.  This is the module that replaces generic BEAM instructions with
> specialized instructions.  This code would be much cleaner in Erlang than in
> C, and it would also simplify the emulator (beam_load.c is the second
> largest module in the emulator).
> 
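
For reference, the "address of label" dispatch James mentions can be
sketched in a few lines of C. This is only an illustration under
assumptions: the three opcodes (OP_LOADI, OP_ADD, OP_HALT), the register
count and the run() function are made up for the example and are not real
BEAM instructions. The only real feature used is the GCC/Clang
labels-as-values extension (&&label and goto *ptr).

```c
#include <stddef.h>

/* Hypothetical three-instruction register machine, NOT real BEAM opcodes. */
enum { OP_LOADI, OP_ADD, OP_HALT };

/* Runs `code` until OP_HALT and returns the value left in register 0. */
long run(const long *code)
{
    /* GCC/Clang extension: &&label yields the address of a label. */
    static void *dispatch[] = { &&do_loadi, &&do_add, &&do_halt };
    long reg[4] = { 0, 0, 0, 0 };
    const long *pc = code;

/* Jump straight to the handler of the instruction at pc. */
#define NEXT() goto *dispatch[*pc]

    NEXT();

do_loadi:                       /* LOADI dst, imm */
    reg[pc[1]] = pc[2];
    pc += 3;
    NEXT();

do_add:                         /* ADD dst, src1, src2 */
    reg[pc[1]] = reg[pc[2]] + reg[pc[3]];
    pc += 4;
    NEXT();

do_halt:
    return reg[0];
#undef NEXT
}
```

Each handler ends by jumping directly to the handler of the next
instruction, with no central switch, which is the tail-call-like
behaviour referred to above.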

I quite agree - I have very recently done some experiments in loading code
- admittedly for my old JAM machine but the results would equally apply
to beam - Here I create *all* data structures (hash tables, code ...) as
Erlang terms - squirt them out in my UBF (slightly optimized) - then to
read the code I just mmap the file and do a parse_ubf (written in C) -
this is *very* fast - the code just "falls" into the right place :-)
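
A rough sketch of the "mmap the file and parse in place" idea, under
assumptions: map_file() and index_records() are hypothetical names, and
the record format (one length byte followed by that many bytes) is a toy
stand-in, not UBF and not what parse_ubf does. The point is only that the
parser stores pointers into the mapping instead of copying anything.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only. Returns NULL on failure or for an empty file. */
const unsigned char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Toy parser (NOT UBF): records are <1-byte length><bytes>.
 * Fills `out` with pointers to the start of each record and
 * returns how many complete records were found (at most `max`). */
size_t index_records(const unsigned char *buf, size_t len,
                     const unsigned char **out, size_t max)
{
    size_t n = 0, i = 0;
    while (i < len && n < max) {
        size_t rec = buf[i];
        if (i + 1 + rec > len)
            break;              /* truncated record, stop here */
        out[n++] = buf + i;     /* point into the mapping, no copy */
        i += 1 + rec;
    }
    return n;
}
```

The caller unmaps the region with munmap() when the loaded data is no
longer needed.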

I started building atom tables in Erlang that looked like this:

	{<<"aaa">>, <<"bbb">>, ..., <<"zzz">>}

This is a sorted tuple of all the atoms in a module. Now I know that the
memory layout of a tuple is a flat sequence of consecutive addresses.
Using this fact I can tweak things to get almost exactly the same data
structure that a C program would have used - only I do it all in Erlang
- which is much easier ...
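
On the C side, the structure such a tuple mimics is roughly a sorted
array of names searched by bisection. A minimal sketch, assuming
NUL-terminated names; atom_index() and the sample atoms are hypothetical
and not taken from the JAM or BEAM sources:

```c
#include <stddef.h>
#include <string.h>

/* Binary search in a sorted array of atom names.
 * Returns the index of `name`, or -1 if it is not present. */
int atom_index(const char *const *atoms, size_t n, const char *name)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, atoms[mid]);
        if (c == 0)
            return (int)mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Because the entries sit at consecutive addresses, a lookup is a handful
of pointer comparisons and needs no hashing or allocation at load time.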

What I'd like to see done is the following.

Take a working machine (Beam, Jam ...) and *remove* as much as possible
while still keeping the machine running (is this what re-factoring is?) -

I find it very difficult today to see what exactly is the essential code.

I'd really like to see the entire system in two directories,
one C, one Erlang - the emulator and the compiler with *no* dependencies
and *very* good documentation of all the instructions -

/Joe