This *SHOULD* be simple... compile a string to assembler.

Michael Richter ttmrichter@REDACTED
Fri Jul 23 09:15:50 CEST 2010


This should be simple, given the sheer bewildering variety of parsing bits
in stdlib and tools, but I can't for the life of me figure out how to make
this work.

I want to take a string like "a() -> 42.".

I want to see what this looks like in the "assembler" output from the
compiler, but without having to save it to a file (artificially adding a
module definition), compile that file with "assembler" output, then
display the output file.

For reference, here's how the equivalent would look using bash and the gcc
compiler:

echo 'int a() { return 42; }' | gcc -xc -S -o- -


Similar code works for clang too (no surprise since clang aims to shadow the
gcc command line):

echo 'int a() { return 42; }' | clang -xc -S -o- -


So... how to do the same in Erlang?  A quick glance at erlc tells me there's
no way to take input from stdin and put output to stdout, but that's not a
problem.  I'll just make an escript.  Let's look at what we can do after
reading over the docs:

1> {ok, Tokens, _} = erl_scan:string("a () -> 42.").
{ok,[{atom,1,a},
     {'(',1},
     {')',1},
     {'->',1},
     {integer,1,42},
     {dot,1}],
    1}

So far so good.  Let's keep it up.

2> {ok, AbsForm} = erl_parse:parse_form(Tokens).
{ok,{function,1,a,0,[{clause,1,[],[],[{integer,1,42}]}]}}

This is really promising.  Now how to turn this into assembler?  Ah!
compile:forms/2 seems to be a good fit.  Let's give it a try.

3> compile:forms([AbsForm], 'S').
.:1: no module definition
.:1: Warning: function a/0 is unused
error

Oops!  Something is wrong here.  It wants a module, but I don't have one.
Let's go back and add one.

4> {ok, Tokens2, _} =
erl_scan:string("-module(null).\n-compile(export_all).\na () -> 42.").
{ok,[{'-',1},
     {atom,1,module},
     {'(',1},
     {atom,1,null},
     {')',1},
     {dot,1},
     {'-',2},
     {atom,2,compile},
     {'(',2},
     {atom,2,export_all},
     {')',2},
     {dot,2},
     {atom,3,a},
     {'(',3},
     {')',3},
     {'->',3},
     {integer,3,42},
     {dot,3}],
    3}
5> {ok, AbsForm} = erl_parse:parse_form(Tokens2).

** exception error: no match of right hand side value
{error,{2,erl_parse,["syntax error before: ","'-'"]}}

...

What the ...?!

OK, maybe it doesn't like the compiler options there.  Let's try it with
just the module.

6> {ok, Tokens3, _} = erl_scan:string("-module(null).\na () -> 42.").
{ok,[{'-',1},
     {atom,1,module},
     {'(',1},
     {atom,1,null},
     {')',1},
     {dot,1},
     {atom,2,a},
     {'(',2},
     {')',2},
     {'->',2},
     {integer,2,42},
     {dot,2}],
    2}
7> {ok, AbsForm} = erl_parse:parse_form(Tokens3).
** exception error: no match of right hand side value
{error,{2,erl_parse,["syntax error before: ","a"]}}

Nope.  That didn't do it either.  It just doesn't like anything after I give
it the module.  That's ... odd.

Hang on!  The error has moved: it now sits in front of code that USED TO
PARSE.  What's going on here?!

8> {ok, Tokens4, _} = erl_scan:string("a () -> 42.\nb () -> 24.").
{ok,[{atom,1,a},
     {'(',1},
     {')',1},
     {'->',1},
     {integer,1,42},
     {dot,1},
     {atom,2,b},
     {'(',2},
     {')',2},
     {'->',2},
     {integer,2,24},
     {dot,2}],
    2}
9> {ok, AbsForm} = erl_parse:parse_form(Tokens4).
** exception error: no match of right hand side value
{error,{2,erl_parse,["syntax error before: ","b"]}}

The parser... parses one . at a time it seems.  That's a bit odd.
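
A minimal sketch of working around this, assuming the goal is simply to
get one abstract form per dot-terminated run of tokens.  The module name
form_split and its functions are hypothetical names made up for this
illustration; only erl_scan:string/1 and erl_parse:parse_form/1 come
from stdlib.

```erlang
-module(form_split).
-export([parse_forms/1]).

%% Scan a string and parse every form in it.  erl_parse:parse_form/1
%% accepts exactly one form per call, so the token list is cut into
%% dot-terminated runs first.
parse_forms(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [parse_one(FormTokens) || FormTokens <- split_on_dot(Tokens, [], [])].

%% Crashes with a badmatch if the tokens do not make up a valid form.
parse_one(FormTokens) ->
    {ok, AbsForm} = erl_parse:parse_form(FormTokens),
    AbsForm.

%% Cut the token list after each {dot, Location} token.
split_on_dot([], [], Done) ->
    lists:reverse(Done);
split_on_dot([], Pending, Done) ->
    %% Trailing tokens without a closing dot: hand them on so that
    %% parse_form/1 is the one to report the problem.
    lists:reverse([lists:reverse(Pending) | Done]);
split_on_dot([{dot, _} = Dot | Rest], Pending, Done) ->
    split_on_dot(Rest, [], [lists:reverse([Dot | Pending]) | Done]);
split_on_dot([Token | Rest], Pending, Done) ->
    split_on_dot(Rest, [Token | Pending], Done).
```

With this, form_split:parse_forms("a () -> 42.\nb () -> 24.") should
give back a list of two function forms instead of a syntax error.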

Now here's where we stop because we hit my point for this message.  (For the
record, Mononcqc assisted me with this until we figured out how to
accomplish what was needed.  It's an embarrassing amount of work, however,
given that I'm trying to replicate a bash/gcc *one-liner* here, involving
splitting lists on magic token values, inserting hand-crafted forms to the
list of abstract forms, etc.)
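
For illustration, the steps just listed can be sketched roughly as
follows.  This is not the solution linked at the end of this message; it
is a guess at the general shape, under a few assumptions: the input
contains only function definitions, every function gets exported, and
the bogus module is called null.  The module name str2asm and its
function names are hypothetical.  It also relies on compile:forms/2
handing back the compiler's internal assembler term (rather than writing
a listing file) when given the 'S' option.

```erlang
-module(str2asm).
-export([asm/1]).

%% Compile a string of function definitions to the compiler's
%% "assembler" term without touching the file system.
asm(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    Functions = [parse_one(Ts) || Ts <- split(Tokens)],
    %% Hand-crafted forms: a bogus module attribute and an export
    %% list covering every function found in the input.
    Exports = [{Name, Arity} || {function, _, Name, Arity, _} <- Functions],
    Forms = [{attribute, 1, module, null},
             {attribute, 1, export, Exports}
             | Functions],
    {ok, null, Asm} = compile:forms(Forms, ['S']),
    Asm.

%% erl_parse:parse_form/1 takes one dot-terminated form at a time.
parse_one(FormTokens) ->
    {ok, AbsForm} = erl_parse:parse_form(FormTokens),
    AbsForm.

%% Split the token list after each dot token (the "magic token
%% value").  Crashes if the last form is missing its dot.
split([]) ->
    [];
split(Tokens) ->
    NotDot = fun({dot, _}) -> false; (_) -> true end,
    {Body, [Dot | Rest]} = lists:splitwith(NotDot, Tokens),
    [Body ++ [Dot] | split(Rest)].
```

From the shell, io:format("~p~n", [str2asm:asm("a() -> 42.")]) would
then print the term; its exact contents vary between OTP releases.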

The real point of this message lies in documentation.  There was a recent
thread in the mailing list in which some people made semi-to-fully-snide
comments about "knowing the toolbox" because someone hand-rolled a custom
zip* function instead of using lists:zip which he should have known about if
he had only read the docs.  (I'll let you picture the very Victorian back of
the hand to the forehead routine while reading that previous sentence's
closing clause.)

The problem here is that in this case, trying to solve what should be a
trivial problem, I did actually look very closely at my toolbox.  I looked
at the tools available.  I figured out which pieces would be needed
(erl_scan:string/1, erl_parse:parse_form/1 and compile:forms/2) *and could
not figure out how to piece them together*.  The docs on the individual
pieces are ... technically present, but lacking important details (like
parse_form/1 only going through one {dot,Line} token at a time) and showing
*zero* examples of how to put them all together in a working whole.  Instead
of having nice Lego pieces that snap together easily to make eye-pleasing
shapes we have cheap Chinese knock-off Legos with poorly-fitting pieces and
random pins and razor blades stuck into them to get pricked and/or cut by
while we play.

There is an embarrassing wealth of very useful functionality in the
Erlang/OTP distribution.  It is, however, mostly inaccessible because of
these sorts of documentation issues.  Were I not so pigheaded (and Mononcqc
not so patient) I would have thrown the tools out and just hacked together a
stupid bash script to do what I wanted: output a bogus module into a
temporary file, append the string, call erlc -S on that and display the
resulting output file before deleting it all.  Why?  Because as comically
inept and hackish as that solution is, it's better than trying to figure out
the library's docs.

Now in my case, I'm going to look at the open source OTP distribution and
see if I can't contribute some actual *examples of use* to the docs for
these pieces.  That solves this little problem, but leaves us with a million
more still.  (Like the dire docs situation for the xref and related modules,
for example.)

So, please, next time someone adopts a "less than optimal" solution to a
problem because they didn't use a library function that did what they
wanted?  Please remember to try looking at the Erlang/OTP docs from a
newcomer's perspective.  Look at what's actually written in the docs and
compare it to the lore that's congealed in your head over your years of use.
Run a diff over the two and maybe then you'll understand why so many
newcomers do such "broken" things.


(For the record, the solution to the problem I started working on can be
found at http://ideone.com/MXT11.  This is a replacement for a
shell/compiler one-liner.  I'm sure I haven't got the most efficient
possible code for this, but I doubt you could shrink it by an order of
magnitude to match the one-liner.)

-- 
"Perhaps people don't believe this, but throughout all of the discussions of
entering China our focus has really been what's best for the Chinese people.
It's not been about our revenue or profit or whatnot."
--Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.

