[erlang-questions] Proposal: a new Dbgi BEAM chunk

Tue Mar 14 11:24:04 CET 2017

Hi Kostis,

Thanks for the comments. Answers inline.

>  - What does the mechanism that finds the source code have to do with the
> new chunk which is stored in the .beam file?  These two are totally
> orthogonal mechanisms, aren't they?
>

The new proposed Dbgi chunk does not follow the same format as the Abst
chunk. It is made of three fields:

{debug_info_v1, Backend, Metadata | none}

The Backend field must be a module that knows how to:

- Convert Metadata to different formats. For example, Elixir will
likely store Elixir AST in the Metadata field and be able to convert the
Metadata field to Elixir AST, Erlang AST and Core AST.

- Retrieve the AST from source if Metadata is none. The process will
likely involve: 1. finding the source for the beam file in the compile
attributes, 2. parsing the source file and 3. converting it to the desired
format. That's exactly how fetching abstract code from source works today
in tools like cover and debugger.

The proposed API for the Backend is outlined in the PR:
https://github.com/erlang/otp/pull/1367
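
As a rough illustration of the two responsibilities above, a backend
for plain Erlang could look like the sketch below. The call shape
debug_info(Format, Module, Metadata, Opts) follows the example later in
this message; the module name example_backend, the erlang format atom
and the error reasons are assumptions made for illustration, not the
API from the PR. Metadata is assumed to hold Erlang abstract forms.

```erlang
-module(example_backend).
-export([debug_info/4]).

%% Hypothetical backend: Metadata is assumed to be a list of Erlang
%% abstract forms, so "converting" to the erlang format is the identity.
debug_info(erlang, _Module, Forms, _Opts) when is_list(Forms) ->
    {ok, Forms};
%% Metadata was stripped: fall back to the source only if the caller allows it.
debug_info(erlang, Module, none, Opts) ->
    case proplists:get_bool(allow_source_lookup, Opts) of
        true  -> forms_from_source(Module);
        false -> {error, no_metadata}
    end;
debug_info(Format, _Module, _Metadata, _Opts) ->
    {error, {unsupported_format, Format}}.

forms_from_source(Module) ->
    %% Step 1: find the source path in the compile attributes.
    try proplists:get_value(source, Module:module_info(compile)) of
        undefined ->
            {error, no_source};
        Source ->
            %% Steps 2 and 3: parse the file; for Erlang the parsed
            %% forms are already in the desired format.
            case epp:parse_file(Source, []) of
                {ok, Forms}     -> {ok, Forms};
                {error, Reason} -> {error, Reason}
            end
    catch
        error:undef -> {error, no_source}
    end.
```

A backend for another language would replace the first clause with a
real conversion from its own AST.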

>  - How is finding "the respective Erlang source" related to solving the
> problems that LFE or other languages (existing and future ones) may be
> facing?  Does the proposal come with some magic mechanism to "find" (I
> guess "generate" is a more appropriate word here) Erlang source code from
> e.g. LFE source?
>

As per above, the Dbgi chunk contains the backend module and the backend
module has the implementation of how to retrieve the AST from source.
That's why it is important for functions like beam_lib:strip/1 to not erase
the Dbgi chunk but instead set the metadata field to none.
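
The effect of stripping on the chunk contents can be sketched as a small
transformation of the tuple. The names dbgi_strip_sketch and
strip_metadata/1 are hypothetical, and this is not the beam_lib
implementation; it only shows that the backend survives while the
metadata is dropped.

```erlang
-module(dbgi_strip_sketch).
-export([strip_metadata/1]).

%% Keep the backend so tools can still ask it for a source lookup,
%% but discard whatever AST was stored in the chunk.
strip_metadata({debug_info_v1, Backend, _Metadata}) ->
    {debug_info_v1, Backend, none}.
```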

> Don't misunderstand me, I am not necessarily against the proposal.  It's
> just that I do not see why/how renaming a BEAM chunk is helping us solve
> problems that are orthogonal to the info that gets stored in this
> particular chunk.

Hopefully the points above clarify it. We are not only renaming the chunk,
we are adding extra information to it as well and changing the shape of the
metadata stored (which is why a new chunk is required).

> Does this mean that it will be impossible to hide the original source code
> from now on?
>

This behaviour will be the same as today. To fully answer the question,
let's outline how tools that need the AST work today:

1. Attempt to load the AST from the beam chunk

2. If the AST is not available, see if there is a source file on disk

3. If the source file is available, parse it and convert to AST

In other words, the process of hiding a source from a tool is:

1. You can encrypt debug_info

2. Or you can pass debug_info false and remove the source from disk

Today, if you set debug_info to false but the source is still on disk, most
tools will end up building the AST from source. If you don't want that
reconstruction, then the source must not be available on disk. I aim to keep
this behaviour.
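
The three lookup steps can be sketched with existing beam_lib calls. This
is a minimal approximation of what such tools do today, not code taken
from cover or debugger; the module name ast_lookup_sketch and the
no_source error reason are made up for the example.

```erlang
-module(ast_lookup_sketch).
-export([forms/1]).

%% Beam is a path to a .beam file or a binary holding one.
forms(Beam) ->
    %% Step 1: attempt to load the AST from the beam chunk.
    case beam_lib:chunks(Beam, [abstract_code]) of
        {ok, {_Mod, [{abstract_code, {raw_abstract_v1, Forms}}]}} ->
            {ok, Forms};
        {ok, {_Mod, [{abstract_code, no_abstract_code}]}} ->
            forms_from_source(Beam);
        {error, beam_lib, Reason} ->
            %% Includes the case of encrypted debug info without a key.
            {error, Reason}
    end.

forms_from_source(Beam) ->
    case beam_lib:chunks(Beam, [compile_info]) of
        {ok, {_Mod, [{compile_info, Info}]}} ->
            case proplists:get_value(source, Info) of
                undefined ->
                    {error, no_source};
                Source ->
                    %% Step 2: see if the source file is still on disk.
                    case filelib:is_regular(Source) of
                        %% Step 3: parse it into abstract forms.
                        true  -> epp:parse_file(Source, []);
                        false -> {error, no_source}
                    end
            end;
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.
```

With the source removed from disk and no debug info in the beam, forms/1
has nothing left to work with, which is the hiding behaviour described
above.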

> Does this mean that if I have a .beam file lying around from long ago or I
> have written a compiler that generates .beam files without a .Dbgi chunk
> this is not a valid .beam file anymore?  How is that "backwards
> compatible"?  (as claimed in the PR)
>

beam_lib:chunks(BinOrPath, [abstract_code]) will continue to look for
the Abst chunk for at least 3 releases for backwards compatibility reasons.
It will work like this:

* Look for the Dbgi chunk, if it is available, it will ask the backend to
convert the metadata to Erlang format

* If the Dbgi chunk is not available, it will look at the old Abst chunk
and return it

This means that beam_lib will be able to handle the differences between old
and new beams. The only exception is if you look up the "Abst" chunk
directly by name: it will no longer be present in newly compiled beams, but
that should not cause errors because the chunk has always been optional.
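
The fallback order can be written down as a pure function over the
already-decoded chunk contents. This is only a sketch of the decision
logic, not the beam_lib code: the module and function names, the
missing atom used for an absent chunk, and the erlang format atom passed
to the backend are all assumptions.

```erlang
-module(abstract_code_fallback).
-export([abstract_code/3]).

%% Dbgi and Abst are the decoded contents of the respective chunks,
%% or the atom missing when the chunk is not present in the beam.
abstract_code(Module, {debug_info_v1, Backend, Metadata}, _Abst) ->
    %% New-style beam: let the backend convert to Erlang format.
    Backend:debug_info(erlang, Module, Metadata, []);
abstract_code(_Module, missing, {raw_abstract_v1, Forms}) ->
    %% Old-style beam: return the Abst contents as before.
    {ok, Forms};
abstract_code(_Module, missing, missing) ->
    {error, no_abstract_code}.
```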

Your feedback here is very valuable because you have built many tools that
work on core. With the proposal above, I hope such tools will have code
like this:

case beam_lib:chunks(Beam, [debug_info]) of
  {ok, {Module, [{debug_info, {debug_info_v1, Backend, Metadata}}]}} ->
    case Backend:debug_info(core, Module, Metadata, [allow_source_lookup]) of
      {ok, CoreAST} ->
        %% work with the Core AST
        {ok, CoreAST};
      {error, Reason} ->
        %% handle error from the backend
        {error, Reason}
    end;
  {error, beam_lib, Reason} ->
    %% handle error from beam_lib
    {error, Reason}
end

The tool no longer needs to retrieve Erlang AST and translate it to core
nor know how to perform source lookups. Furthermore, the tool will work
with any language that knows how to emit Core AST from the information
stored in the Dbgi chunk.

Please let me know if there are more questions or points I should clarify,

*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Director of R&D