[erlang-questions] Reproducible and deterministic builds for GNU Guix (and Nix)

Joe Armstrong erlang@REDACTED
Fri Apr 8 17:06:49 CEST 2016


On Fri, Apr 8, 2016 at 8:59 AM, Pjotr Prins <pjotr.public12@REDACTED> wrote:
> On Tue, Apr 05, 2016 at 02:07:46PM +0100, Magnus Henoch wrote:
>>    Debian has included a patch that lets you use the environment variable
>>    SOURCE_DATE_EPOCH to fix the compile time, and thus obtain identical
>>    output (given the same compiler version and other things):

This is very hacky - it might work by accident, but you'd want a
stronger guarantee: compiling the same file many times with the same
compiler should always give the same object code file.
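
For reference, the SOURCE_DATE_EPOCH convention mentioned in the quoted
text can be sketched roughly as follows. This is an illustrative Python
sketch, not the Debian patch itself; the function name build_timestamp
is made up for the example.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the time (seconds since the Unix epoch) to embed in build output.

    build_timestamp is a hypothetical helper, not part of any real tool.
    If SOURCE_DATE_EPOCH is set, its value is used, so repeated builds
    embed the same time. Otherwise the wall clock is used, which is the
    non-reproducible behaviour.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())
```

Note that this only pins the time stamp. It says nothing about other
sources of variation, which is why a stronger guarantee is still wanted.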

As I see things this is crucial to making reproducible builds. Having the
compilation time in the beam file is really bad (I don't remember if I did this,
but if so I apologise) - it should not matter what time you compiled the file.
If you want this information, stick it in a log file, or somewhere else.

If the beam file is uniquely determined by the version of the compiler,
the source and the macro definitions used when it was compiled then
we can use the SHA1 checksum of the beam file as a key, and inject the
code into a distributed hash table - this is the first step to making
a global revision control system with strong guarantees on version
consistency.

I see absolutely no reason for the dozens of different version and
revision control systems that pollute the planet when all that is
needed is a DHT containing blobs identified by some checksum (like
SHA1 or something).
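
A minimal sketch of the idea, assuming an in-memory dictionary as a
stand-in for a real DHT. The class name BlobStore and its methods are
invented for illustration only.

```python
import hashlib


class BlobStore:
    """Toy in-memory stand-in for a DHT: blobs are keyed by their SHA-1."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        # The key is derived from the content, so identical blobs
        # always land under the same key.
        key = hashlib.sha1(blob).hexdigest()
        self._blobs[key] = blob
        return key

    def get(self, key: str) -> bytes:
        blob = self._blobs[key]
        # Anyone fetching a blob can re-hash it to check it was not altered.
        if hashlib.sha1(blob).hexdigest() != key:
            raise ValueError("blob does not match its key")
        return blob
```

With reproducible builds, two people compiling the same source with the
same compiler would call put() with identical bytes and get the same
key back. With a time stamp in the file, every build gets a new key.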

>>    [1]https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795834
>>
>>    This was briefly discussed on this mailing list:
>>    [2]http://erlang.org/pipermail/erlang-questions/2015-January/082699.html
>
> You may have heard of GNU Guix, the modern (functional) package
> manager of the GNU project. We are trying to add Erlang and Elixir to
> Guix, but we are running into the problem that building the Erlang
> compiler is not deterministic and therefore not reproducible, i.e. the
> beam files contain time stamps.

Great - I've not looked at Guix but I've been following Nix - I've wanted
GitTorrent (= Git + BitTorrent) so both these seem like a step in the
right direction.

Re the time stamps - you can post-process the beam code to remove
the time stamps, but I'd like stronger guarantees than that.

What would happen if two different implementations of a module
produced the same beam file? (I think just rearranging the comments
would achieve this, though I haven't tested it.) Should this be
allowed?

Personally, I think that the SHA of the source should be included in the
beam file, which will identify the code used to create the beam file.
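
To make the point concrete, here is a toy sketch. It assumes a
hypothetical "compiler" (toy_compile) that just drops comments. It is
not how erlc works, and the comment stripping is deliberately naive.

```python
import hashlib


def toy_compile(source: str) -> bytes:
    """Hypothetical stand-in for a compiler: drops '%' comments and blank lines.

    Naive on purpose: a '%' inside a string literal would be cut too.
    """
    kept = []
    for line in source.splitlines():
        code = line.split("%", 1)[0].strip()
        if code:
            kept.append(code)
    return "\n".join(kept).encode("utf-8")


def toy_compile_with_source_sha(source: str) -> bytes:
    """Same output, prefixed with the SHA-1 of the exact source text."""
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    return digest.encode("ascii") + b"\n" + toy_compile(source)
```

Two sources that differ only in where a comment sits give identical
output from toy_compile, but different output once the source SHA is
included. The output is still fully determined by the source, so the
build stays reproducible.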

For your purposes I can write a script to call erlc and strip out the
parts that make compilation non-reproducible. In the long term we should
discuss this, figure out what the correct thing to do is, and then do it.

Cheers

/Joe

>
> For normal software built by Erlang this can be overridden with
> SOURCE_DATE_EPOCH (as per mentioned Debian patch), but for the
> compiler itself we have not found how to do this.
>
> Do you have a suggestion how to bootstrap the compiler with
> SOURCE_DATE_EPOCH set or disable the time stamps? I am sure as a FP
> compiler designer you can appreciate determinism. Because GNU Guix is
> deterministic there is no need to keep track of time stamps. For hot
> reloading we can assume the start of EPOCH will do the trick, right?
>
> Pj.
>
> On Mon, Apr 04, 2016 at 01:49:44PM -0400, Leo Famulari wrote:
>> On Mon, Apr 04, 2016 at 12:50:12PM -0400, Leo Famulari wrote:
>> > On Mon, Apr 04, 2016 at 10:28:02AM +0200, Pjotr Prins wrote:
>> > > On Sun, Apr 03, 2016 at 11:39:24PM -0400, Leo Famulari wrote:
>> > > > Debian's package exhibits this problem. The timestamps are generated in
>> > > > the following places in the source code. I don't know how to approach
>> > > > this problem.
>> > > >
>> > > > lib/kernel/test/global_SUITE_data/global_trace.erl:    io:format("The trace was generated at ~p~n", [EndTime]),
>> > > > lib/reltool/bin/reltool.escript:    lists:flatten(io_lib:format("%% ~s generated at ~w ~w\n~p.\n\n",
>> > > > lib/reltool/src/reltool_server.erl:    IoList = io_lib:format("%% config generated at ~w ~w\n~p.\n\n",
>> > > > lib/reltool/src/reltool_target.erl:    RelIoList = io_lib:format("%% rel generated at ~w ~w\n~p.\n\n",
>> > > > lib/reltool/src/reltool_target.erl:    ScriptIoList = io_lib:format("%% script generated at ~w ~w\n~p.\n\n",
>> > > > lib/reltool/src/reltool_target.erl:            AppIoList = io_lib:format("%% app generated at ~w ~w\n~p.\n\n",
>> > > > lib/reltool/src/reltool_target.erl:            AppIoList = io_lib:format("%% app generated at ~w ~w\n~p.\n\n",
>> > > > lib/runtime_tools/src/erts_alloc_config.erl:    "generated at ~w-~2..0w-~2..0w ~2..0w:~2..0w.~2..0w by "
>> > > > lib/sasl/src/systools_make.erl:     io:format(Fd, "%% script generated at ~w ~w\n~p.\n",
>> > > > lib/wx/src/gen/gl.erl:%% The program object's information log is updated and the program is generated at the time
>> > >
>> > > If there is no easy work around I suggest simply patching them. Fortunately
>> > > the Erlang compiler does not change much at this level.
>> >
>> > The ideal solution would be to use the value of the environment variable
>> > SOURCE_DATE_EPOCH if it is set, and else to behave as it does now.
>> >
>> > > We can also contact Joe Armstrong, the author of Erlang, to discuss
>> > > this point. He appears to be approachable. I am sure he is open to
>> > > the idea of deterministic builds in a deterministic build system ;)
>> >
>> > I could go to the Erlang IRC channel or forums (whatever they use) and
>> > ask for advice. Since you are actually using Erlang, I think you would
>> > be the better person to contact Joe Armstrong himself, if we decide to
>> > do that.
>>
>> I presented the situation on IRC and it was recommended that I start the
>> discussion on a mailing list.
>>
>> I think that the erlang-questions list [0] could be a good place to
>> start.
>>
>> Pjotr, would you like to start the conversation? I can do it if you are
>> too busy or something.
>>
>> [0]
>> http://www.erlang.org/community
>>
>
> --
>
>>
>>    Regards,
>>    Magnus
>>
>>    On Mon, Apr 4, 2016 at 8:59 PM, Joe Armstrong <[3]erlang@REDACTED> wrote:
>>
>>      Hello,
>>
>>      I think I've asked this before but cannot find the answer:
>>
>>      I want the beam file produced by
>>
>>        $ erlc file.erl
>>
>>      to always have the same sha1 checksum - there was, if I remember
>>      correctly, a hidden flag that removed the time of compilation etc from
>>      the beam code. Any ideas how to do this?
>>
>>      /Joe
>>      _______________________________________________
>>      erlang-questions mailing list
>>      [4]erlang-questions@REDACTED
>>      [5]http://erlang.org/mailman/listinfo/erlang-questions
>>
>> References
>>
>>    Visible links
>>    1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795834
>>    2. http://erlang.org/pipermail/erlang-questions/2015-January/082699.html
>>    3. mailto:erlang@REDACTED
>>    4. mailto:erlang-questions@REDACTED
>>    5. http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --


