[erlang-questions] more thoughts about package/dependency management

Garrett Smith g@REDACTED
Tue May 29 20:41:09 CEST 2012


On Mon, May 28, 2012 at 5:11 PM, Tim Watson <watson.timothy@REDACTED> wrote:
> http://hyperthunk.wordpress.com/2012/05/28/does-erlangotp-need-a-new-package-management-solution/
>
> This is a summary of the Erlware-Questions discussions - hopefully I've been
> true to what was said on the list, but if I've misrepresented anyone's
> opinion then I apologise and hope that you'll put it down to my 'special'
> short/medium term memory. :)

Thanks for the summary! For fear-of-sheer-volume I didn't delve into
the upstream artifacts :)

I'd like to more directly respond to the post, but there's a lot
there, so I'm jotting down "brain dump" points without thinking too
hard about how they fit into the prior discussion.

Your post covers a large chunk of "already solved" problems, if you
leverage the last couple decades of system packaging experience. I'll
point to Arch Linux's outstanding tool chain for package authoring and
management as a starting point. I believe Arch has achieved its
remarkable growth largely because of this tool chain.

High points of Arch's package authoring/management facilities:

- Drop-dead easy to build a package

- Essentially line-of-sight automation for what people typically do
when consuming software (e.g. "configure && make && make install" in
the case of system packages)

- Convention is to pull source code directly from upstream sources,
whatever they may be (FTP, HTTP, git, hg, etc.)

- Package building and installation tools are decoupled from
publishing and hosting

- Custom/private repositories are identical to official/core
repositories and are easy to set up

- Security has become very good over the years (using PGP, web-of-trust)

Regarding packages, I don't think you need separate "publisher"
metadata. E.g. in system package land, it's very common to have
multiple packages of essentially the same software. Each has a
separate name but may share common "provides" metadata.

E.g. if I wanted to consume my fork of "rebar" I could install the package
"rebar-gar1t", which would "provide" "rebar" and satisfy that
dependency if needed. This works well in practice and has a couple of
advantages:

- One less field to worry about

- Easy to spot the "official" package and derivatives
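
As a rough illustration of the "provides" idea, here is a minimal
Python sketch. The record layout, the function names and the "cowboy"
entry are hypothetical; this is not how pacman stores or resolves
metadata, just one way the lookup could work.

```python
# Hypothetical package records: "name" is unique, "provides" lists the
# other names a package can stand in for (its own name is implied).
PACKAGES = [
    {"name": "rebar", "provides": []},
    {"name": "rebar-gar1t", "provides": ["rebar"]},
    {"name": "cowboy", "provides": []},
]


def candidates(dep, packages):
    """Return names of packages that satisfy dependency `dep`.

    The package whose own name matches comes first, so the "official"
    package is easy to tell apart from derivatives.
    """
    exact = [p["name"] for p in packages if p["name"] == dep]
    derived = [p["name"] for p in packages
               if p["name"] != dep and dep in p["provides"]]
    return exact + derived


def satisfied(dep, installed, packages):
    """True if any installed package is, or provides, `dep`."""
    return any(name in installed for name in candidates(dep, packages))
```

With only "rebar-gar1t" installed, satisfied("rebar", {"rebar-gar1t"},
PACKAGES) holds, and no separate "publisher" field is involved.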

Here's a quick read and I think a comprehensive picture of the
metadata one could need:

https://wiki.archlinux.org/index.php/PKGBUILD

Arch does not support multiple installed versions, which is obviously
not workable for Erlang. That's a "must change/enhance" topic.

+1 that the tool must support both system level installations
(requiring privileged access) and local (suitable for unprivileged). I
haven't seen a "global vs local" OTP app setup in Erlang -- but as I
understand things, ERL_LIBS is a path, so it should work today based
on the boot scripts, yes?

I would not choose git for a repository format. This just requires
simple file access:

- A smallish index that can be downloaded separately
- Each package is a file in a directory
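
A minimal sketch of such a layout, assuming a hypothetical
"<name>-<version>.tar.gz" file naming scheme (with no "-" in the
version) and a JSON index file. Neither is Arch's actual format; the
point is only that plain file access is enough.

```python
import hashlib
import json
import os


def build_index(repo_dir):
    """Scan repo_dir for package files and return a small index.

    Assumes file names of the form "<name>-<version>.tar.gz", where
    the version itself contains no "-". Other files are ignored.
    """
    entries = []
    for filename in sorted(os.listdir(repo_dir)):
        if not filename.endswith(".tar.gz"):
            continue
        stem = filename[:-len(".tar.gz")]
        name, _, version = stem.rpartition("-")
        if not name:
            continue  # no "-" separator, so not a package file
        with open(os.path.join(repo_dir, filename), "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        entries.append({"name": name, "version": version,
                        "file": filename, "sha256": digest})
    return entries


def write_index(repo_dir, index_name="index.json"):
    """Write the index next to the packages so it can be fetched alone."""
    index = build_index(repo_dir)
    with open(os.path.join(repo_dir, index_name), "w") as f:
        json.dump(index, f, indent=2)
    return index
```

Any transport that can serve the directory (HTTP, rsync, a local
path) can then serve the repository. The checksum here only detects
corruption and is not a substitute for signing.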

I think the packages themselves ought to be securely signed (e.g. PGP
scheme) and authorized by the installer. Whitelisting sources is of
course important, but I believe independent -- e.g. an installer might
require HTTPS or for all mirrors.

Does the preference for binary packages mean that source based
packages are not supported?

As a sanity check for your "list of tools" I've mapped each role to
what Arch uses:

* managing local and/or remote repositories

Repositories are simply hierarchical stores that can be accessed using
libcurl. There's no *one way* and you're free to use whatever tools
make sense (push via WebDav, rsync, scp, git commit + pull, whatever)

IMO there's no need or advantage to limit the repository format more than this.

Package indexes are managed using repo-add:

http://www.archlinux.org/pacman/repo-add.8.html

* solving dependencies

The Arch installer -- pacman -- does this.

http://www.archlinux.org/pacman/pacman.8.html

* fetching dependencies/indexes

All using libcurl.

Local indexes are synchronized independent of the packages.

Dependency resolution is performed before any packages are downloaded.
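
The "resolve first, download second" ordering can be sketched as
below. This is a plain depth-first topological sort, not pacman's
actual algorithm, and the index shape (a dict mapping a package name
to the names it depends on) and the `fetch` callback are assumptions
made for the example.

```python
def resolve(targets, index):
    """Return an install order (dependencies first) for `targets`.

    `index` maps a package name to the list of names it depends on.
    Raises before anything would be downloaded if a dependency is
    missing from the index or the graph contains a cycle.
    """
    order = []
    done = set()
    in_progress = set()

    def visit(name):
        if name in done:
            return
        if name not in index:
            raise LookupError("unknown package: " + name)
        if name in in_progress:
            raise ValueError("dependency cycle at: " + name)
        in_progress.add(name)
        for dep in index[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    for target in targets:
        visit(target)
    return order


def install(targets, index, fetch):
    """Resolve everything first, then call `fetch` once per package."""
    plan = resolve(targets, index)
    for name in plan:
        fetch(name)
    return plan
```

Because `resolve` runs to completion before the first `fetch`, a
broken dependency graph fails early and leaves nothing half
downloaded.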

Packages are stored in a local cache for subsequent installs as needed
(repairs mainly) but can be purged easily enough using pacman.

* building

Building is a part of the packaging process, which is handled by makepkg:

http://www.archlinux.org/pacman/makepkg.8.html

It's performed by the "build" function in PKGBUILD (defined by the
package author).

* packaging/assembling

Packages are simply a gzipped tar of a directory that's created by the
"package" function in PKGBUILD plus the package metadata and install
scripts (used at install time).
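
A minimal sketch of that packaging step using only the Python standard
library. The ".PKGINFO.json" entry name and the metadata fields are
made up for this example and do not match Arch's real package layout.

```python
import io
import json
import os
import tarfile


def make_package(pkg_dir, metadata, out_path):
    """Write a gzipped tar of pkg_dir's contents plus a metadata entry.

    ".PKGINFO.json" is a hypothetical metadata file name. out_path
    should live outside pkg_dir so the archive does not include itself.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        blob = json.dumps(metadata, sort_keys=True).encode("utf-8")
        info = tarfile.TarInfo(name=".PKGINFO.json")
        info.size = len(blob)
        tar.addfile(info, io.BytesIO(blob))
        # Add each top-level entry with a path relative to pkg_dir.
        for entry in sorted(os.listdir(pkg_dir)):
            tar.add(os.path.join(pkg_dir, entry), arcname=entry)
    return out_path


def read_metadata(pkg_path):
    """Read the metadata back without unpacking the payload."""
    with tarfile.open(pkg_path, "r:gz") as tar:
        member = tar.extractfile(".PKGINFO.json")
        return json.loads(member.read().decode("utf-8"))
```

Writing the metadata as the first archive entry lets an installer
inspect a package before extracting anything else.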

* publishing

This is out-of-band in Arch -- it's up to the repository maintainer
entirely. I think this is a good point of separation.

You've nailed the list of tools/processes here -- nothing else to add!

I hate sounding like an Arch fan boy (guilty) -- but this packaging
problem is critical to keep simple yet effective, and that's hard to
achieve.

So I'd leverage as much as makes sense from the system packagers
before (re)inventing anything new. And just change what doesn't work.
IMO Arch is the best starting point for this.

Naively, an Erlang equivalent of:

- pacman - the program that performs all work on behalf of an installer
- makepkg - the program that performs all work on behalf of a packager
- repo-add - the program that manages a package index database
- PKGBUILD - the package metadata and supporting scripts
- /etc/pacman.conf - configuration for the installer (repositories,
trust levels, etc.)
- /etc/pacman.d/gnupg/ - local keyring, gpg related

Nothing here about repository access protocols.

Garrett

P.S. Apologies for the excessively high level hand-waving -- again
this is just a brain dump.


