[erlang-questions] Coon - new tool for building Erlang packages, dependency management and deploying Erlang services

Tue Feb 13 12:30:06 CET 2018

On Mon, Feb 12, 2018 at 11:36 PM, Vlad Dumitrescu <vladdu55@REDACTED> wrote:
>
>
> On Mon, Feb 12, 2018 at 10:58 PM, Joe Armstrong <erlang@REDACTED> wrote:
>>
>> On Mon, Feb 12, 2018 at 10:06 PM, Vlad Dumitrescu <vladdu55@REDACTED>
>> wrote:
>> > On Mon, Feb 12, 2018 at 9:06 PM, Jesper Louis Andersen
>> > <jesper.louis.andersen@REDACTED> wrote:
>> >> Using a cryptographic checksum for a package and then pointing the name
>> >> to
>> >> the checksum would have saved Node.js npm package manager a lot of
>> >> headaches
>> >> when people remove, rename or otherwise destroy packages.
>> >> It also allows you to comply with legal requests with a sunset period.
>> >> As
>> >> in "I hear you, and the name will be given to you. But we give people 6
>> >> months time to upgrade before we remove the old checksummed packages".
>> >> I'm interested in why someone did not try this yet. Or if one tried,
>> >> why
>> >> it didn't work out. It seems very obvious to build a
>> >> content-addressable-store for your packages.
>> >
>> >
>> > I'm not sure I understand this completely. Using the checksum of a
>> > package
>> > as identifier is IMHO only useful if it is used in the dependencies list
>> > of
>> > other packages. If the deps list uses names (and people will use names
>> > anyway, not checksums), then the problem remains that in case a package
>> > is
>> > renamed and another one reuses the name, we don't know to which one a
>> > reference points.
>>
>> The dependency list should be a list of checksums and NOT a list of
>> names - this list of
>> checksums has itself a checksum (the "true" name of the package).
>>
>> A human readable name is just an alias to a checksum - two different
>> human readable names
>> are the "same" if they are aliases to the same checksum.
>>
>> Basically files should be named by their checksums - for fairly
>> obvious reasons of
>> convenience tools should hide or reveal these names when necessary or
>> appropriate.
>>
>> For a given content the checksum is unique.
>>
>> To handle renamings you just need a lookup table of
>>
>>       {Name, Time, Checksum} tuples that tracks changes to the name of
>> the checksum over time
>
>
> Thanks for the explanation, I understand the mechanics, but not the "real
> world usage".
>
> * A checksum referes to a {package_name, time} tuple, so there is no way to
> refer to the package in general. Except by its name.
>
> * Even if there was, nobody is going to say "For a gizmo processing library,
> we have to choose between
> B17556DB683000BA50370B16C0619DF1337E7AF7ECBF7D64FBF8D1D6BCE3109B and
> 7ACC7D785B5ABE8A6E9ADBDE926A24E481F29956DD8B4DF49E3E4E7BCC92A018, which one
> is better?" So people will use names.

Given the hash, you just have to look it up in a database to find the
associated metadata.

 You might find that
B17556DB683000BA50370B16C0619DF1337E7AF7ECBF7D64FBF8D1D6BCE3109B
  was created on <date> by <person> key <publicKey>
  originally called <do_this.erl> at <time>
  name changed to <better_name.erl> at <time1>
  name changed to <even_better_name.erl> at <time2>
  keywords <"super","file","manager">
  called <gör_det_här.erl> in <swedish>
  called <gwnewch_hyn.erl> in <welsh>

Assume this is an append-only log which can only be extended by the creator.
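
A minimal sketch of such a log, in Python, under stated assumptions:
SHA-256 as the content hash, an in-memory dict standing in for the
database, and a plain key comparison standing in for real signature
checks. The names content_hash and MetadataLog are hypothetical,
invented only for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 of the content as upper-case hex (assumed hash function)."""
    return hashlib.sha256(data).hexdigest().upper()


class MetadataLog:
    """Hypothetical append-only metadata log keyed by content hash."""

    def __init__(self):
        self._creator = {}  # hash -> public key of the creator
        self._events = {}   # hash -> list of (time, event, value)

    def create(self, data: bytes, public_key: str, name: str, time: int) -> str:
        digest = content_hash(data)
        if digest in self._creator:
            raise ValueError("content already registered")
        self._creator[digest] = public_key
        self._events[digest] = [(time, "originally called", name)]
        return digest

    def append(self, digest: str, public_key: str, time: int, event: str, value: str) -> None:
        # Stand-in for a signature check: only the creator's key may append.
        if self._creator.get(digest) != public_key:
            raise PermissionError("only the creator can extend this log")
        self._events[digest].append((time, event, value))

    def history(self, digest: str) -> tuple:
        # Return an immutable copy so callers cannot rewrite past entries.
        return tuple(self._events[digest])


log = MetadataLog()
h = log.create(b"-module(do_this).\n", "joe-public-key", "do_this.erl", time=1)
log.append(h, "joe-public-key", 2, "name changed to", "better_name.erl")
log.append(h, "joe-public-key", 3, "name changed to", "even_better_name.erl")
```

Entries are only ever added, never edited, so the full naming history
of a hash stays recoverable.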

Later somebody will want to find code that does something, so they use
an out-of-band method to discover the name of the program (the keywords
etc. can be indexed by a search engine or something).

They discover "the name of the program" (it might be better_name.erl)
-- this might be unique (i.e. pointed to by one hash) - or there might
be several programs with the same name - they can discover this by
searching a database of (hashes x names) - if there are several
versions they'll have to choose one of them.
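
The lookup can be sketched roughly as follows, assuming the
(hashes x names) database is built from the {Name, Time, Checksum}
records mentioned earlier in the thread. NameIndex and the short
"AAAA"/"BBBB" checksums are hypothetical placeholders, not real hashes.

```python
from collections import defaultdict


class NameIndex:
    """Hypothetical (hashes x names) table built from {Name, Time, Checksum} records."""

    def __init__(self):
        self._by_name = defaultdict(list)  # name -> [(time, checksum)]

    def bind(self, name: str, time: int, checksum: str) -> None:
        self._by_name[name].append((time, checksum))

    def candidates(self, name: str) -> list:
        """Return every distinct checksum that has carried this name, oldest binding first."""
        seen = []
        for _, checksum in sorted(self._by_name.get(name, [])):
            if checksum not in seen:
                seen.append(checksum)
        return seen


index = NameIndex()
index.bind("do_this.erl", 1, "AAAA")       # original name of the first program
index.bind("better_name.erl", 2, "AAAA")   # the same program, renamed later
index.bind("better_name.erl", 5, "BBBB")   # an unrelated program reusing the name

unique = index.candidates("do_this.erl")         # one hash: nothing to choose
ambiguous = index.candidates("better_name.erl")  # two hashes: the user must pick
```

A name with a single candidate resolves silently; a name with several
is where the tool has to ask for more context.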

Most of this should be invisible to the user.

It would be nice (and possible) to have multiple names for the same
thing - names can change because we think of a better name, but we
might also like to have English, Chinese, French, Russian names for
the same thing.

The name of the "thing" is just a content hash - the English (Chinese,
...) name is just a convenient shortcut
to refer to the thing.
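
As a rough illustration, and assuming SHA-256 plus the scheme from
earlier in the thread where a dependency list is a list of checksums
that itself has a checksum, the "true name" of a package could be
computed like this. package_hash is a hypothetical helper, and sorting
the dependency hashes is my own assumption to make the result
independent of listing order.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest().upper()


def package_hash(source: bytes, dependency_hashes: list) -> str:
    """Hash the package's own content together with its dependencies' hashes."""
    # Assumption: the manifest is the content hash followed by sorted dependency hashes.
    manifest = "\n".join([sha256_hex(source)] + sorted(dependency_hashes))
    return sha256_hex(manifest.encode("ascii"))


lib = package_hash(b"lib source", [])
app = package_hash(b"app source", [lib])

# Changing a dependency changes the hash of everything built on it.
lib_v2 = package_hash(b"lib source v2", [])
app_with_new_lib = package_hash(b"app source", [lib_v2])

# Human readable names in any language are just aliases to the same hash.
aliases = {"do_this.erl": app, "gör_det_här.erl": app, "gwnewch_hyn.erl": app}
```

Two names are then "the same" exactly when they map to the same value
in a table like aliases.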

We do this all the time.  When I say "Thomas" I might mean one of
about half a dozen Thomases that I know.
If the context is unclear I have to add some more information.

"By Thomas I mean my son"

If I was foolish enough to have several sons and give them all the
same name - I'd have to add more context.

"By Thomas I mean the individual whose DNA hash is

ABE8A6E9ADBDE926A24E481F29956DD8B4DF49E3E4E7BCC"

(Is there a DNA hash? - If you scanned my genome twice and computed
the SHA of the raw sequence of
base pairs you'd get different values)

In many old religions you had to know the "real name" of a god or
spirit in order to command the spirit to do something - the true name
of course was the SHA of the spirit's content.

We'll soon be there again

    "Alexa, tell CoffeeMaker to make a cup of coffee"

   "I'm sorry I don't know who CoffeeMaker is"

   "Alexa, tell
24E481F29956DD8B4DF49E3E24E481F29956DD8B4DF49E3EDEFGH45432134 to
    make a cup of coffee"

   "24E481F29956DD8B4DF49E3E24E481F29956DD8B4DF49E3EDEFGH45432134 is
making a cup of coffee"

   :-)

/Joe

>
> * Now the project is presumably configured in a file, written by a
> programmer - again the name will be used. The hash can be retrieved and
> stored by the build tool, so that we get a hard reference...
>
> * ... which is exactly what rebar and mix do with hex.pm (if I get it
> right), except they use the version string instead of timestamp. So if
> hex.pm keeps track of timestamps and of historical mappings between names
> and hashes, then it's done!
>
> * However, the imprecision of using names remains because we're humans.
> Tools already use hashes.
>
> Am I misunderstanding something?
>
> best regards,
> Vlad
>
>