Long string to short ID

Richard O'Keefe raoknz@REDACTED
Wed Aug 18 14:49:44 CEST 2021


If you mean "a <unique> ID", the answer is that it cannot be done,
because book titles are not unique.
For example, there is a book called "Inferno" by Dan Brown,
and there is another book called "Inferno" by Larry Niven and Jerry Pournelle.
(The latter is thought-provoking; the former is Dan Brown being Dan Brown.)

Discussions of what a book title even _is_ require the skills of a
William Kent of "Data and Reality" fame.
Is a subtitle included?  Is the edition part of the title?
Are translations that preserve the title the same book or different books?
Should "Football Girl" and "The Football Girl" count as the same title?
What do you do with numeric titles like "1632", or ones with non-linear
aspects, like "Great Misteaks" with a slash through the "e" in "Misteaks",
a caret under and between the "ks", and an "e" above and between the "ks"?
(It turns out that I was thinking of "Incompetence" by Rob Grant;
see "Incompetence : Rob Grant : 9780575074491" on bookdepository.com.)
Then there are variations in spelling: "The Discovery of Witchcraft"
(modern spelling) vs "The Discoverie of Witchcraft" (original spelling),
with a semantic change thrown in for good measure (it's really the
Uncovering of Witchcraft).
Did I mention my grandfather from near Tucepi?  If you have two books
with identical contents except that one uses the Cyrillic script and the
other uses the Latin script, do they have the same title or different
titles?  Should they get the same ID?
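
To make those questions concrete, here is a minimal Python sketch of one
possible "short ID" function.  The name naive_slug and every rule in it
(ASCII folding, lowercasing, dropping a leading article) are assumptions
made for illustration, not anything proposed in this thread, and
"Књига"/"Knjiga" are made-up stand-ins for a Cyrillic/Latin pair.  Each
rule silently answers one of the questions above.

```python
import re
import unicodedata


def naive_slug(title: str) -> str:
    """Hypothetical short ID: ASCII-fold, lowercase, drop a leading
    article, and join the remaining words with '-'."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops everything non-ASCII.
    folded = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", folded.lower())
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return "-".join(words)


if __name__ == "__main__":
    for t in ["Football Girl", "The Football Girl", "1632",
              "The Discovery of Witchcraft",
              "The Discoverie of Witchcraft",
              "Knjiga", "Књига"]:
        print(repr(t), "->", repr(naive_slug(t)))
```

With these rules "Football Girl" and "The Football Girl" get the same
ID, the two spellings of the Witchcraft title get different IDs, and
the Cyrillic title gets an empty ID.  None of that is right or wrong in
itself; it depends on the use case.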

To a first approximation, I suppose we could say that anything that is
listed as a book title in "Books in Print" is a book title, and there
are apparently some 12 million titles currently in print in English.
(Sigh.  I am never going to read them all.  I am never even going to
read all the ones I would *enjoy* reading.)

Then we have to clarify what "human-readable" means.
Are ISBN-13s "human-readable"?
Do you want a human who reads the ID to have any clue about the title
or contents of the book?

As it happens, I have spent a LOT of time recently filing electronic
copies of reports &c that I've collected over the decades.
The best answer I have found?

    USE THE TITLE VERBATIM

Anything else WILL lose information.  In fact I've had to ADD author
information in some cases.
"Inferno@REDACTED: "Inferno@REDACTED" for example (if I had those
books on my disc).
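
A rough Python sketch of that filing rule, under stated assumptions:
the function name filing_names is hypothetical, and so is the exact
"Title@Author" form, since the examples above are redacted in the
archive.  The title is kept verbatim, and the author is appended only
when two entries share a title.

```python
from collections import Counter


def filing_names(books):
    """books: list of (title, author) pairs.
    Returns one name per book, in the same order.  The title is used
    verbatim unless it occurs more than once, in which case the author
    is appended after an '@' (an assumed separator)."""
    counts = Counter(title for title, _author in books)
    return [
        title if counts[title] == 1 else f"{title}@{author}"
        for title, author in books
    ]


if __name__ == "__main__":
    shelf = [
        ("Inferno", "Dan Brown"),
        ("Inferno", "Larry Niven and Jerry Pournelle"),
        ("Data and Reality", "William Kent"),
    ]
    for name in filing_names(shelf):
        print(name)
```

Two books with the same title AND the same author would still collide,
so this is a filing convention, not a guarantee of uniqueness.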

Come to think of it, "Data and Reality" is a really good book.
William Kent thought he was writing about data bases.
When I read it, I thought he was talking about (symbolic) Artificial
Intelligence.
Names (titles) are for people, and are not unique.
Surrogates (IDs) are for computers, and are not self-explanatory.
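
The name/surrogate split can be sketched in a few lines of Python.
Everything here is an assumption made for illustration: the class name
Catalogue, the "B000001" ID format, and the in-memory table.  The ID
carries no meaning of its own; the title and author live in a table
keyed by it.

```python
import itertools


class Catalogue:
    """Hypothetical registry: opaque surrogate IDs for the computer,
    titles and authors kept alongside for people."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_id = {}

    def add(self, title, author):
        # The ID comes from a counter, never from the title, so two
        # books called "Inferno" cannot collide.
        book_id = f"B{next(self._next):06d}"
        self._by_id[book_id] = (title, author)
        return book_id

    def describe(self, book_id):
        # The human-readable part is looked up, not decoded from the ID.
        title, author = self._by_id[book_id]
        return f"{title} ({author})"


if __name__ == "__main__":
    cat = Catalogue()
    a = cat.add("Inferno", "Dan Brown")
    b = cat.add("Inferno", "Larry Niven and Jerry Pournelle")
    print(a, cat.describe(a))
    print(b, cat.describe(b))
```

The IDs are short and unique but tell a reader nothing, which is the
trade-off the two sentences above describe.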

I think we need to know a heck of a lot more about your use case.

On Sat, 14 Aug 2021 at 07:44, Lloyd R. Prentice <lloyd@REDACTED> wrote:
>
> Hello,
>
> What might be a nifty way to turn a long book title with spaces into a short human-readable ID?
>
> Thanks,
>
> LRP


More information about the erlang-questions mailing list