Long string to short ID

Michael Truog mjtruog@REDACTED
Wed Aug 18 19:22:47 CEST 2021


The main limiting factor is likely the filesystem: modern filesystems 
typically limit a file name to 255 characters (often counted in bytes 
rather than characters).  A title shouldn't have problems with that 
limitation.
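
A minimal sketch of checking a title against that limit, in Python purely
for illustration.  The 255-byte figure, the UTF-8 encoding and the ".pdf"
extension are assumptions; check what your own filesystem actually
enforces.

```python
# Assumed limit: 255 bytes per file name. Some filesystems count
# characters or UTF-16 units instead, so treat this as a rough guide.
MAX_NAME_BYTES = 255


def fits_in_filename(title: str, extension: str = ".pdf") -> bool:
    """Return True if title + extension fits in the assumed byte limit."""
    return len((title + extension).encode("utf-8")) <= MAX_NAME_BYTES


print(fits_in_filename("Data and Reality"))
print(fits_in_filename("é" * 128))  # 128 characters, but 256 bytes in UTF-8
```

Titles in non-Latin scripts use several bytes per character, so they can
hit a byte limit well before 255 characters.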

I try to put a year prefix on file names to keep track of when the files 
were created (often with month and day, as YYYYMMDD), so the information 
isn't lost if the file gets copied.  Having the creation time as a 
prefix can make the file easier to find, because you often know roughly 
what the date should be when you are looking for the file.  A year 
prefix would also be a way of distinguishing different editions of the 
same title by the same author.
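
A rough sketch of that naming scheme, again in Python for illustration.
The function name, the space separator, the ".pdf" extension and the
dates are all made up for the example.

```python
from datetime import date


def dated_name(title: str, created: date, extension: str = ".pdf") -> str:
    """Prefix the title with the creation date as YYYYMMDD."""
    return f"{created:%Y%m%d} {title}{extension}"


# Two hypothetical editions of the same title stay distinct, and a plain
# alphabetical sort puts them in date order.
names = [
    dated_name("Some Title", date(2021, 8, 18)),
    dated_name("Some Title", date(1998, 3, 5)),
]
print(sorted(names))
```

Because the date is zero-padded and comes first, a plain sort of the
file names is also a sort by date.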

On 8/18/21 5:49 AM, Richard O'Keefe wrote:
> If you mean "a <unique> ID", the answer is that it cannot be done,
> because book titles are not unique.
> For example, there is a book called "Inferno" by Dan Brown,
> and there is another book called "Inferno" by Larry Niven and Jerry Pournelle.
> (The latter is thought-provoking; the former is Dan Brown being Dan Brown.)
>
> Discussions of what a book title even _is_ require the skills of a
> William Kent of "Data and Reality" fame.
> Is a subtitle included?  Is the edition part of the title?  Are
> translations that preserve the title the same book or different
> books?  Should "Football Girl" and "The Football Girl" count as the
> same title?
> What do you do with numeric titles like "1632" or ones with
> non-linear aspects, like Great Misteaks with a slash through the "e"
> in "Misteaks", a caret under and between the "ks", and an "e" above
> and between the "ks"?  (It turns out that I was thinking of
> "Incompetence" by Rob Grant, Incompetence : Rob Grant :
> 9780575074491 (bookdepository.com) .)
> Then there are variations in spelling: "The Discovery of Witchcraft"
> (modern spelling) vs "The Discoverie of Witchcraft" (original
> spelling), with a semantic change thrown in for good measure (it's
> really the Uncovering of Witchcraft).  Did I mention my grandfather
> from near Tucepi?  If you have two books with identical contents
> except that one uses the Cyrillic script and the other uses the Latin
> script, do they have the same title or different titles?  Should they
> get the same ID?
>
> To a first approximation, I suppose we could say that anything that
> is listed as a book title in "Books in Print" is a book title, and
> there are apparently some 12 million titles currently in print in
> English.  (Sigh.  I am never going to read them all.  I am never even
> going to read all the ones I would *enjoy* reading.)
>
> Then we have to clarify what "human-readable" means.
> Are ISBN-13s "human-readable"?
> Do you want a human who reads the ID to have any clue about the title
> or contents of the book?
>
> As it happens, I have spent a LOT of time recently filing electronic
> copies of reports &c that I've collected over the decades.  The best
> answer I have found?
>
>      USE THE TITLE VERBATIM
>
> Anything else WILL lose information.  In fact I've had to ADD author
> information in some cases.
> "Inferno@REDACTED: "Inferno@REDACTED" for example (if I had those
> books on my disc).
>
> Come to think of it, "Data and Reality" is a really good book.
> William Kent thought he was writing about data bases.  When I read
> it, I thought he was talking about (symbolic) Artificial
> Intelligence.
> Names (titles) are for people, and are not unique.
> Surrogates (IDs) are for computers, and are not self-explanatory.
>
> I think we need to know a heck of a lot more about your use case.
>
> On Sat, 14 Aug 2021 at 07:44, Lloyd R. Prentice <lloyd@REDACTED> wrote:
>> Hello,
>>
>> What might be a nifty way to turn a long book title with spaces into a short human-readable ID?
>>
>> Thanks,
>>
>> LRP
>>
>> Sent from my iPad
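
To make the collision point above concrete, here is a small sketch in
Python of shortening a title into an ID.  The slug() and book_id() names
and the "--" separator are invented for this example, and the ASCII-only
rule is a deliberate simplification: it throws information away, which
is exactly the risk described above.

```python
import re


def slug(text: str) -> str:
    """Lowercase, and collapse each run of non [a-z0-9] into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def book_id(title: str, author: str) -> str:
    """Combine title and author, since titles alone are not unique."""
    return f"{slug(title)}--{slug(author)}"


# Title alone: both "Inferno" books collapse to the same ID.
print(slug("Inferno"))
# Adding the author keeps them apart.
print(book_id("Inferno", "Dan Brown"))
print(book_id("Inferno", "Larry Niven and Jerry Pournelle"))
```

Note that this slug() maps a title written entirely in Cyrillic to an
empty string, and treats "Football Girl" and "The Football Girl" as
different titles, so it settles none of the questions raised above.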


