Long string to short ID

Hugo Mills hugo@REDACTED
Fri Aug 13 22:19:30 CEST 2021


On Fri, Aug 13, 2021 at 03:44:29PM -0400, Lloyd R. Prentice wrote:
> Hello,
> 
> What might be a nifty way to turn a long book title with spaces into a short human-readable ID?

   Depends rather on what purpose you want to put this ID to.

   One solution would be to hash it (with, say, SHA-256). If the hash is
too long for "short", truncate it. Note that this is not a
globally-unique value, as there are lots of books with identical
titles, and truncating the hash makes collisions between different
titles more likely.
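
   As a rough sketch of the hash-and-truncate idea (in Python rather
than Erlang, purely for illustration): the function name short_hash_id
and the default length of 8 hex characters are my own assumptions, not
anything standard.

```python
import hashlib

def short_hash_id(title: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the title's SHA-256.

    `length` is an arbitrary choice for "short"; a smaller value means
    a shorter ID but a higher chance of two titles colliding.
    """
    digest = hashlib.sha256(title.encode("utf-8")).hexdigest()
    return digest[:length]

if __name__ == "__main__":
    print(short_hash_id("A Tale of Two Cities"))
```

   The same title always gives the same ID, so two books sharing a
title will share an ID as well.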

   If you want a globally unique identifier for printed books, then
ISBN is a reasonable one to use -- it's not precisely unique (there
have been errors where the same ISBN was assigned to two different
books, for example), but it's pretty good for most purposes.

   If you want an actual globally unique identifier, then some form of
UUID would do the job (UUIDv4 is the easiest). Alternatively, you
could register a DOI prefix and assign numbers inside your own
numberspace within the DOI system.
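
   For the UUID route, a minimal sketch (again in Python for
illustration; new_book_id is a hypothetical name) could look like
this. Note that the ID is random and not derived from the title, so
you have to store the title-to-ID mapping somewhere.

```python
import uuid

def new_book_id() -> str:
    """Return a fresh random (version 4) UUID as a string."""
    return str(uuid.uuid4())

if __name__ == "__main__":
    print(new_book_id())
```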

   If you want something vaguely human-readable, try dropping all the
stop-words (the, a, an, in, on, ...), all the vowels and all the
spaces. Truncate at whatever your idea of "short" is. Like the hashing
approach, it's not unique in the slightest.
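
   A quick sketch of the stop-word and vowel dropping, in Python for
illustration. The stop-word list, the 12-character default and the
name readable_id are all assumptions of mine; this version also
lower-cases the title and throws away anything that isn't an ASCII
letter or digit, which may not suit titles in other scripts.

```python
import re

# Assumed stop-word list; extend it to suit your titles.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "and", "to", "for"}
VOWELS = set("aeiou")

def readable_id(title: str, length: int = 12) -> str:
    """Drop stop-words, vowels and spaces, then truncate to `length`."""
    # Lower-case and split into runs of ASCII letters/digits.
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    squeezed = "".join(ch for w in kept for ch in w if ch not in VOWELS)
    return squeezed[:length]

if __name__ == "__main__":
    print(readable_id("The Old Man and the Sea"))
```

   With these assumptions "The Old Man and the Sea" comes out as
"ldmns". A title made up only of stop-words or vowels comes out
empty, so you'd want a fallback for that case.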

   It all depends on your use-case.

   Hugo.

-- 
Hugo Mills             | Great films about cricket: Interview with the Umpire
hugo@REDACTED carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


More information about the erlang-questions mailing list