Long string to short ID

Hugo Mills hugo@REDACTED
Sat Aug 14 15:00:26 CEST 2021


   One thing that strikes me here -- if you want something (only
locally unique) that's for the user's consumption, you could probably
do a lot worse than simply ask them for an ID. That way, their ID
doesn't change when they decide that "Untitled 8" isn't a good title
for their book...

   You probably need to be able to handle a change of title anyway --
if your ID is based on the title, will that change too? If you've
asked the user for a short ID, can you change it? (And if not, how
will they feel about having "Unttld8" as the ID for their book for
ever more?)

   Hugo.

On Fri, Aug 13, 2021 at 04:34:07PM -0400, Lloyd R. Prentice wrote:
> Thanks all,
> 
> Hugo, I like your third idea. I've been thinking about programming a stop-word filtering function anyway. Plus, in my use case all of the books are owned by the author, so uniqueness is unlikely to be a problem.
> 
> I can't use ISBNs, since the IDs are for books under development. But I will definitely use them in other parts of my application.
> 
> I did program one idea:
> 
> make_id(String, First, Second) ->
>    List = string:tokens(String, " "),
>    F = lists:nth(First, List),
>    S = lists:nth(Second, List),
>    F ++ "_" ++ S.
> 
> make_id(String, First) ->
>    List = string:tokens(String, " "),
>    F = lists:nth(First, List),
>    F.
> 
> It nicely fulfills the short and readable criteria and lets me focus on the two most significant words in the title, but I can't see a way to automate the assignment of values to First and Second. So I played with just selecting the first word or the first two words of the title. But it makes me uncomfortable.
> 
> make_id(String) ->
>    %% Use the first two words of the title, or the only word if
>    %% there is just one. An empty title still fails, as before.
>    case string:tokens(String, " ") of
>       [F, S | _] -> F ++ "_" ++ S;
>       [F]        -> F
>    end.
> 
> Best wishes. I much appreciate the help.
> 
> LRP
> 
> On Fri, Aug 13, 2021, at 4:19 PM, Hugo Mills wrote:
> > On Fri, Aug 13, 2021 at 03:44:29PM -0400, Lloyd R. Prentice wrote:
> > > Hello,
> > > 
> > > What might be a nifty way to turn a long book title with spaces into a short human-readable ID?
> > 
> >    Depends rather on what purpose you want to put this ID to.
> > 
> >    One solution would be to hash it (with, say, SHA-256). If the hash is
> > too long for "short", truncate it. Note that this is not a
> > globally-unique value, as there are lots of books with identical
> > titles.
> > 
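A minimal sketch of the hash-and-truncate idea in Erlang, assuming the
title is an ordinary string and that "short" means a caller-chosen number
of hex characters. The module name hash_id and the Len parameter are
hypothetical, not something from the thread.

```erlang
-module(hash_id).
-export([hash_id/2]).

%% hash_id(Title, Len): the first Len lowercase hex characters of the
%% SHA-256 digest of Title. Len is a hypothetical "short" length (0..64).
%% Identical titles always give identical IDs, so this is not unique.
hash_id(Title, Len) when is_integer(Len), Len >= 0, Len =< 64 ->
    Digest = crypto:hash(sha256, unicode:characters_to_binary(Title)),
    %% Render each digest byte as two zero-padded hex digits.
    Hex = lists:flatten([io_lib:format("~2.16.0b", [Byte])
                         || <<Byte>> <= Digest]),
    lists:sublist(Hex, Len).
```

For example, hash_id:hash_id("abc", 8) returns "ba7816bf". The shorter
the truncation, the more likely two different titles collide.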
> >    If you want a globally unique identifier for printed books, then
> > ISBN is a reasonable one to use -- it's not precisely unique (there
> > have been errors assigning the same ISBN to two different books, for
> > example), but it's pretty good for most purposes.
> > 
> >    If you want an actual globally unique identifier, then some form of
> > UUID would do the job (UUIDv4 is the easiest). Alternatively, you
> > could register a DOI prefix and assign numbers inside your own
> > numberspace within the DOI system.
> > 
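For the UUIDv4 route, a rough sketch that builds a random (version 4)
UUID from the crypto module rather than relying on any particular UUID
library. The module name uuid4 is hypothetical; the bit layout follows
RFC 4122.

```erlang
-module(uuid4).
-export([new/0]).

%% new(): a random version-4 UUID as a 36-character lowercase string.
new() ->
    %% Take 122 random bits; the other 6 bits are fixed below.
    <<A:48, _:4, B:12, _:2, C:62>> = crypto:strong_rand_bytes(16),
    %% Insert the version (4) and variant (binary 10) fields, then
    %% split the 128 bits into the five printed groups.
    <<G1:32, G2:16, G3:16, G4:16, G5:48>> = <<A:48, 4:4, B:12, 2:2, C:62>>,
    lists:flatten(
      io_lib:format("~8.16.0b-~4.16.0b-~4.16.0b-~4.16.0b-~12.16.0b",
                    [G1, G2, G3, G4, G5])).
```

The result is unique for practical purposes but not at all short or
readable, so it suits internal keys better than anything shown to users.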
> >    If you want something vaguely human-readable, try dropping all the
> > stop-words (the, a, an, in, on, ...), all the vowels and all the
> > spaces. Truncate at whatever your idea of "short" is. Like the hashing
> > approach, it's not unique in the slightest.
> > 
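The stop-word recipe might look something like this in Erlang. It is only
a sketch: the module name short_id and the MaxLen parameter are
hypothetical, the stop-word list is just the five words named above, and
it assumes ASCII titles. It also keeps the first letter of each word even
when that letter is a vowel, which is an assumption made so that
"Untitled 8" comes out as "Unttld8".

```erlang
-module(short_id).
-export([short_id/2]).

%% Stop words named in the thread; extend the list as needed.
-define(STOP_WORDS, ["the", "a", "an", "in", "on"]).

%% short_id(Title, MaxLen): drop stop words, squeeze the vowels out of
%% each remaining word, join without spaces, truncate to MaxLen.
short_id(Title, MaxLen) ->
    Words = string:lexemes(Title, " "),
    Kept = [W || W <- Words,
                 not lists:member(string:lowercase(W), ?STOP_WORDS)],
    lists:sublist(lists:append([squeeze(W) || W <- Kept]), MaxLen).

%% Keep the first character (an assumption, see above), then remove
%% ASCII vowels from the rest of the word.
squeeze([First | Rest]) ->
    [First | [C || C <- Rest, not lists:member(C, "aeiouAEIOU")]].
```

With this, short_id:short_id("The Cat in the Hat", 8) returns "CtHt".
Different titles can easily produce the same ID, so a uniqueness check
against the author's existing books would still be needed.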
> >    It all depends on your use-case.
> > 
> >    Hugo.
> > 

-- 
Hugo Mills             | Jenkins! Chap with the wings there! Five rounds
hugo@REDACTED carfax.org.uk | rapid!
http://carfax.org.uk/  |                 Brigadier Alistair Lethbridge-Stewart
PGP: E2AB1DE4          |                                Dr Who and the Daemons


More information about the erlang-questions mailing list