[erlang-questions] List comprehension puzzler

Wed Sep 21 04:51:38 CEST 2016

On 21/09/16 1:44 AM, lloyd@REDACTED wrote:
> ISBNS are either 10 or 13-digits. The standard specifies a check-sum algorithm, but that seemed too complicated for my purposes. So, after struggling with the list comprehension puzzler and Dan Gudmundsson's helpful input, I came up with this:
>
> isbn_format_valid(ISBN) ->
>    D = fun(I) -> (I >= $0) and (I =< 9) end,
>    NotIntegers = [I || I <- ISBN, D(I) == true],
>    Flag1 = NotIntegers == [],
>    Flag2 = (length(ISBN) == 13) or (length(ISBN) == 10),
>    Flag1 and Flag2.

I find this confusing.
First off, 10-digit ISBNs have four parts and 13-digit ones five:
- GS1 prefix (13-digit ISBNs only),
- registration group element,
- registrant element,
- publication element,
- check digit.
These things are commonly written separated by spaces or hyphens,
e.g., "978-1-59327-435-1" is a valid ISBN.  (That of LYSE, in fact.)
In fact, the definition says "The elements MUST .. be separated clearly
by hyphens or spaces when displayed in human readable form."
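
To see the elements of a displayed ISBN, one can split it at the
separators.  This is only a sketch: the module name isbn_parts and
the function elements/1 are made up for illustration, and
string:lexemes/2 assumes OTP 20 or later (string:tokens/2 behaves
the same way on this kind of input in older releases).

```erlang
-module(isbn_parts).
-export([elements/1]).

%% Hypothetical helper, not part of the validator below.
%% Splits a human-readable ISBN into its elements at spaces and
%% hyphens.  It does not check that the elements are sensible.
-spec elements(ISBN :: string()) -> [string()].

elements(ISBN) ->
    string:lexemes(ISBN, " -").
```

With this, isbn_parts:elements("978-1-59327-435-1") gives
["978","1","59327","435","1"], the five parts of a 13-digit ISBN.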

So we probably want to start by stripping out spaces and hyphens:

     [Space, Hyphen] = " -",
     Stripped = [C || C <- ISBN, C =/= Space, C =/= Hyphen],

or else document that this has already been done.

Strictly speaking, we'd need to check that the parts are
valid, and there's a file we could download from
https://www.isbn-international.org/range_file_generation
but that goes further than anyone but a specialist would need.

Second, all of the elements of a string are necessarily
integers, or it wouldn't _be_ a string.  So "NotIntegers"
is very confusing.

     NonDigits = [C || C <- Stripped, C < $0 orelse C > $9],

Third, this isn't actually right either.  The check digit
of a 10-digit ISBN may be an X (standing for the value 10).
You don't want to reject a valid ISBN because it has an X at
the end.  Maybe you don't want to accept 10-digit ISBNs, but
again, that should be documented.
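
If a format-only check really is all that is wanted, it might look
something like the following.  This is a sketch under assumptions:
the names isbn_format and format_ok/1 are invented here, the argument
is assumed to have had spaces and hyphens stripped already, and no
checksum is verified.

```erlang
-module(isbn_format).
-export([format_ok/1]).

%% Hypothetical shape-only check: 13 digits, or 9 digits followed
%% by a digit or an upper-case X.  The checksum is NOT verified.
-spec format_ok(Stripped :: string()) -> boolean().

format_ok(Stripped) ->
    IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
    case length(Stripped) of
        13 ->
            lists:all(IsDigit, Stripped);
        10 ->
            {Body, [Check]} = lists:split(9, Stripped),
            lists:all(IsDigit, Body)
                andalso (Check =:= $X orelse IsDigit(Check));
        _ ->
            false
    end.
```

An X is accepted only as the tenth character of a 10-character
string, so "080442957X" passes while "978159327435X" does not.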

We end up with the following code (which has had some very
limited testing):

%   isbn_is_valid(String)
%   is true when, after stripping out spaces and hyphens that
%   are required for human-readable display, String is a 10-
%   or 13-digit ISBN with a valid checksum (and a valid GS1
%   prefix if it's a 13-digit ISBN).  This assumes that the
%   argument is a string (character code list).

-spec isbn_is_valid(ISBN :: string()) -> boolean().

isbn_is_valid(ISBN) ->
     isbn_checksum_ok(stripped_isbn(ISBN)).

%   stripped_isbn(String)
%   returns String without any hyphens or spaces.
%   It does not check that the separators are in sensible
%   places because most of the fields are variable in size.

-spec stripped_isbn(ISBN :: string()) -> string().

stripped_isbn(ISBN) ->
     [Space, Hyphen] = " -",
     [C || C <- ISBN, C =/= Space, C =/= Hyphen].

-define(isdigit(X), ($0 =< X andalso X =< $9)).

%   isbn_checksum_ok(ISBN) reports whether a string
%   not containing hyphens or spaces is a valid 10-digit
%   or 13-digit ISBN, validity defined as valid checksum
%   and valid GS1 field, if any.  The other fields are
%   not checked for validity; that would need a database
%   lookup.

-spec isbn_checksum_ok(ISBN :: string()) -> boolean().

isbn_checksum_ok([A,B,C,D,E,F,G,H,I,J])  % 10-digit
   when ?isdigit(A), ?isdigit(B), ?isdigit(C), ?isdigit(D), ?isdigit(E),
        ?isdigit(F), ?isdigit(G), ?isdigit(H), ?isdigit(I),
        ( J =:= $X orelse ?isdigit(J) ) ->
     ( (A - $0) * 10 +
       (B - $0) *  9 +
       (C - $0) *  8 +
       (D - $0) *  7 +
       (E - $0) *  6 +
       (F - $0) *  5 +
       (G - $0) *  4 +
       (H - $0) *  3 +
       (I - $0) *  2 +
       (if J =:= $X -> 10 ; true -> J - $0 end)
     ) rem 11 =:= 0;

isbn_checksum_ok([A,B,C,D,E,F,G,H,I,J,K,L,M])  % 13-digit
   when A =:= $9, B =:= $7, $8 =< C, C =< $9,
        ?isdigit(D), ?isdigit(E), ?isdigit(F), ?isdigit(G), ?isdigit(H),
        ?isdigit(I), ?isdigit(J), ?isdigit(K), ?isdigit(L), ?isdigit(M) ->
     ( ( (A-$0) + (C-$0) + (E-$0) + (G-$0) + (I-$0) + (K-$0) + (M-$0) )
     + ( (B-$0) + (D-$0) + (F-$0) + (H-$0) + (J-$0) + (L-$0) ) * 3
     ) rem 10 =:= 0;

isbn_checksum_ok(_) ->
     false.

The 13-digit checksum code has been tweaked to recognise the fact
that according to the 2012 Sixth edition of the ISBN User's Manual,
which is still current, the only valid GS1 prefixes are 978 and 979.
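
The same arithmetic can be turned around to compute a check digit
instead of verifying one, which is handy for making test cases.
Again a sketch only: the module isbn_check and both function names
are hypothetical, and the inputs are assumed to be strings of
decimal digits of the stated length (nothing else is checked).

```erlang
-module(isbn_check).
-export([check_digit13/1, check_digit10/1]).

%% Check digit (as a character) for the first 12 digits of a
%% 13-digit ISBN: weights alternate 1,3 and the total including
%% the check digit must be a multiple of 10.
-spec check_digit13(Digits :: string()) -> char().

check_digit13(Digits) when length(Digits) =:= 12 ->
    Weights = [1,3,1,3,1,3,1,3,1,3,1,3],
    Sum = lists:sum([(D - $0) * W
                     || {D, W} <- lists:zip(Digits, Weights)]),
    $0 + (10 - Sum rem 10) rem 10.

%% Check digit (as a character) for the first 9 digits of a
%% 10-digit ISBN: weights run 10 down to 2 and the total including
%% the check digit must be a multiple of 11.  A value of 10 is
%% written as X.
-spec check_digit10(Digits :: string()) -> char().

check_digit10(Digits) when length(Digits) =:= 9 ->
    Weights = lists:seq(10, 2, -1),
    Sum = lists:sum([(D - $0) * W
                     || {D, W} <- lists:zip(Digits, Weights)]),
    case (11 - Sum rem 11) rem 11 of
        10 -> $X;
        N  -> $0 + N
    end.
```

For "978159327435" the weighted sum is 35 + 3*28 = 119, so
check_digit13/1 returns $1, matching "978-1-59327-435-1" above.
For "080442957" the weighted sum is 199, which leaves remainder 1
on division by 11, so check_digit10/1 returns $X.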

I repeat: this has had some but very limited testing.
It's free and worth every penny.