[erlang-questions] string:substr/2 gives confusing error message

Mon Dec 8 00:14:36 CET 2008

On 6 Dec 2008, at 4:18 am, Robert Virding wrote:

> Thinking some more and checking the code. Now substr/2 is defined as
>
>     substr(String, StartPos) -> String.
>
> so substr returns the rest of string starting at StartPos.

It's all in the way you say it.
Something I did while I was a Masters student was to work through
a complete APL-inspired algebra of sequences.  It was typed up but
never published, and I no longer have a copy.  I can remember some
of the basic ideas, one of which was that a *good* algebra of
sequences would satisfy as many "intuitive" laws, with as few
exceptions, as possible.  One such law is

	n .take s ++ n .drop s = s

for ALL strings s and for ALL natural numbers n.
You can't make it work quite this easily for negative n, though.
Here .take is the APL up arrow and .drop is the APL down arrow.
There are also laws

	m .take (n .take s) = min(m,n) .take s
	m .drop (n .drop s) = (m+n) .drop s

for ALL strings s and for ALL non-negative m and n.
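
These three laws are easy to spot-check mechanically.  What follows
is only a sketch under stated assumptions: the module name seqlaws,
the helper law_holds/1 and the sample strings are made up for
illustration, take/2 leans on lists:sublist/2, and drop/2 guards
lists:nthtail/2 so that dropping past the end gives [] instead of
an error.

```erlang
-module(seqlaws).
-export([take/2, drop/2, check/0]).

%% First N elements of L, or all of L when it is shorter (N >= 0).
take(N, L) when is_integer(N), N >= 0 ->
    lists:sublist(L, N).

%% L without its first N elements; [] when N runs past the end.
drop(N, L) when is_integer(N), N >= 0, N >= length(L) ->
    [];
drop(N, L) when is_integer(N), N >= 0 ->
    lists:nthtail(N, L).

%% Try all three laws on a few small strings and small M, N.
%% This is a spot check over hypothetical sample data, not a proof.
check() ->
    Strings = ["", "a", "ab", "hello"],
    Ns = lists:seq(0, 7),
    Cases = [{S, M, N} || S <- Strings, M <- Ns, N <- Ns],
    lists:all(fun law_holds/1, Cases).

law_holds({S, M, N}) ->
    take(N, S) ++ drop(N, S) =:= S
        andalso take(M, take(N, S)) =:= take(min(M, N), S)
        andalso drop(M, drop(N, S)) =:= drop(M + N, S).
```

The empty string and values of N larger than any sample string are
included on purpose, since those are exactly the cases where an
exception-throwing definition would break the laws.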

I see substr(String, StartPos) as a painfully clumsy way to write
	(StartPos - 1) .drop String
which means that it should be regarded as defined for ALL
integers StartPos >= 1.

> Logically this would mean that
>
>     string:substr("a",2) -> []
>
> is perfectly ok while
>
>     string:substr("",2)
>
> should generate an error as there is no string starting at the
> second element in the string.

But substr/2 isn't talking about ELEMENTS at all; it is talking
about SEGMENTS of strings, rather clumsily.  The fact that there
is no second element is quite irrelevant to the fact that
1 .drop "" is perfectly well defined.

> Looking at substr/3 which is defined as
>
>     substr(String, StartPos, Length) -> String.

And this is just Length .take (StartPos-1) .drop String, read
right to left as in APL: drop first, then take.

It's not just perfectly meaningful for all non-negative integers
Length, all positive integers StartPos, and all lists String,
it's much much easier to work with the algebra if you don't have
strange exception cases all over the place.

It's bad enough that the library should be using a way of
identifying segments that couldn't have been better designed to
induce off-by-one errors; there's no call for making it even
harder to reason about.

> But I do think that
>
>     string:substr("",2) -> []
>
> is being too kind and should generate an error.

Nope.  This is the unique right answer.

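%% drop(N, L): L without its first N elements; [] if L is too short.
drop(0, L) when is_list(L) ->
     L;
drop(N, []) when is_integer(N), N > 0 ->
     [];
drop(N, [_|T]) when is_integer(N), N > 0 ->
     drop(N-1, T).

%% take(N, L): the first N elements of L; all of L if it is too short.
take(0, L) when is_list(L) ->
     [];
take(N, []) when is_integer(N), N > 0 ->
     [];
take(N, [H|T]) when is_integer(N), N > 0 ->
     [H | take(N-1, T)].

%% StartPos is 1-based, so skip StartPos-1 elements.
substr(L, I) -> drop(I-1, L).

substr(L, I, N) -> take(N, drop(I-1, L)).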

(The original unpublished paper defined strings basically
as functions, with take and drop being restriction combined
with shifting.  In effect, n .take s = s .restricted-to [1,n]
and n .drop s = s .shifted n .restricted-to [1,infinity].)
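
A minimal sketch of that function view, assuming a string is
modelled as a map from positions 1..Len to elements.  This is not
the paper's formulation: the module name seqfun and the helpers
from_list/1, to_list/1, restrict/3 and shift/2 are made up for
illustration, and the code assumes an Erlang/OTP release recent
enough to have the maps module with maps:filter/2.

```erlang
-module(seqfun).
-export([from_list/1, to_list/1, take/2, drop/2]).

%% A string as a finite function: a map from positions 1..Len to elements.
from_list(L) ->
    maps:from_list(lists:zip(lists:seq(1, length(L)), L)).

%% Read the elements back in position order.
to_list(S) ->
    [V || {_K, V} <- lists:keysort(1, maps:to_list(S))].

%% Restriction: keep only positions in [Lo, Hi]; Hi may be 'infinity'.
restrict(S, Lo, Hi) ->
    maps:filter(fun(K, _V) ->
                        K >= Lo andalso (Hi =:= infinity orelse K =< Hi)
                end, S).

%% Shift: move every position down by N.
shift(S, N) ->
    maps:from_list([{K - N, V} || {K, V} <- maps:to_list(S)]).

%% n .take s = s .restricted-to [1,n]
take(N, S) -> restrict(S, 1, N).

%% n .drop s = s .shifted n .restricted-to [1,infinity]
drop(N, S) -> restrict(shift(S, N), 1, infinity).
```

In this model seqfun:to_list(seqfun:drop(1, seqfun:from_list("")))
is [], with no special case needed: shifting and restricting the
empty function just gives the empty function.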