[erlang-questions] Some comments on EEP 9 (binary: module)

Fri Mar 7 04:32:54 CET 2008

1.  Given that the module for lists is called 'lists', not 'list',
     it is rather confusing that the module for binaries is called 'binary'
     instead of the expected 'binaries'.

     On the other hand, given that the module for strings is called 'string',
     maybe it's 'lists' that has the wrong name.  Something needs to be done
     about naming consistency in modules for data types.

2.  What do you return if you look for something and it isn't there?
     For some reason, people seem to like returning an out-of-range index
     at the wrong end.  BASIC does this, Smalltalk does it, but that does not
     make it right.  Life gets *so* much simpler if match(Haystack, Needle)
     returns an index past the *right* end of the haystack.  Suppose, for
     example, we have input with an optional comment that we want to remove.
     [Mind you, EEP 9 is handicapped by starting from a system where
      binary slicing uses just about the worst possible convention, but that is
      another and sadder story.]
     [Oh yes, the documentation for the erlang: module gets erlang:split_binary/2
      wrong.  It says that the range for Pos is 1..size(Bin), but 0 is *rightly*
      allowed.  Pos is actually the size of the first part, which is just right.]

    Example.  Suppose we are given a line of text from some configuration file
    as a binary.  It might contain a # comment or it might not.  Our only interest is
    in getting rid of it.  In a rational design, where
	match(Haystack, Needle)
    returns the length of the longest prefix of Haystack *not* containing Needle,
    we just do
	{Wanted,_} = split_binary(Given, match(Given, <<"#">>))
     With the scheme actually proposed, we have to do
	case match(Given, <<"#">>)
           of 0 -> Wanted = Given
            ; N -> {Wanted,_} = split_binary(Given, N-1)
         end
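The "rational" convention from point 2 can be sketched as follows. The names match/2 and strip_comment/1 are hypothetical. The search relies on binary:match/2 from the binary module that later shipped with OTP, which returns {Pos, Len} with a 0-based Pos, or nomatch. That is an assumption about the runtime, not something EEP 9 specifies.

```erlang
-module(prefix_match).
-export([match/2, strip_comment/1]).

%% Hypothetical match/2 with the convention argued for above: returns the
%% length of the longest prefix of Haystack that does not contain Needle,
%% so a missing Needle yields byte_size(Haystack), past the right end.
match(Haystack, Needle) ->
    case binary:match(Haystack, Needle) of
        nomatch     -> byte_size(Haystack);
        {Pos, _Len} -> Pos          % 0-based, i.e. the prefix length
    end.

%% Drop an optional "#" comment; no special case for "not found".
strip_comment(Given) ->
    {Wanted, _} = split_binary(Given, match(Given, <<"#">>)),
    Wanted.
```

With this convention, strip_comment(<<"key = value # note">>) gives <<"key = value ">>. A line without a comment comes back unchanged, and no case expression is needed.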

3.  I appreciate that slicing binaries is supposed to be cheap, but I still
     think it would be nice if match had a 3rd argument, saying how many bytes
     at the beginning of Haystack to skip.  If it weren't 4pm on a Friday with
     my office floor still to tidy up, I could give examples of why this can
     make life simpler.
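Here is one possible shape for the 3-argument match from point 3, a minimal sketch. Both match/3 and fields/2 are hypothetical names. The skip is implemented with the {scope, {Start, Length}} option of binary:match/3 from the later OTP binary module; that choice is an assumption. The result is still measured from the start of Haystack, so it can go straight to split_binary/2 or binary_part/3.

```erlang
-module(skip_match).
-export([match/3, fields/2]).

%% Hypothetical match/3: like a prefix-length match/2, but the first
%% Skip bytes of Haystack are not searched.  Positions stay absolute.
match(Haystack, _Needle, Skip) when Skip =:= byte_size(Haystack) ->
    Skip;                                   % nothing left to search
match(Haystack, Needle, Skip)
  when 0 =< Skip, Skip < byte_size(Haystack) ->
    Scope = {Skip, byte_size(Haystack) - Skip},
    case binary:match(Haystack, Needle, [{scope, Scope}]) of
        nomatch     -> byte_size(Haystack);
        {Pos, _Len} -> Pos
    end.

%% Example use: collect Sep-delimited fields by advancing an offset
%% instead of re-slicing the remainder of Bin on every step.
fields(Bin, Sep) ->
    fields(Bin, Sep, 0, []).

fields(Bin, Sep, Start, Acc) ->
    End  = match(Bin, Sep, Start),
    Acc1 = [binary_part(Bin, Start, End - Start) | Acc],
    case End >= byte_size(Bin) of
        true  -> lists:reverse(Acc1);
        false -> fields(Bin, Sep, End + byte_size(Sep), Acc1)
    end.
```

The loop carries a single integer offset, and each field is extracted exactly once.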

4.  I agree that the proposed binary:split/2 function is useful, but the name
     is far too close to split_binary/2 for comfort.  A longer name such as
	binaries:split_with_separator(Binary, Separator_Binary)
     might make for less confusion.  Better still, why not make this like
     string:tokens/2, which really has exactly the same purpose except for the
     data type it applies to?
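The two spellings suggested in point 4 could be sketched like this. The module sep_split and both function names are hypothetical. The splitting itself uses binary:split/3 with the global option from the later OTP binary module, an assumed dependency. One difference from string:tokens/2 remains: here Separator is a whole byte sequence, whereas string:tokens/2 takes a set of separator characters.

```erlang
-module(sep_split).
-export([split_with_separator/2, tokens/2]).

%% Hypothetical split_with_separator/2: split Binary at every occurrence
%% of Separator, keeping empty fields between adjacent separators.
split_with_separator(Binary, Separator) ->
    binary:split(Binary, Separator, [global]).

%% A string:tokens/2-flavoured variant: empty fields are dropped.
tokens(Binary, Separator) ->
    [F || F <- split_with_separator(Binary, Separator), F =/= <<>>].
```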

5.  Ever since I met SNOBOL 4, I have known the operation of removing outer
     blanks from a string as trimming.  It's a little odd to find it called
     stripping.  By analogy with ecdysiasis (hem hem) I would expect stripping
     to remove visible outer stuff.  I wish string:strip/[1,2,3] could be
     renamed.

6.  In the functions
	unsigned_to_bin/1
	bin_to_unsigned/1
     why is the word "binary" abbreviated to "bin"?

7.  "nth" is a very strange name for a substring operation.
     I would prefer
	subbinary(Binary, Offset[, Length])
	0 =< Offset =< byte_size(Binary)
	0 =< Length =< byte_size(Binary) - Offset
     which would make this compatible with the existing
	split_binary(Binary, Offset)
     function.
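The subbinary/2,3 proposed in point 7 might look like this. It is a sketch and only the name comes from the text above. It uses nothing beyond bit syntax, and its guard encodes the two range conditions given above. Offset means "bytes skipped", exactly as in split_binary/2.

```erlang
-module(subbinary_sketch).
-export([subbinary/2, subbinary/3]).

%% Hypothetical subbinary/2: everything after the first Offset bytes.
subbinary(Binary, Offset) ->
    subbinary(Binary, Offset, byte_size(Binary) - Offset).

%% Hypothetical subbinary/3 with
%%   0 =< Offset =< byte_size(Binary)
%%   0 =< Length =< byte_size(Binary) - Offset
subbinary(Binary, Offset, Length)
  when is_binary(Binary), is_integer(Offset), is_integer(Length),
       0 =< Offset, Offset =< byte_size(Binary),
       0 =< Length, Length =< byte_size(Binary) - Offset ->
    <<_:Offset/binary, Part:Length/binary, _/binary>> = Binary,
    Part.
```

Under this convention, subbinary(B, N) returns the same bytes as the second element of split_binary(B, N). For example, both give <<"de">> for <<"abcde">> with N = 3.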

While I have been picky about some of the details, it seems to me
that this is a good beginning in a good direction.