[erlang-questions] Some comments on EEP 9 (binary: module)
Richard A. O'Keefe
ok@REDACTED
Fri Mar 7 04:32:54 CET 2008
1. Given that the module for lists is called 'lists', not 'list', it is
rather confusing that the module for binaries is called 'binary' instead of
the expected 'binaries'. On the other hand, given that the module for
strings is called "string", maybe it's 'lists' that has the wrong name.
Something needs to be done about naming consistency in modules for data
types.
2. What do you return if you look for something and it isn't there? For
some reason, people seem to like returning an out-of-range index at the
wrong end. BASIC does this, Smalltalk does it, but that does not make it
right. Life gets *so* much simpler if match(Haystack, Needle) returns an
index past the *right* end of the haystack. Suppose, for example, we have
input with an optional comment; we want to remove it.
[Mind you, EEP 9 is handicapped by starting from a system where binary
slicing uses just about the worst possible convention, but that is another
and sadder story.]
[Oh yes, the documentation for the erlang: module gets erlang:split_binary/2
wrong. It says that the range for Pos is 1..size(Bin), but 0 is *rightly*
allowed. Pos is actually the size of the first part, which is just right.]
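A small check of the point about Pos, written as a sketch (the module name split_pos_demo is invented for illustration). Pos is the number of bytes in the first part, so both 0 and byte_size(Bin) are accepted and give an empty first or second part respectively.

```erlang
-module(split_pos_demo).
-export([demo/0]).

%% split_binary(Bin, Pos) returns {First, Rest}, where First holds the
%% first Pos bytes of Bin. Each match below crashes if that is not so.
demo() ->
    Bin = <<"abc">>,
    {<<>>, <<"abc">>} = split_binary(Bin, 0),               % empty first part
    {<<"a">>, <<"bc">>} = split_binary(Bin, 1),
    {<<"abc">>, <<>>} = split_binary(Bin, byte_size(Bin)),  % empty second part
    ok.
```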
Example. Suppose we are given a line of text from some configuration file
as a binary. It might contain a # comment or it might not. Our only
interest is in getting rid of it. In a rational design, where
match(Haystack, Needle) returns the length of the longest prefix of
Haystack *not* containing Needle, we just do
    {Wanted,_} = split_binary(Given, match(Given, <<"#">>))
With the scheme actually proposed, we have to do
    case match(Given, <<"#">>)
      of 0 -> Wanted = Given
       ; N -> {Wanted,_} = split_binary(Given, N-1)
    end
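A minimal sketch of the match/2 convention argued for here, assuming a plain byte-by-byte scan and a non-empty Needle. The module prefix_match and its functions are hypothetical illustrations, not the EEP 9 API: match/2 returns the number of bytes before the first occurrence of Needle, or byte_size(Haystack) when there is none, so strip_comment/1 is exactly the one-liner from the text with no case analysis.

```erlang
-module(prefix_match).
-export([match/2, strip_comment/1]).

%% Hypothetical match/2: length of the longest prefix of Haystack that
%% does not contain Needle; byte_size(Haystack) if Needle never occurs.
%% Assumes Needle is non-empty (an empty Needle fails with function_clause).
match(Haystack, Needle) when byte_size(Needle) > 0 ->
    match(Haystack, Needle, 0).

%% Not enough bytes left at Pos for Needle to fit: report "past the end".
match(Haystack, Needle, Pos) when Pos + byte_size(Needle) > byte_size(Haystack) ->
    byte_size(Haystack);
match(Haystack, Needle, Pos) ->
    N = byte_size(Needle),
    <<_:Pos/binary, Candidate:N/binary, _/binary>> = Haystack,
    case Candidate =:= Needle of
        true  -> Pos;                              % Pos bytes precede the match
        false -> match(Haystack, Needle, Pos + 1)
    end.

%% The comment-removal example, unchanged from the text.
strip_comment(Given) ->
    {Wanted, _} = split_binary(Given, match(Given, <<"#">>)),
    Wanted.
```

For instance, prefix_match:strip_comment(<<"key = value # note">>) keeps the first 12 bytes, and a line without "#" comes back whole.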
3. I appreciate that slicing binaries is supposed to be cheap, but I still
think it would be nice if match had a 3rd argument saying how many bytes at
the beginning of Haystack to skip. If it weren't 4pm on a Friday with my
office floor still to tidy up, I could give examples of why this can make
life simpler.
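One guess at how a skip argument might look and be used, offered only as a sketch. It assumes match/3 keeps the prefix-length convention from point 2: the result is measured from the start of Haystack and is byte_size(Haystack) when Needle is absent. The module skip_match and both of its functions are hypothetical. With such a match/3, walking through successive occurrences needs only a moving offset, not repeated slicing.

```erlang
-module(skip_match).
-export([match/3, count/2]).

%% Hypothetical match/3: like a prefix-length match/2, but the first Skip
%% bytes of Haystack are not searched. The result is an offset from the
%% start of Haystack, or byte_size(Haystack) if Needle is not found.
match(Haystack, Needle, Skip)
  when byte_size(Needle) > 0, Skip >= 0, Skip =< byte_size(Haystack) ->
    scan(Haystack, Needle, Skip).

scan(Haystack, Needle, Pos) when Pos + byte_size(Needle) > byte_size(Haystack) ->
    byte_size(Haystack);
scan(Haystack, Needle, Pos) ->
    N = byte_size(Needle),
    <<_:Pos/binary, Candidate:N/binary, _/binary>> = Haystack,
    case Candidate =:= Needle of
        true  -> Pos;
        false -> scan(Haystack, Needle, Pos + 1)
    end.

%% Example use: count non-overlapping occurrences of a non-empty Needle.
count(Haystack, Needle) ->
    count(Haystack, Needle, 0, 0).

count(Haystack, Needle, Skip, Acc) ->
    case match(Haystack, Needle, Skip) of
        %% A non-empty Needle can never match at byte_size, so this means "absent".
        Pos when Pos =:= byte_size(Haystack) -> Acc;
        Pos -> count(Haystack, Needle, Pos + byte_size(Needle), Acc + 1)
    end.
```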
4. I agree that the proposed binary:split/2 function is useful, but the
name is far too close to split_binary/2 for comfort. A longer name such as
    binaries:split_with_separator(Binary, Separator_Binary)
might make for less confusion. Better still, why not make this like
string:tokens/2, which really has exactly the same purpose except for the
data type it applies to?
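One way a tokens-style function for binaries could behave, modelled on string:tokens/2: Separators is a list of bytes, any one of which ends a token, and empty tokens are dropped. This is a sketch under those assumptions; the module bin_tokens is hypothetical and is not the proposed binary:split/2. Tokens are extracted by offset with binary pattern matching rather than built up byte by byte.

```erlang
-module(bin_tokens).
-export([tokens/2]).

%% Hypothetical tokens/2 for binaries, in the spirit of string:tokens/2.
tokens(Bin, Seps) when is_binary(Bin), is_list(Seps) ->
    scan(Bin, Seps, 0, 0, []).

%% Start: offset where the current token began; Pos: offset being examined.
scan(Bin, _Seps, Start, Pos, Acc) when Pos =:= byte_size(Bin) ->
    lists:reverse(keep(Bin, Start, Pos, Acc));
scan(Bin, Seps, Start, Pos, Acc) ->
    <<_:Pos/binary, C, _/binary>> = Bin,
    case lists:member(C, Seps) of
        true  -> scan(Bin, Seps, Pos + 1, Pos + 1, keep(Bin, Start, Pos, Acc));
        false -> scan(Bin, Seps, Start, Pos + 1, Acc)
    end.

%% Push the bytes in [Start, Pos) onto Acc unless that range is empty.
keep(_Bin, Start, Start, Acc) ->
    Acc;
keep(Bin, Start, Pos, Acc) ->
    Len = Pos - Start,
    <<_:Start/binary, Tok:Len/binary, _/binary>> = Bin,
    [Tok | Acc].
```

As with string:tokens/2, bin_tokens:tokens(<<"a,,b, c">>, ", ") treats both comma and space as separators and yields the three tokens a, b and c.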
5. Ever since I met SNOBOL 4, I have known the operation of removing outer
blanks from a string as trimming. It's a little odd to find it called
stripping. By analogy with ecdysiasis (hem hem) I would expect stripping to
remove visible outer stuff. I wish string:strip/[1,2,3] could be renamed.
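A sketch of what trim/1 on a binary could look like. It assumes, as string:strip/1 does, that a blank is the space character only; widening that to tabs and the like would be a separate design choice. The module bin_trim is hypothetical.

```erlang
-module(bin_trim).
-export([trim/1]).

%% Hypothetical trim/1: remove leading and trailing spaces from a binary.
trim(Bin) when is_binary(Bin) ->
    trim_right(trim_left(Bin)).

%% Drop leading spaces one byte at a time.
trim_left(<<$\s, Rest/binary>>) -> trim_left(Rest);
trim_left(Bin) -> Bin.

trim_right(Bin) ->
    trim_right(Bin, byte_size(Bin)).

%% Len is how many leading bytes are still kept; shrink it past trailing spaces.
trim_right(Bin, Len) when Len > 0 ->
    Before = Len - 1,
    case Bin of
        <<_:Before/binary, $\s, _/binary>> -> trim_right(Bin, Before);
        <<Kept:Len/binary, _/binary>> -> Kept
    end;
trim_right(_Bin, 0) ->
    <<>>.
```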
6. In the functions
    unsigned_to_bin/1
    bin_to_unsigned/1
why is the word "binary" abbreviated to "bin"?
7. "nth" is a very strange name for a substring operation. I would prefer
    subbinary(Binary, Offset[, Length])
        0 =< Offset =< byte_size(Binary)
        0 =< Length =< byte_size(Binary) - Offset
which would make this compatible with the existing
    split_binary(Binary, Offset)
function.
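A minimal sketch of subbinary/2,3 with exactly the constraints listed above, written with binary pattern matching. The module name subbinary_sketch is hypothetical, and calls outside the stated ranges simply fail with function_clause. Because Offset counts bytes to skip, just like Pos in split_binary/2, subbinary(B, Off) equals the second element of split_binary(B, Off), and subbinary(B, 0, Off) equals the first.

```erlang
-module(subbinary_sketch).
-export([subbinary/2, subbinary/3]).

%% Hypothetical subbinary/2: everything after the first Offset bytes.
subbinary(Binary, Offset) ->
    subbinary(Binary, Offset, byte_size(Binary) - Offset).

%% Hypothetical subbinary/3, enforcing
%%   0 =< Offset =< byte_size(Binary)
%%   0 =< Length =< byte_size(Binary) - Offset
subbinary(Binary, Offset, Length)
  when is_binary(Binary), is_integer(Offset), is_integer(Length),
       0 =< Offset, Offset =< byte_size(Binary),
       0 =< Length, Length =< byte_size(Binary) - Offset ->
    <<_:Offset/binary, Sub:Length/binary, _/binary>> = Binary,
    Sub.
```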
While I have been picky about some of the details, it seems to me
that this is a good beginning in a good direction.