[erlang-questions] bit syntax: 0-sized segments

Fri Mar 8 01:45:40 CET 2013

Integers in Erlang do not have silly size restrictions
reflecting the underlying hardware.  It's not the case
that only 8, 16, 32, 64 make sense as integer sizes,
for example.

Let B and V be integers such that
  B >= 0
  0 <= V < 2**B
then
  <<V:B>>
is a bitstring containing exactly B bits such that
  <<R:B>> = <<V:B>>
will exactly recover R == V.
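
A minimal sketch of that property, written as expressions that can be
pasted into the Erlang shell.  The names RoundTrip and AllSizesOk are
made up for illustration, and 2**B is spelled 1 bsl B.

```erlang
%% RoundTrip(V, B): build a B-bit segment from V, then match it back.
%% Assumes B >= 0 and 0 =< V < 2**B.
RoundTrip = fun(V, B) ->
    Bits = <<V:B>>,
    B = bit_size(Bits),          % exactly B bits were produced
    <<R:B>> = Bits,              % recover the value
    R =:= V
end.

%% Check every legal V for each size from 0 to 9 bits.
AllSizesOk = lists:all(
    fun(B) ->
        lists:all(fun(V) -> RoundTrip(V, B) end,
                  lists:seq(0, (1 bsl B) - 1))
    end,
    lists:seq(0, 9)).
```

For B == 0 the inner list is just [0], so the zero-size case goes
through the same code path as every other size.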

The interesting thing here is that B == 0 is NOT a special case.

It is not, or _should_ not be, in any way surprising.

What _has_ repeatedly caused surprise expressed in this mailing
list is the quiet truncation of integer values outside the [0,2**B)
range.  That would definitely justify an exception.
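
To make the truncation concrete, here is a small shell sketch.  The
wrapper CheckedInt is hypothetical (nothing like it exists in OTP as
far as I know); it only shows what "raise an exception instead of
truncating" could look like.

```erlang
%% Construction keeps only the low-order B bits of an out-of-range value.
<<1>> = <<257:8>>.      % 257 rem 256 =:= 1
<<>>  = <<42:0>>.       % nothing of 42 survives in zero bits

%% CheckedInt(V, B): hypothetical wrapper that refuses to truncate.
CheckedInt = fun(V, B) when is_integer(V), is_integer(B),
                            B >= 0, V >= 0, V < (1 bsl B) ->
                     <<V:B>>;
                (V, B) ->
                     erlang:error(badarg, [V, B])
             end.

<<>>    = CheckedInt(0, 0).      % zero size is fine when the value fits
<<255>> = CheckedInt(255, 8).
{'EXIT', {badarg, _}} = (catch CheckedInt(257, 8)).
{'EXIT', {badarg, _}} = (catch CheckedInt(42, 0)).
```

Note that the check rejects 42:0 for exactly the same reason it rejects
257:8, and accepts 0:0 for exactly the same reason it accepts 255:8.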

Suppose I am constructing an XML compressor, taking advantage of
knowing the DTD.  About to emit an element, I want to say "let P
be the number of elements allowed here, counting #PCDATA as an
element.  Let B be the smallest integer such that 2**B >= P.
Let V be the zero-origin index of the element type.  Now encode
V:B."  Considering the number of elements where only #PCDATA is
allowed, quite often I am going to want to encode 0:0.
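
A rough sketch of that encoding step, as shell expressions.  Width,
EncodeChoice and DecodeChoice are illustrative names only, and the
sketch assumes P >= 1.

```erlang
%% Width(P): smallest B with 2**B >= P, for P >= 1.
%% For P > 1 this is the number of binary digits of P - 1.
Width = fun(1) -> 0;
           (P) when is_integer(P), P > 1 ->
               length(integer_to_list(P - 1, 2))
        end.

%% EncodeChoice(V, P): emit alternative V (zero-origin) out of P.
EncodeChoice = fun(V, P) when is_integer(V), V >= 0, V < P ->
                   B = Width(P),
                   <<V:B>>
               end.

%% DecodeChoice(Bits, P): read one alternative back, return the rest.
DecodeChoice = fun(Bits, P) ->
                   B = Width(P),
                   <<V:B, Rest/bitstring>> = Bits,
                   {V, Rest}
               end.
```

With P == 1 the encoder emits the empty bitstring and the decoder
reads V == 0 without consuming anything, so the #PCDATA-only case
needs no special branch in either direction.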

Zero is a perfectly good size, even for an integer.

Floats are very very different.
Erlang *does* let a hardware size show through.
The *only* sizes that make sense are the IEEE widths, such as 32 and 64.
And you _do_ get an exception if you specify the type as 'float'
and the size as anything that doesn't resolve to one of those widths.
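
For illustration, a short shell sketch of that behaviour on
construction.  OddSize is an arbitrary name; the size is held in a
variable so that the failure happens at run time.

```erlang
%% A 64-bit float segment round-trips.
<<F:64/float>> = <<1.5:64/float>>.
true = (F =:= 1.5).

%% A size that is not a float width is rejected at construction time.
OddSize = 7.
{'EXIT', {badarg, _}} = (catch <<1.5:OddSize/float>>).
```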

> When one moves to integer segments the above property does not make much sense anymore (esp. since bit_size is not defined for anything other than bitstrings). In particular, the current implementation of binary pattern matching has chosen to return an "arbitrary" integer, namely 0, as the result.

But it is not arbitrary at all.  It is forced by the rule for non-zero sizes.
If the legal range for B bits (where B > 0) is 0..2**B-1, then the legal
range for 0 bits *has* to be 0..0.  This is the *only* consistent value.
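
The rule can be tabulated in the shell.  MaxFor is an illustrative
name, not a library function.

```erlang
%% MaxFor(B): largest unsigned value that fits in B bits.
MaxFor = fun(B) when is_integer(B), B >= 0 -> (1 bsl B) - 1 end.
[0, 1, 3, 7, 15] = [MaxFor(B) || B <- lists:seq(0, 4)].

%% Matching a zero-bit unsigned segment yields the only value in 0..0.
<<X:0/integer>> = <<>>.
0 = X.
```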

> I can may well see that many would consider the following binding for X to 0 a bit weird.
> 
> 1> <<X:0/integer>> = <<42:0>>.

That is a completely different issue.  The thing that is weird here is
in the *expression*, not the pattern, and it's allowing a value that
does not in fact fit into the field and quietly truncating it.

<<X:8/integer>> = <<257:8>>

gives you X = 1.  *THAT* is weird, but it has nothing whatever to do with
zero sizes, and banning the perfectly sensible zero sizes will do nothing
to stop the weirdness.

> Moreover, the situation is arguably even more weird for floats:
> 
> 3> <<F:0/float>> = <<42:0>>.
> <<>>
> 4> F.
> 0.0

*That* I grant you.  Since the only legal sizes in a construction are the
supported IEEE widths (such as 32 and 64), and you get an exception if you
try any other number, the only legal sizes for a float in a pattern should
be those same widths.
> 
> I am not so convinced that pattern matching with 0-size segments make sense for types other than bitstrings (binaries).

It doesn't make sense for floats.
But it _does_ make sense for integers.

More precisely, it makes sense for *unsigned* integers.

A *signed* integer has to be at least one bit, because
there has to be somewhere to put the sign, otherwise it
isn't signed.
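
A small shell sketch of the signed case, assuming the usual two's
complement reading of signed segments.  SignedRange is an illustrative
name and is deliberately undefined for B < 1.

```erlang
%% The same single bit read three ways.
<<S:1/signed>>   = <<1:1>>.      % S is -1: the one bit is the sign
<<U:1/unsigned>> = <<1:1>>.      % U is 1
<<T:8/signed>>   = <<255>>.      % T is -1

%% SignedRange(B): {Min, Max} for a B-bit signed integer, B >= 1.
SignedRange = fun(B) when is_integer(B), B >= 1 ->
                  {-(1 bsl (B - 1)), (1 bsl (B - 1)) - 1}
              end.
{-1, 0}     = SignedRange(1).
{-128, 127} = SignedRange(8).
```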

So:

	We AGREE that size 0 is sensible for bit strings.
	We AGREE that size 0 is not sensible for floats.
	We AGREE that size 0 is not sensible for signed integers.

Do we really disagree much about unsigned integers?

	We AGREE that an explicit size 0 is odd enough to
	deserve a compiler _warning_.  After all, if you
	_know_ you don't want any bits, why mention them?

A warning is not a refusal to compile.

Our disagreement seems to be limited to

	- whether an unsigned integer with a size of zero
	  determined at run time has semantics forced by
	  and consistent with the semantics of nonzero
	  sizes and should certainly be allowed, or is so
	  weird that it should raise an exception.