[erlang-questions] Strange difference between construction and matching of binaries
Wed Dec 23 16:24:27 CET 2015
When playing with a new testing tool for Erlang programs, we discovered
the following difference between construction and matching of binaries.
Although we understand it from an implementation point of view, we
still find it sufficiently weird to be worth at least some discussion here.
The simplest way of describing the difference between construction and
matching of binaries is the following interaction with the Erlang shell:
Eshell V7.2.1 (abort with ^G)
1> <<42:7>> = <<42:7>>.
<<42:7>>
2> <<42:6>> = <<42:6>>.
<<42:6>>
3> <<42:5>> = <<42:5>>.
** exception error: no match of right hand side value <<10:5>>
For those who find the above surprising, it should be pointed out that
the fine reference manual
contains the following note:
When constructing binaries, if the size N of an integer segment is
too small to contain the given integer, the most significant bits of
the integer are silently discarded and only the N least significant
bits are put into the binary.
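The note can be checked directly. The following is only a minimal sketch: the module name bin_construct and both function names are made up for illustration and appear nowhere in OTP. It assumes that "the N least significant bits" amounts to masking the integer with 2^N - 1.

```erlang
-module(bin_construct).
-export([low_bits/2, same_binary/2]).

%% Keep only the Size least significant bits of Value. This is what
%% the quoted note says construction does with an oversized integer.
low_bits(Value, Size) when is_integer(Value), is_integer(Size), Size >= 0 ->
    Value band ((1 bsl Size) - 1).

%% True when constructing Value in a Size-bit segment gives the same
%% bitstring as constructing its masked form.
same_binary(Value, Size) ->
    <<Value:Size>> =:= <<(low_bits(Value, Size)):Size>>.
```

With these definitions, bin_construct:low_bits(42, 5) is 10, which is why the error message in the shell session above prints the right-hand side as <<10:5>>.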
So, the next line one may want to type in the shell could be:
4> <<42:5>> =:= <<234:5>>.
true
This may be a bit surprising but is fine in some sense. The problem is
that the fine reference manual nowhere explains what happens during
matching with segments that either contain concrete values (as in the
examples above) or variables that are bound to values that do not fit in
the size of their segment. From what can be seen in the above examples,
apparently something different happens to these segments when used in
matching instead of when used in construction.
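What matching appears to do can be probed with a small helper module. This is a sketch under assumptions, not a description of the actual implementation: the module name bin_match and its functions are hypothetical, and the behaviour shown is the one observed in the shell session above. A bound variable is used in the pattern so that the comparison happens at run time.

```erlang
-module(bin_match).
-export([extract/2, matches/3]).

%% Read back the unsigned integer stored in a Size-bit segment.
%% Fails with badmatch if Bits is not exactly Size bits long.
extract(Size, Bits) ->
    <<Int:Size>> = Bits,
    Int.

%% Match a Size-bit segment against an already bound Value.
%% The extracted integer is compared with Value as given, so a
%% Value that does not fit in Size bits can never match.
matches(Value, Size, Bits) ->
    case Bits of
        <<Value:Size>> -> true;
        _ -> false
    end.
```

Here bin_match:extract(5, <<42:5>>) returns 10, while bin_match:matches(42, 5, <<42:5>>) returns false and bin_match:matches(10, 5, <<42:5>>) returns true. In other words, construction truncates the value but matching does not truncate the value it compares against.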
Now, the problem with this difference between construction and matching
of binaries containing values that do not fit in their segments is that
it breaks many of the invariants that functional programmers (and their
compilers!) expect to hold. For example, the following clause heads are
not all the same:
foo(<<42:5>>) ->
foo(<<Int:5>>) when Int =:= 42 ->
foo(Bits) when Bits =:= <<42:5>> ->
and, perhaps surprisingly, only the third clause matches with <<10:5>>
(as well as <<42:5>>, <<106:5>>, <<234:5>>, ...). I am willing to bet
many may find the above as breaking the principle of least astonishment.
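To make the comparison concrete, each way of testing for 42 in a 5-bit segment (a literal segment pattern, a guard on the extracted integer, and a comparison against a constructed binary) can be written as its own function. This is an illustrative sketch: the module name foo_clauses and the function names are invented here, and some compiler versions may warn that the literal pattern can never match.

```erlang
-module(foo_clauses).
-export([by_literal/1, by_guard/1, by_comparison/1]).

%% Literal segment in the pattern: 42 does not fit in 5 bits.
by_literal(<<42:5>>) -> true;
by_literal(_) -> false.

%% Extract the 5-bit integer first, then compare it in a guard.
by_guard(<<Int:5>>) when Int =:= 42 -> true;
by_guard(_) -> false.

%% Construct <<42:5>> in the guard (truncating to <<10:5>>) and
%% compare whole bitstrings.
by_comparison(Bits) when Bits =:= <<42:5>> -> true;
by_comparison(_) -> false.
```

Called with <<10:5>>, by_literal and by_guard return false and only by_comparison returns true. The same holds for <<42:5>>, since it is the same bitstring.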
With this post, I want to initiate some discussion about the above in
the hope that we can come up with better semantics and implementation
for matching with bound binary segments than the current behavior. (Or
at least formally document this difference.)