[erlang-questions] Bit syntax matching gotchas

Wed Feb 3 07:17:02 CET 2016

There are some gotchas in the bit syntax that come up now and then on
the mailing list, and also as a bug report at the end of last year:
http://bugs.erlang.org/browse/ERL-44

We have decided to try to do something about it in OTP 19. We have
discussed various solutions internally in the OTP group, and I have
spent far too much time lately thinking about it.

Here follows first my summary of the issues, and then my suggestion
how the compiler should be modified.

BACKGROUND ABOUT BIT SYNTAX CONSTRUCTION

When constructing binaries, there is an implicit masking of the
values. All of the following constructions give the same result:

<<255>>
<<16#FF>>
<<16#FFFF>>
<<-1>>

There have been complaints about the implicit masking behaviour, but
there is a lot of code that depends on it, so it would be unwise to
change it.
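
As a small sketch of the masking (the module and function names are
only illustrative, not taken from any OTP code), building the segment
from a variable makes it easy to check that the values above all give
the same one-byte binary:

```erlang
-module(bit_masking).
-export([build/1, all_equal/0]).

%% Construct a one-byte binary; only the low 8 bits of V are kept.
build(V) when is_integer(V) ->
    <<V:8>>.

%% True if 255, 16#FF, 16#FFFF and -1 all construct the same binary.
all_equal() ->
    Bins = [build(255), build(16#FF), build(16#FFFF), build(-1)],
    lists:all(fun(B) -> B =:= <<255>> end, Bins).
```

Calling bit_masking:all_equal() should return true, since every value
is truncated to its low 8 bits on construction.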

THE PROBLEM

There is no similar masking when matching values. That means that all
of the following expressions will fail to match:

<<-1>> = <<-1>>
<<-1/unsigned>> = <<-1>>
<<16#FFFF>> = <<16#FFFF>>
<<-12345/signed>> = <<-12345/signed>>

Let's look at how the compiler internally implements matching. Take
this function as an example:

f(<<-1:8/unsigned>>) -> ok.

It will be rewritten to:

f(<<V:8/unsigned>>) when V =:= -1 -> ok.

That is, an unsigned value (in the range 0-255) will be stored in the
variable V, which will then be compared to -1.
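
To see the effect of that rewriting, here is a minimal sketch. The
module name, the extract/1 helper and the catch-all clause are
additions for illustration; only the first clause of f/1 mirrors the
rewritten function above.

```erlang
-module(match_rewrite).
-export([extract/1, f/1]).

%% Return the unsigned value that matching stores in the variable.
extract(<<V:8/unsigned>>) ->
    V.

%% First clause corresponds to the compiler's rewritten form.
%% V is always in the range 0-255, so the guard can never be true.
f(<<V:8/unsigned>>) when V =:= -1 ->
    ok;
%% Hypothetical catch-all so the failure is visible as a return value.
f(_) ->
    nomatch.
```

With the binary <<-1>>, extract/1 gives 255 and f/1 falls through to
the catch-all clause.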

POSSIBLE SOLUTION #1

The most obvious solution is probably to let the compiler warn for the
above cases. The matching would still fail. The developer will need to
fix their code. For example:

<<-1/signed>> = <<-1>>
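
A minimal sketch of that kind of fix in a function head (the module
and function names are made up for the example): with the segment
declared signed, the byte 16#FF is extracted as -1 and compares equal
to the literal.

```erlang
-module(signed_fix).
-export([is_minus_one/1]).

%% The segment is extracted as a signed 8-bit integer, so the byte
%% 16#FF is read as -1 and the literal pattern can match.
is_minus_one(<<-1/signed>>) ->
    true;
is_minus_one(_) ->
    false.
```

Both <<-1>> and <<255>> are the same byte, so is_minus_one/1 returns
true for either of them.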

POSSIBLE SOLUTION #2

There is one problem with solution #1. It is not possible to
produce a warning for the following example:

f(Val) ->
  <<Val:8>> = <<Val:8>>,
  Val.

So in addition to warning when possible, another solution is to mask
values also when matching. Internally, the compiler could rewrite the
function to something like:

f(Val) ->
  <<NewVar:8>> = <<Val:8>>,
  NewVar = Val band 16#FF,
  Val.
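
The difference between today's matching and a masked match can be
sketched in plain Erlang. This is only an illustration, assuming the
mask is applied to the already bound value before it is compared with
the extracted byte; the module and function names are hypothetical and
this is not the actual compiler transformation.

```erlang
-module(masked_match).
-export([plain/1, masked/1]).

%% Current behaviour: the extracted byte (0-255) is compared
%% directly with the bound value Val.
plain(Val) ->
    case <<Val:8>> of
        <<Val:8>> -> match;
        _ -> nomatch
    end.

%% Masked comparison: Val is reduced to its low 8 bits before it is
%% compared with the extracted byte.
masked(Val) ->
    <<NewVar:8>> = <<Val:8>>,
    case Val band 16#FF of
        NewVar -> match;
        _ -> nomatch
    end.
```

For values in the range 0-255 both functions return match. For -1,
plain/1 returns nomatch while masked/1 returns match.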

Similar rewriting should be done for literal integers, so the following
expression would now match:

<<-1>> = <<-1>>

WHICH SOLUTION?

Just to make sure that I don't reject solution #2 just because it
seems like a lot of work to implement, I have actually implemented it.

Now that I have implemented solution #2, I want to reject it.

The reason I reject it is that matching previously bound variables
is uncommon. Even in the compiler test suites it is uncommon (test
suites typically match bound variables more often than production code
does).

Therefore, solution #2 would make the behaviour of matching more
consistent with construction, but would not significantly increase the
number of correct programs. Also, if clauses that previously didn't
match start to match, then code that has not been executed before will
be executed. Code that has not been tested usually doesn't work.

Solution #1 would point out all cases where literal integers could not
possibly match and force the developer to fix them.

Therefore I choose solution #1.

YOUR INPUT?

Is there a better way to fix bit syntax matching? Anything I have
forgotten or not thought about?

/Björn

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB