[erlang-bugs] Binary memory reuse issue in unicode:characters_to_list

Tue Aug 13 14:03:01 CEST 2013

I just found some extremely unexpected behaviour when using binary
pattern matching together with unicode:characters_to_list:

http://pastebin.com/7EYEhu0Z

Given a 2-byte binary, e.g. <<65,128>> (65 = letter "A", 128 = a byte
that is invalid as standalone UTF-8):

<<Char:8,Rest/binary>> = <<65,128>>,
Char = 65,
Rest = <<128>>.

unicode:characters_to_list(Rest) should return the error tuple
{error, [], <<128>>}, but instead it returns {error, [], "A"}.

unicode:characters_to_list(<<128>>) produces the desired result, even
though the argument should be identical to Rest.

Making a copy and passing the copy instead also gives the desired result:
Rest2 = <<Rest/binary>>,
unicode:characters_to_list(Rest2).
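
For anyone who wants to try this outside the shell, the three cases above
can be put side by side in a small compiled module. This is only a sketch:
the module name utf8_repro and the run/0,1 functions are made up for
illustration, and binary:copy/1 is used here as an explicit way to get a
binary that does not share storage with the original.

```erlang
%% Hypothetical module; names are for illustration only.
-module(utf8_repro).
-export([run/0, run/1]).

run() ->
    run(<<65,128>>).

%% Bin is passed in as an argument so that the match below is done at
%% run time rather than folded away by the compiler.
run(Bin) ->
    %% Rest comes from pattern matching, so it may be a sub-binary of Bin.
    <<_Char:8, Rest/binary>> = Bin,
    %% Fresh is a literal holding the same single byte.
    Fresh = <<128>>,
    %% Copy holds the same byte in storage that is not shared with Bin.
    Copy = binary:copy(Rest),
    %% All three binaries compare equal, so all three calls should agree.
    true = (Rest =:= Fresh) andalso (Rest =:= Copy),
    [{matched, unicode:characters_to_list(Rest)},
     {literal, unicode:characters_to_list(Fresh)},
     {copied,  unicode:characters_to_list(Copy)}].
```

Compile with c(utf8_repro). and call utf8_repro:run(). On a system that
does not show the problem, all three entries should carry
{error, [], <<128>>}; on an affected one, the matched entry is the one
that would be expected to differ.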

Is this related to the binary optimisations detailed here?
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
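
One way to look into that question might be binary:referenced_byte_size/1,
which reports the size of the underlying binary that a (sub-)binary refers
to. The sketch below is an assumption about how to check it, not a
diagnosis; the module name subbin_check and the function sizes/1 are made
up for illustration.

```erlang
%% Hypothetical helper; names are for illustration only.
-module(subbin_check).
-export([sizes/1]).

%% Returns {ByteSize, ReferencedByteSize} for the tail that remains
%% after matching off the first byte of the given binary.
sizes(<<_First:8, Rest/binary>>) ->
    {byte_size(Rest), binary:referenced_byte_size(Rest)}.
```

Calling subbin_check:sizes(<<65,128>>) returns both numbers; if the second
is larger than the first, Rest still refers to the original 2-byte binary
rather than to storage of its own. The result may differ between the shell
and compiled code.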

Seems like a bug in the native (C) code behind the unicode module.

Note that it does not reproduce in all environments, even with the
same Erlang version. Even 2 identical Linux VMs running under
VirtualBox but on 2 separate host machines produced different results
(one showed the bug, one didn't).