[erlang-bugs] Binary memory reuse issue in unicode:characters_to_list
James Wheare
james@REDACTED
Tue Aug 13 14:03:01 CEST 2013
Just found some extremely unexpected behaviour when using binary
pattern matching together with unicode:characters_to_list:
http://pastebin.com/7EYEhu0Z
Given a 2-byte binary, e.g. <<65,128>> (65 = the letter "A", 128 = an
invalid standalone UTF-8 byte):
<<Char:8,Rest/binary>> = <<65,128>>,
Char = 65,
Rest = <<128>>.
unicode:characters_to_list(Rest) should return the error {error, [],
<<128>>}, but instead it gives {error, [], "A"}.
unicode:characters_to_list(<<128>>) produces the desired result, even
though the argument should be identical.
Making a copy also gives the desired result:
Rest2 = <<Rest/binary>>,
unicode:characters_to_list(Rest2).
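For anyone who wants to try this on their own machine, here is a minimal sketch that runs the sub-binary and an explicit copy through the same call and compares the results. The module name subbin_repro and the function run/1 are hypothetical, made up for this example. It uses binary:copy/1 to force the copy, and I'm assuming that passing the binary in as an argument keeps the compiler from working on a literal.

```erlang
%% Hypothetical module name; not part of the original report.
-module(subbin_repro).
-export([run/1]).

%% Takes the binary as an argument so the match happens at runtime
%% rather than on a compile-time literal (an assumption about how to
%% keep the test honest, not something verified against the bug).
run(Bin) when byte_size(Bin) >= 1 ->
    %% Rest is a sub-binary that refers into Bin.
    <<_Char:8, Rest/binary>> = Bin,
    FromSub = unicode:characters_to_list(Rest),
    %% binary:copy/1 always creates a new binary.
    Copy = binary:copy(Rest),
    FromCopy = unicode:characters_to_list(Copy),
    %% Rest and Copy compare equal, so the two results should match too.
    true = (Rest =:= Copy),
    {FromSub =:= FromCopy, FromSub, FromCopy}.
```

Calling subbin_repro:run(<<65,128>>) returns a tuple whose first element is false when the sub-binary and its copy are treated differently, which is the symptom described above.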
Is this related to the binary optimisations detailed here?
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
It seems like a bug in the unicode NIF.
Note that it doesn't reproduce in all environments, even with the
same Erlang version. Even two identical Linux VMs running under
VirtualBox on two separate host machines produced different results
(one showed the bug, one didn't).
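Since the results differ between machines, it may help to record a few details of each runtime next to the result. This is only a sketch of what could be collected: the module name env_info and the function collect/0 are hypothetical, and I don't know which of these values (if any) correlates with the bug.

```erlang
%% Hypothetical module name; not part of the original report.
-module(env_info).
-export([collect/0]).

%% Returns a few facts about the running emulator that can be
%% compared between a machine that shows the bug and one that doesn't.
collect() ->
    [{otp_release,  erlang:system_info(otp_release)},
     {erts_version, erlang:system_info(version)},
     {architecture, erlang:system_info(system_architecture)},
     {wordsize,     erlang:system_info(wordsize)}].
```

Running env_info:collect() on both VMs and diffing the output would at least rule out differences in the emulator build or word size.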