[erlang-bugs] Bug in unicode characters_to_list trap
David Buckley
isreal-erlang-bugs-at-erlang.org@REDACTED
Wed May 1 18:35:35 CEST 2013
Simple test session:
[ 17:28 ] bucko@REDACTED:~% erl
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.9.1 (abort with ^G)
1> <<_, RR/binary>> = <<$a,164,161,$b>>.
<<"a¤¡b">>
2> RR.
<<"¤¡b">>
3> unicode:characters_to_list(RR).
{error,[],<<"a¤¡">>}
4> unicode:characters_to_list(list_to_binary(binary_to_list(RR))).
{error,[],<<"¤¡b">>}
I'm using Debian's default Erlang build, but I've reproduced the bug on
several other builds and can't find a fix for it in the release notes.
Description: The latter two calls should return the same value,
{error,[],<<"¤¡b">>}, since list_to_binary(binary_to_list(RR)) =:= RR.
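I would guess that the code in Erlang's internals is treating the failure
offset into the sub-binary as an offset into the full underlying binary.
The return values are at least consistent with this: RR begins at offset 1
of <<"a¤¡b">> and the first invalid byte is at offset 0 of RR, so reading
RR's length (three bytes) from offset 0 of the full binary, rather than
from offset 1, gives exactly the <<"a¤¡">> returned by call 3.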
The workaround is to call list_to_binary(binary_to_list(Bin)) on your data
before passing it to unicode:characters_to_list/1, or to compute the offset
into the binary yourself when the parse fails.
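Both workarounds can be wrapped as small helpers. This is a minimal sketch, not code from the report: the module name u2l_workaround and both function names are hypothetical. The second helper assumes UTF-8 input and relies on the converted prefix re-encoding to exactly the bytes that were consumed, so its byte size is the correct offset for rebuilding the rest.

```erlang
-module(u2l_workaround).
-export([characters_to_list_copy/1, characters_to_list_fixed_rest/1]).

%% Workaround 1 (as described in the report): round-trip the data through
%% a list so unicode:characters_to_list/1 receives a freshly built binary
%% instead of a sub-binary of a larger one.
characters_to_list_copy(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(list_to_binary(binary_to_list(Bin))).

%% Workaround 2: convert directly, but on an error or incomplete result
%% discard the returned rest and rebuild it from the original input.
%% Assumes Bin is UTF-8 encoded.
characters_to_list_fixed_rest(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        {Tag, Converted, _UnreliableRest} when Tag =:= error; Tag =:= incomplete ->
            %% Bytes consumed = UTF-8 size of the successfully converted prefix.
            Consumed = byte_size(unicode:characters_to_binary(Converted)),
            Rest = binary:part(Bin, Consumed, byte_size(Bin) - Consumed),
            {Tag, Converted, Rest};
        List ->
            %% Successful conversion: nothing to fix.
            List
    end.
```

With RR bound as in the session, both helpers should return {error, [], RR}. The first copies the whole binary into a list on every call, so it costs memory proportional to the input; the second only does extra work on the failure path.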
--
David Buckley