[erlang-bugs] Bug in unicode characters_to_list trap

David Buckley
Wed May 1 18:35:35 CEST 2013


Simple test session:

[ 17:28 ] :~% erl
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.1  (abort with ^G)
1> <<_, RR/binary>> = <<$a,164,161,$b>>.
<<"a¤¡b">>
2> RR.
<<"¤¡b">>
3> unicode:characters_to_list(RR).      
{error,[],<<"a¤¡">>}
4> unicode:characters_to_list(list_to_binary(binary_to_list(RR))).
{error,[],<<"¤¡b">>}

I'm using Debian's default Erlang build, but I've reproduced the bug on
several other builds, and I can't find it mentioned in the release notes.

Description: The latter two calls should return the same value, as
list_to_binary(binary_to_list(RR)) =:= RR.
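
That expectation can be written down as a small check. This is only a
sketch: the module name unicode_check and the function same_result/1 are
hypothetical names made up for illustration, not part of OTP. Only
list_to_binary/1, binary_to_list/1 and unicode:characters_to_list/1 are
real calls.

```erlang
-module(unicode_check).
-export([same_result/1]).

%% Hypothetical helper, not part of OTP.
%% Returns true when converting Bin directly and converting a
%% round-tripped copy of Bin give exactly the same result.
same_result(Bin) when is_binary(Bin) ->
    %% Rebuild the binary from a list, as in call 4 of the session.
    Copy = list_to_binary(binary_to_list(Bin)),
    %% The copy must compare equal to the original ...
    Copy =:= Bin andalso
        %% ... so both conversions should agree as well.
        unicode:characters_to_list(Bin) =:= unicode:characters_to_list(Copy).
```

With RR bound as in the session above, unicode_check:same_result(RR)
would return false on an affected build, because calls 3 and 4 disagree.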

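I would guess that the code in Erlang's guts is treating the failure
offset within the binary part as an offset into the full binary. At
least, the return values are consistent with this: the rest returned by
call 3 is three bytes long, like RR, but it starts at the first byte of
the original binary rather than at RR's offset.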

A workaround is to call list_to_binary(binary_to_list()) on your data
before calling unicode:characters_to_list on it, or to offset into the
binary manually yourself when a parse fails.
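
A minimal sketch of the first workaround follows. The module name
unicode_workaround and the function to_list/1 are hypothetical names
chosen for illustration, and the sketch assumes the guess above is
right, i.e. that the problem only shows up when the argument is a part
of a larger binary.

```erlang
-module(unicode_workaround).
-export([to_list/1]).

%% Hypothetical wrapper, not part of OTP.
%% Round-trips the binary through a list so that
%% unicode:characters_to_list/1 is handed a freshly built binary
%% instead of a part of a larger one.
to_list(Bin) when is_binary(Bin) ->
    Fresh = list_to_binary(binary_to_list(Bin)),
    unicode:characters_to_list(Fresh).
```

With RR bound as in the session above, unicode_workaround:to_list(RR)
does the same thing as call 4, so it should give {error,[],<<"¤¡b">>}.
The round trip copies the data, which costs time and memory on large
inputs.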

-- 
David Buckley

