[erlang-bugs] Bug in unicode characters_to_list trap
Patrik Nyblom
pan@REDACTED
Thu May 2 10:48:10 CEST 2013
Hi David!
On 05/01/2013 06:35 PM, David Buckley wrote:
> Simple test session:
>
> [ 17:28 ] bucko@REDACTED:~% erl
> Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
>
> Eshell V5.9.1 (abort with ^G)
> 1> <<_, RR/binary>> = <<$a,164,161,$b>>.
> <<"a¤¡b">>
> 2> RR.
> <<"¤¡b">>
> 3> unicode:characters_to_list(RR).
> {error,[],<<"a¤¡">>}
> 4> unicode:characters_to_list(list_to_binary(binary_to_list(RR))).
> {error,[],<<"¤¡b">>}
Yep - that's a bug, no doubt...
Can you try a source code patch when I've found a cure?
>
> I'm using Debian's default erlang build, but I've verified the bug on
> various others, and can't see it in the release notes.
>
> Description: The latter two calls should return the same value, as
> list_to_binary(binary_to_list(RR)) =:= RR.
>
> I would guess that the code in Erlang's guts is taking the failure offset
> into the binary part (the sub-binary RR) as an offset into the full binary.
> At least, the return values are consistent with this.
Good guess, I agree.
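A minimal Erlang sketch of the suspected arithmetic. It models the hypothesis only and is not the actual ERTS code; the module name offset_hypothesis and the hard-coded failure offset are assumptions for illustration. Applying the failure offset found inside RR to the parent binary yields the same wrong rest as in the session above.

```erlang
-module(offset_hypothesis).
-export([demo/0]).

%% Models the suspected bug: the byte offset at which decoding failed
%% inside the sub-binary is applied to the parent binary instead.
demo() ->
    Parent = <<$a, 164, 161, $b>>,
    <<_, Sub/binary>> = Parent,            % Sub starts at byte 1 of Parent
    FailOffset = 0,                        % assumed: 164 is not a valid UTF-8 lead byte
    RestLen = byte_size(Sub) - FailOffset, % 3 bytes remain unconverted
    Correct = binary:part(Sub, FailOffset, RestLen),
    Buggy = binary:part(Parent, FailOffset, RestLen),
    {Correct, Buggy}.
```

Here Correct is <<164,161,$b>> (what the fresh-binary call returned) and Buggy is <<$a,164,161>> (what the sub-binary call returned).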
>
> The workaround is just to call list_to_binary(binary_to_list()) on your data
> before calling unicode:characters_to_list on it. Alternatively, offset
> into the binary manually yourself when a parse fails.
>
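Both workarounds can be sketched roughly as follows. The module u2l_workaround and its function names are hypothetical. The second function assumes that, on affected releases, the converted character list is correct and only the returned rest is wrong, and it assumes the default UTF-8 input encoding.

```erlang
-module(u2l_workaround).
-export([copy_then_convert/1, convert_with_fixed_rest/1]).

%% Workaround 1 (from the report): round-trip through a list, which
%% builds a fresh binary that is not a sub-binary of anything.
copy_then_convert(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(list_to_binary(binary_to_list(Bin))).

%% Workaround 2: convert the original binary, but on failure ignore the
%% returned rest and recompute it from Bin. Assumption: the bytes consumed
%% equal the UTF-8 size of the characters that were converted.
convert_with_fixed_rest(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        {Tag, Converted, _UnreliableRest} when Tag =:= error; Tag =:= incomplete ->
            Consumed = byte_size(unicode:characters_to_binary(Converted)),
            {Tag, Converted, binary:part(Bin, Consumed, byte_size(Bin) - Consumed)};
        List when is_list(List) ->
            List
    end.
```

With RR from the session above, both functions should return {error,[],<<"¤¡b">>}.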
Thanks!
/Patrik