[erlang-bugs] UTF8 string handling in different erlang:*** functions
Nico Kruber
kruber@REDACTED
Tue Mar 29 15:06:46 CEST 2011
Is it possible that UTF-8 strings are supported by neither
erlang:md5/1 nor
erlang:list_to_binary/1 (and possibly other functions)?
I'm getting a bad argument exception when running the following:
> erlang:md5("Wàgrain (Wågrŏã)").
** exception error: bad argument
     in function  erlang:md5/1
        called as erlang:md5([87,224,103,114,97,105,110,32,40,87,229,103,114,335,227,41])
Even simpler, one can call:
> erlang:md5([256]).
** exception error: bad argument
in function erlang:md5/1
called as erlang:md5([256])
This exception is thrown for any character code larger than 255. The same
happens with erlang:list_to_binary/1.
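To make the boundary easy to reproduce, here is a minimal sketch that wraps both calls so they return a tagged tuple instead of raising. The module name byte_range_demo and the helper names try_md5/1 and try_list_to_binary/1 are made up for illustration; only erlang:md5/1 and erlang:list_to_binary/1 are real BIFs.

```erlang
-module(byte_range_demo).
-export([try_md5/1, try_list_to_binary/1]).

%% Hypothetical helper: returns {ok, Digest} on success and
%% {error, badarg} when erlang:md5/1 rejects the input.
try_md5(Data) ->
    try erlang:md5(Data) of
        Digest -> {ok, Digest}
    catch
        error:badarg -> {error, badarg}
    end.

%% Hypothetical helper: same idea for erlang:list_to_binary/1.
try_list_to_binary(List) ->
    try erlang:list_to_binary(List) of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

With this, byte_range_demo:try_list_to_binary([255]) returns {ok,<<255>>}, while the same call with [256] returns {error,badarg}.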
The documentation for both states that the input should be an iodata() or
iolist(), which are defined as:
iodata() = iolist() | binary()
iolist() = [char() | binary() | iolist()]
% a binary is allowed as the tail of the list
And according to
http://www.erlang.org/doc/reference_manual/typespec.html
a character is any valid integer between 0 and 16#10ffff, and it should be this
way since Erlang strings are Unicode strings.
If this is correct behaviour, then how do I hash a Unicode string without
using erlang:term_to_binary/1 (which is possibly costly and should be
unnecessary)?
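One possible approach, sketched here on the assumption that hashing the UTF-8 encoding of the string is acceptable, is to convert the code point list with unicode:characters_to_binary/1 and pass the resulting binary to erlang:md5/1. The module name unicode_md5 and its md5/1 wrapper are hypothetical names chosen for this example, not an existing API.

```erlang
-module(unicode_md5).
-export([md5/1]).

%% Hypothetical wrapper: encode a list of Unicode code points (or other
%% unicode:chardata()) as UTF-8, then hash the resulting bytes.
md5(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) ->
            erlang:md5(Bin);
        {error, _Encoded, _Rest} ->
            %% Input contained an invalid code point.
            erlang:error(badarg, [Chars]);
        {incomplete, _Encoded, _Rest} ->
            %% Input ended in the middle of a multi-byte sequence.
            erlang:error(badarg, [Chars])
    end.
```

For example, unicode_md5:md5([87,224,103,114,97,105,110,32,40,87,229,103,114,335,227,41]) returns a 16-byte binary instead of raising. Note that the digest then depends on the chosen encoding: the same string encoded as UTF-16 or Latin-1 would hash differently.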
Regards
Nico