[erlang-bugs] UTF8 string handling in different erlang:*** functions

Nico Kruber <>
Tue Mar 29 15:06:46 CEST 2011


Is it possible that Unicode strings (lists of code points) are not
supported by erlang:md5/1 and erlang:list_to_binary/1 (and possibly
other functions)?

I'm getting a bad argument exception when running the following:

> erlang:md5("Wàgrain (Wågrŏã)").
** exception error: bad argument
     in function  erlang:md5/1
        called as erlang:md5([87,224,103,114,97,105,110,32,40,87,229,103,114,
                              335,227,41])

Even simpler, one can call:
> erlang:md5([256]).
** exception error: bad argument
     in function  erlang:md5/1
        called as erlang:md5([256])


This exception is thrown for any character larger than 255. The same
happens with erlang:list_to_binary/1.
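
To make the boundary easy to reproduce, here is a minimal sketch. The module
name byte_limit and the helper try_md5/1 are made up for this illustration;
only erlang:md5/1 is a real BIF.

```erlang
-module(byte_limit).
-export([try_md5/1]).

%% Hypothetical helper: wraps erlang:md5/1 so that a rejected input
%% shows up as a return value instead of an exception.
%% Returns {ok, Digest} if the input is accepted, {error, badarg} otherwise.
try_md5(Data) ->
    try erlang:md5(Data) of
        Digest -> {ok, Digest}
    catch
        error:badarg -> {error, badarg}
    end.
```

With this, byte_limit:try_md5([255]) returns {ok, Digest} with a 16-byte
digest, while byte_limit:try_md5([256]) returns {error, badarg}.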

The documentation for both states that the input should be an iodata() or
iolist(), which are defined as:

iodata() = iolist() | binary()
iolist() = [char() | binary() | iolist()]
%  a binary is allowed as the tail of the list

And according to
http://www.erlang.org/doc/reference_manual/typespec.html
a char() is any integer between 0 and 16#10ffff, and it should be this
way since Erlang strings are Unicode strings.

If this is correct behaviour, then how do I hash a Unicode string without
using erlang:term_to_binary/1 (which is possibly costly and should be
unnecessary)?
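
One possible workaround, sketched here under the assumption that hashing the
UTF-8 encoding of the string is acceptable, is to encode the code points
first with unicode:characters_to_binary/1 and hash the resulting binary. The
module name utf8_hash and the function md5_utf8/1 are hypothetical names for
this illustration.

```erlang
-module(utf8_hash).
-export([md5_utf8/1]).

%% Hypothetical helper: encodes a list of Unicode code points as UTF-8
%% and returns the MD5 digest of the encoded bytes.
%% Assumption: any binaries nested in the input are already UTF-8, which
%% is the default input encoding of unicode:characters_to_binary/1.
md5_utf8(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) ->
            erlang:md5(Bin);
        Other ->
            %% {error, _, _} or {incomplete, _, _}: input could not be encoded
            erlang:error({invalid_unicode, Other})
    end.
```

For example, utf8_hash:md5_utf8("Wàgrain (Wågrŏã)") returns a 16-byte digest
instead of raising badarg. The digest depends on the chosen encoding, so
hashing the same string as UTF-16 or Latin-1 would give a different result.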


Regards
Nico