[erlang-questions] UTF-8 support.

Håkan Stenholm hokan.stenholm@REDACTED
Mon May 7 02:03:37 CEST 2007


Julio César Carrascal Urquijo (MCTS) wrote:
> Does OTP supports Unicode strings? I was looking for UTF-8 support in
> the module list but couldn't find anything like it.
>   
No, not really. String operations (e.g. case conversion) usually assume 
that strings are lists of Latin-1 character codes no bigger than 255.

xmerl can produce lists of Unicode code points when parsing XML that 
uses one of the UTF encodings, so there is some support for getting 
Unicode data.

Note to other readers: a Unicode code point doesn't always correspond 
to a single letter. The letter 'å', for example, can be represented 
either by one precomposed code point (U+00E5) or by the code point for 
the letter 'a' followed by the combining ring above (U+030A).
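
To make the point above concrete, here is a minimal sketch. It is written 
in Python rather than Erlang, only because Python's standard library 
ships the Unicode character database; the variable names are made up for 
illustration and nothing here is an OTP or xmerl API.

```python
import unicodedata

# Two ways to write the same visible letter 'å':
precomposed = "\u00e5"    # one code point: LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"    # two code points: 'a' + COMBINING RING ABOVE

# The two strings differ in length and do not compare equal as-is.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# After normalizing both to the same form (NFC here), they compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                  # True
```

This is why a naive element-by-element comparison of two code point 
lists can report a mismatch for text that looks identical on screen.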

If you don't need to do string comparisons (which may require various 
forms of Unicode normalization, as certain letters like 'å' have more 
than one representation), it is probably easiest to store the Unicode 
data either as a UTF-8 string (byte string / binary) or as a list of 
Unicode code points.
xmerl has some functions (you may need to look at the source code) to 
convert Unicode code points to/from various UTF encodings.
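
The two storage options mentioned above can be sketched as follows. 
Again this is Python used purely for illustration, not the xmerl 
functions themselves, and the sample text and variable names are 
assumptions.

```python
text = "H\u00e5kan"   # "Håkan", with 'å' as the single code point U+00E5

# Option 1: a list of Unicode code points (one integer per code point).
code_points = [ord(ch) for ch in text]

# Option 2: a UTF-8 encoded byte string.
utf8_bytes = text.encode("utf-8")

# 'å' is one code point (229) but takes two bytes (195, 165) in UTF-8,
# so the two representations have different lengths.
print(code_points)        # [72, 229, 107, 97, 110]
print(list(utf8_bytes))   # [72, 195, 165, 107, 97, 110]

# Both representations convert back to the same text without loss.
from_points = "".join(chr(cp) for cp in code_points)
from_bytes = utf8_bytes.decode("utf-8")
print(from_points == from_bytes == text)   # True
```

Note that 229 does not fit the UTF-8 byte sequence as a single byte, so 
code that treats a UTF-8 byte list as Latin-1 character codes will see 
two characters where one was intended.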


PS: Does anyone know of any other useful Unicode-related tools that can 
be used with Erlang?
> Thanks.
>
>   


