[erlang-questions] UTF-8 support.
Håkan Stenholm
hokan.stenholm@REDACTED
Mon May 7 02:03:37 CEST 2007
Julio César Carrascal Urquijo (MCTS) wrote:
> Does OTP support Unicode strings? I was looking for UTF-8 support in
> the module list but couldn't find anything like it.
>
No, not really. String operations (e.g. case conversion) usually assume
that strings are lists of Latin-1 character codes no bigger than 255.
xmerl can produce lists of Unicode code points when parsing XML that
uses one of the UTF encodings, so there is some support for getting
Unicode data into Erlang.
Note to other readers: a letter doesn't always correspond to a single
Unicode code point. The letter 'å', for example, can be represented
either by one precomposed code point (U+00E5) or by the code point for
the letter 'a' followed by the combining ring above (U+030A).
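To make the two representations concrete, here is a minimal sketch. The module and function names are hypothetical and only for illustration; the code uses nothing beyond plain lists of integers.

```erlang
-module(ring_sketch).
-export([precomposed/0, decomposed/0, same/0]).

%% 'å' as a single code point:
%% U+00E5 LATIN SMALL LETTER A WITH RING ABOVE.
precomposed() -> [16#E5].

%% 'å' as two code points:
%% U+0061 LATIN SMALL LETTER A followed by U+030A COMBINING RING ABOVE.
decomposed() -> [$a, 16#30A].

%% Plain term comparison only sees two different lists of integers,
%% so this returns false even though both render as the same letter.
same() -> precomposed() =:= decomposed().
```

Calling `ring_sketch:same()` returns `false`, which is why comparing such strings needs a normalization step first.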
If you don't need to do string comparisons (which may require various
forms of Unicode normalization, as certain letters like 'å' have more
than one representation), it is probably easiest to store the Unicode
data either as a UTF-8 string (byte string / binary) or as a list of
Unicode code points.
xmerl has some functions (you may need to look at the source code) to
convert Unicode code points to/from various UTF encodings.
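If you would rather not dig through xmerl, the code-point-to-UTF-8 direction is short enough to write by hand. The following is only a sketch under my own assumptions: the module name utf8_sketch and its encode/1 function are hypothetical, it is not xmerl's API, and it does not reject surrogate code points (U+D800 to U+DFFF) the way a strict encoder should.

```erlang
-module(utf8_sketch).
-export([encode/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
%% Crashes with function_clause on anything outside 0..16#10FFFF.
encode(CodePoints) ->
    list_to_binary([encode_char(C) || C <- CodePoints]).

%% 1 byte: 0xxxxxxx
encode_char(C) when C >= 0, C < 16#80 ->
    [C];
%% 2 bytes: 110xxxxx 10xxxxxx
encode_char(C) when C >= 16#80, C < 16#800 ->
    [16#C0 bor (C bsr 6),
     16#80 bor (C band 16#3F)];
%% 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
%% (surrogates are not filtered out in this sketch)
encode_char(C) when C >= 16#800, C < 16#10000 ->
    [16#E0 bor (C bsr 12),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)];
%% 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
encode_char(C) when C >= 16#10000, C < 16#110000 ->
    [16#F0 bor (C bsr 18),
     16#80 bor ((C bsr 12) band 16#3F),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)].
```

For example, `utf8_sketch:encode([16#E5])` gives `<<195,165>>`, while `utf8_sketch:encode([$a, 16#30A])` gives `<<97,204,138>>`, so the two forms of 'å' also differ at the byte level.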
PS: Does anyone know of any other useful Unicode-related tools that can
be used with Erlang?
> Thanks.
>
>