[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings
Anthony Ramine
nox@REDACTED
Thu Oct 20 11:18:03 CEST 2011
On 19 Oct 2011, at 22:14, Michael Uvarov wrote:
> Q: Is it easy to work with a list of code points?
> A: Both yes and no.
> Advantages:
> If you have an algorithm that is based on code-point processing, then
> it will be easy to implement. If you only pass text from point A to
> point B, I suggest keeping the string as a binary. You can also combine
> UTF-8 binaries and lists to create an iolist from them.
>
> --
> Best regards,
> Uvarov Michael
From what I understand, iolist() values have no notion of encoding whatsoever
and don't represent code points or characters; they are just sequences of
bytes.
Even though the typespec documentation says they can contain chars [1],
the erl shell says otherwise:
1> iolist_to_binary([16#10ffff]).
** exception error: bad argument
in function iolist_to_binary/1
called as iolist_to_binary([1114111])
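To make the distinction concrete, here is a minimal Erlang sketch (the module name chardata_demo is hypothetical, and it assumes a reasonably recent OTP release) contrasting iolist_to_binary/1 with unicode:characters_to_binary/1. The latter reads list integers as code points and accepts a mix of UTF-8 binaries and code-point lists (unicode:chardata()), encoding the result as UTF-8.

```erlang
-module(chardata_demo).
-export([main/0]).

%% Sketch: iolist() elements must be bytes (0..255) or binaries,
%% whereas unicode:chardata() list elements are code points.
main() ->
    %% 16#10ffff is a valid code point but not a valid iolist() byte,
    %% so iolist_to_binary/1 raises badarg (as in the shell session).
    BadArg = try iolist_to_binary([16#10ffff]) of
                 _ -> no_error
             catch
                 error:badarg -> badarg
             end,
    %% The same list read as chardata is encoded to 4 UTF-8 bytes.
    Utf8 = unicode:characters_to_binary([16#10ffff]),
    %% Mixed chardata: an ASCII (hence UTF-8) binary plus a list holding
    %% the code point for e-acute (16#e9), encoded as <<195,169>>.
    Mixed = unicode:characters_to_binary([<<"caf">>, [16#e9]]),
    {BadArg, Utf8, Mixed}.
```

Calling chardata_demo:main() should return {badarg, <<244,143,191,191>>, <<99,97,102,195,169>>}: the list is rejected as an iolist but accepted as chardata.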
See also how io:format/2's "t" modifier changes the behaviour of "~s" [2]:
iolist() and unicode:charlist() [3] are not the same types.
This was already discussed on the mailing list a few months ago [4].
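The "~s" versus "~ts" difference can be sketched as follows (the module name format_demo is hypothetical). It uses io_lib:format/2 rather than io:format/2 so the result does not depend on the output device's encoding, and it assumes a recent OTP release, where io_lib:format/2 reports a bad argument as a badarg error.

```erlang
-module(format_demo).
-export([main/0]).

main() ->
    %% "~ts": the argument is treated as unicode:chardata(),
    %% so the code point 1024 (above 255) is accepted as-is.
    Ts = lists:flatten(io_lib:format("~ts", [[1024]])),
    %% "~s": the argument must be latin1 character data,
    %% so the same list is rejected with badarg.
    S = try io_lib:format("~s", [[1024]]) of
            _ -> no_error
        catch
            error:badarg -> badarg
        end,
    {Ts, S}.
```

Here format_demo:main() should return {[1024], badarg}.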
[1] http://www.erlang.org/doc/reference_manual/typespec.html
[2] http://www.erlang.org/doc/man/io.html#format-2
[3] http://www.erlang.org/doc/man/unicode.html
[4] http://erlang.org/pipermail/erlang-questions/2011-May/058012.html
--
Anthony Ramine / @nokusu
Dev:Extend — http://dev-extend.eu/
So as I pray, “Unlimited Erlang Works”