[erlang-questions] byte() vs. char() use in documentation
'Raimo Niskanen'
raimo+erlang-questions@REDACTED
Wed May 4 10:11:42 CEST 2011
On Tue, May 03, 2011 at 03:05:24PM -0500, David Mercer wrote:
> On Tuesday, May 03, 2011, Raimo Niskanen wrote:
>
> > I repeat again. The programmer decides what the bytes mean. The list
> > [0,0,16#21,16#2b] e.g would mean "angstrom sign" if the encoding is
> > UTF-32 big endian. And that is a valid iolist.
> > But [16#212b] is not.
>
> Out of curiosity, why does
>
> unicode:characters_to_binary([16#212b], {utf32, big}).
The man page says:
characters_to_binary(Data, InEncoding) -> binary() | ...
so your second argument set InEncoding to {utf32,big}, and the output
encoding stayed at its default, utf8.
You want the three-argument form:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | ...
1> unicode:characters_to_binary([16#212b], unicode, {utf32, big}).
<<0,0,33,43>>
>
> return the UTF-8 representation of Å (Angstrom sign) and not the big-endian
> UTF-32 like I expected?
InEncoding only applies to binaries in the input data, since integers
in the list are plain Unicode code points and have no encoding:
2> unicode:characters_to_binary([16#212b,<<226,132,171>>], utf8, {utf32, big}).
<<0,0,33,43,0,0,33,43>>
Note: unicode is an alias for utf8 in the unicode module,
since utf8 is the default encoding.
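The behaviour described above can be checked with a minimal sketch like the following. The module name angstrom_demo and the function run/0 are hypothetical and exist only for illustration; the unicode calls are the ones discussed in this thread, plus unicode:characters_to_list/2 for the round trip. Each line is a pattern match, so run/0 fails with a badmatch error if any of the claims does not hold.

```erlang
-module(angstrom_demo).   % hypothetical module name, for illustration only
-export([run/0]).

run() ->
    CodePoint = 16#212b,              % U+212B ANGSTROM SIGN as an integer
    Utf8      = <<226,132,171>>,      % the same character encoded as UTF-8

    %% Two-argument form: {utf32,big} is the *input* encoding, which only
    %% matters for binaries in Data. The output is still UTF-8.
    Utf8 = unicode:characters_to_binary([CodePoint], {utf32, big}),

    %% Three-argument form: the third argument selects the output encoding.
    Utf32 = unicode:characters_to_binary([CodePoint], unicode, {utf32, big}),
    <<0,0,33,43>> = Utf32,

    %% Mixed input: the integer is taken as a code point, the binary is
    %% decoded as utf8 (InEncoding), and both are emitted as UTF-32 big endian.
    <<0,0,33,43,0,0,33,43>> =
        unicode:characters_to_binary([CodePoint, Utf8], utf8, {utf32, big}),

    %% Round trip: decoding the UTF-32 binary gives back the code point.
    [CodePoint] = unicode:characters_to_list(Utf32, {utf32, big}),
    ok.
```

After compiling with c(angstrom_demo), calling angstrom_demo:run() in the shell should return ok.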
It is all in the Erlang man page for unicode(3).
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB