[erlang-questions] byte() vs. char() use in documentation

Wed May 4 10:11:42 CEST 2011

On Tue, May 03, 2011 at 03:05:24PM -0500, David Mercer wrote:
> On Tuesday, May 03, 2011, Raimo Niskanen wrote:
> 
> > I repeat again. The programmer decides what the bytes mean. The list
> > [0,0,16#21,16#2b] e.g would mean "angstrom sign" if the encoding is
> > UTF-32 big endian. And that is a valid iolist.
> > But [16#212b] is not.
> 
> Out of curiosity, why does
> 
> 	unicode:characters_to_binary([16#212b], {utf32, big}).

Man page says:
	characters_to_binary(Data,InEncoding) -> binary() | ...

With two arguments the second one is InEncoding, so you changed the
InEncoding to {utf32,big} and the output stayed at the default, UTF-8.
You want this:
	characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | ...

1> unicode:characters_to_binary([16#212b], unicode, {utf32, big}).
<<0,0,33,43>>
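
To make the difference between the two calls concrete, here is a small
sketch that checks both results side by side. The module name enc_demo
and the function run/0 are made up for illustration; only the
unicode:characters_to_binary calls come from the unicode module.

```erlang
-module(enc_demo).
-export([run/0]).

run() ->
    %% Two arguments: {utf32, big} is taken as InEncoding. The list holds
    %% only an integer code point, so InEncoding has nothing to decode,
    %% and the output is the default UTF-8.
    <<226,132,171>> = unicode:characters_to_binary([16#212b], {utf32, big}),
    %% Three arguments: OutEncoding is given explicitly.
    <<0,0,33,43>> = unicode:characters_to_binary([16#212b], unicode, {utf32, big}),
    ok.
```

Calling enc_demo:run() should return ok if both matches hold.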

> 
> return the UTF-8 representation of Å (Angstrom sign) and not the big-endian
> UTF-32 like I expected?

InEncoding only applies to binaries in the input data, since integers
are just Unicode code points and have no encoding:

2> unicode:characters_to_binary([16#212b,<<226,132,171>>], utf8, {utf32, big}).
<<0,0,33,43,0,0,33,43>>

	Note: unicode is an alias for utf8 in the unicode module
	      since utf8 is the default encoding
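
For the opposite direction, where InEncoding does matter, a sketch along
the following lines may help. It takes the four bytes from the iolist
example quoted at the top and reads them as big-endian UTF-32. The module
name inenc_demo and the function run/0 are hypothetical; the calls
themselves are iolist_to_binary/1 and the unicode module functions.

```erlang
-module(inenc_demo).
-export([run/0]).

run() ->
    %% The four bytes are a valid iolist and flatten to a binary.
    Utf32 = iolist_to_binary([0,0,16#21,16#2b]),
    <<0,0,33,43>> = Utf32,
    %% Here InEncoding tells the unicode module how to read the binary.
    [16#212b] = unicode:characters_to_list(Utf32, {utf32, big}),
    <<226,132,171>> = unicode:characters_to_binary(Utf32, {utf32, big}, utf8),
    %% An integer above 255 is not a byte, so this list is not an iolist.
    {'EXIT', {badarg, _}} = (catch iolist_to_binary([16#212b])),
    %% unicode and utf8 name the same encoding.
    true = unicode:characters_to_binary([16#212b], unicode)
           =:= unicode:characters_to_binary([16#212b], utf8),
    ok.
```

Calling inenc_demo:run() should return ok if every match holds.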

It is all in the Erlang man page for unicode(3).

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB