[erlang-questions] byte() vs. char() use in documentation

Raimo Niskanen raimo+erlang-questions@REDACTED
Mon May 2 11:14:15 CEST 2011


On Mon, May 02, 2011 at 12:01:49AM +0100, James Churchman wrote:
> More of a question than an actual answer, but in Erlang, can strings (and therefore io-lists) be UTF-16?

A string is a list of Unicode code points, i.e. integers in the range 0..16#10FFFF.

An IO-list is a (possibly nested) list of binaries and bytes, i.e. integers in the range 0..255.
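A minimal sketch to make the distinction concrete, assuming a hypothetical module string_vs_iolist (not part of OTP). It relies on erlang:iolist_size/1 raising badarg when a list contains an integer outside 0..255:

```erlang
-module(string_vs_iolist).
-export([is_iolist/1]).

%% Hypothetical helper (not part of OTP): returns true if Term is a
%% (possibly nested) list that erlang:iolist_size/1 accepts, i.e. all of
%% its integer elements are bytes (0..255) and everything else is a
%% binary or a further iolist.
-spec is_iolist(term()) -> boolean().
is_iolist(Term) when is_list(Term) ->
    try erlang:iolist_size(Term) of
        _Size -> true
    catch
        error:badarg -> false
    end;
is_iolist(_) ->
    false.
```

With this, is_iolist([<<"ab">>, $c, [100, <<"e">>]]) returns true, and so does is_iolist([$H, 16#E9]), which is also a Latin-1 string. The string [$H, 16#E9, 16#20AC] ("Hé€") returns false, because 16#20AC does not fit in a byte.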

> 
> I assume that binaries are only ever in a UTF-8 representation, but a list of integers can obviously contain numbers above 255.

You can choose the binary representation (for example UTF-8, UTF-16 or UTF-32). See the Erlang man page unicode(3).
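For example, the same one-character string [16#20AC] (the euro sign) can be encoded in several ways with unicode:characters_to_binary/3. This is a sketch using a hypothetical module encode_demo; it assumes the input is a valid list of code points:

```erlang
-module(encode_demo).
-export([encodings/1]).

%% Hypothetical helper: encodes one string (a list of code points) into
%% several binary representations with unicode:characters_to_binary/3.
%% The list itself has no encoding; only the resulting binaries do.
%% Assumes String contains only valid code points.
encodings(String) ->
    [{Enc, unicode:characters_to_binary(String, unicode, Enc)}
     || Enc <- [utf8, {utf16, big}, {utf16, little}, utf32]].
```

encode_demo:encodings([16#20AC]) returns [{utf8,<<226,130,172>>},{{utf16,big},<<32,172>>},{{utf16,little},<<172,32>>},{utf32,<<0,0,32,172>>}], and unicode:characters_to_list(<<32,172>>, {utf16, big}) decodes the UTF-16 form back to [16#20AC]. So a string can be turned into UTF-16, but it is the binary that is UTF-16, not the list.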

> 
> so maybe (??) the answer is
> 
> a) an iolist CAN contain char() (... surely especially true if the data is only being passed as messages through Erlang from other systems)

No. byte().

> 
> b) the binary-to-list conversions are a bit less clear-cut

Compare erlang:binary_to_list/1 and erlang:list_to_binary/1 with the
corresponding functions in module 'unicode' (unicode:characters_to_list/1,2
and unicode:characters_to_binary/1,2,3).
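A rough illustration of the difference, written with byte() and char() in the specs as proposed further down in this thread. The module name bin_list_demo and its functions are hypothetical; the sketch assumes valid UTF-8 input:

```erlang
-module(bin_list_demo).
-export([bytes/1, chars/1, to_binary/1]).

%% erlang:binary_to_list/1 yields one element per byte,
%% so byte() (0..255) is the accurate element type.
-spec bytes(binary()) -> [byte()].
bytes(Bin) ->
    erlang:binary_to_list(Bin).

%% unicode:characters_to_list/2 decodes UTF-8 into code points, so the
%% elements range over char() (0..16#10FFFF). This sketch assumes valid,
%% complete UTF-8 and crashes with a case_clause error otherwise.
-spec chars(binary()) -> [char()].
chars(Utf8Bin) ->
    case unicode:characters_to_list(Utf8Bin, utf8) of
        List when is_list(List) -> List
    end.

%% erlang:list_to_binary/1 only accepts bytes, while
%% unicode:characters_to_binary/1 encodes code points as UTF-8.
%% Assumes Chars contains only valid code points.
-spec to_binary([char()]) -> {binary() | badarg, binary()}.
to_binary(Chars) ->
    Plain = try erlang:list_to_binary(Chars)
            catch error:badarg -> badarg
            end,
    {Plain, unicode:characters_to_binary(Chars)}.
```

For the UTF-8 encoded euro sign <<226,130,172>>, bytes/1 returns [226,130,172] while chars/1 returns [16#20AC]. In the other direction, to_binary([16#20AC]) returns {badarg, <<226,130,172>>}: erlang:list_to_binary/1 rejects the code point, while unicode:characters_to_binary/1 encodes it as UTF-8.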

> 
> Basically, it can't be a char(), because it will always have started off as an 8-bit (UTF-8) representation, so it will always come back as a list of byte(); but in the general case it's returning an io-list, and that can contain char()
> 
> Is this correct? And in that case, does that make the BIF's XML doc file correct after all?

The documentation is incorrect. Once there was no difference between char()
and byte(): char() meant an ISO-8859-1 character, which has the same
range (0..255) as byte().


> 
> James
> 
> On 28 Apr 2011, at 17:26, Kostis Sagonas wrote:
> 
> > In the Erlang documentation, the language of types and specs makes a clear distinction between the following two types:
> > 
> >    byte() :: 0..255
> >    char() :: 0..16#10ffff
> > 
> > See http://erlang.org/doc/reference_manual/typespec.html#id72693
> > 
> > I think that nowadays there are very good reasons to have this distinction.
> > 
> > 
> > In trying to fix a bug today, I happened to notice that some key types of Erlang are inconsistent with this view in the Erlang/OTP documentation (in http://erlang.org/doc/man/erlang.html), most notably:
> > 
> >    iolist() :: [char() | binary() | iolist()]
> > 
> >  binary_to_list(Binary) -> [char()]
> >  binary_to_list(Binary, Start, Stop) -> [char()]
> >  bitstring_to_list(Bitstring) -> [char()|bitstring()]
> > 
> > and:
> > 
> >    BitstringList :: [BitstringList | bitstring() | char()]
> > 
> > which actually triggered this mail.
> > 
> > I think all the occurrences of char() above should read byte() instead.
> > Right?
> > 
> > If yes, could somebody at OTP (or some kind volunteer) please clean up this mess?  (I can provide a fix for the documentation of the 'erlang' module if you want me to.)
> > 
> > Kostis
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


