[erlang-questions] byte() vs. char() use in documentation

James Churchman jameschurchman@REDACTED
Mon May 2 21:43:33 CEST 2011


So, just for my own understanding, and because it seems extremely important
(strings are quite important these days!), as it stands now:

iolists can only (officially?) contain UTF-8? (As no UTF-8 encoded byte
will exceed 255, like Latin-1 / ASCII, and they are therefore all byte().)
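
A minimal sketch of the byte restriction, for illustration only (the module
name iolist_bytes and the function demo/0 are made up here; the only library
call used is the iolist_to_binary/1 BIF):

```erlang
-module(iolist_bytes).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% A nested list of bytes (0..255) and binaries is a valid iolist.
    Ok = iolist_to_binary([$a, [<<"bc">>, [$d]], 255]),
    %% 256 is not a byte(), so the BIF raises badarg.
    Bad = try iolist_to_binary([$a, 256])
          catch error:badarg -> badarg
          end,
    {Ok, Bad}.
```

Calling iolist_bytes:demo() should give {<<97,98,99,100,255>>, badarg}.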

Strings can be UTF-8, UTF-16 or UTF-32, but only the UTF-8 version is allowed
in an iolist? (And therefore, if you wanted an "iolist" (e.g. a non-flat list
of chars) that contained UTF-16 or UTF-32 code units, you would have to stick
exclusively to lists (strings), not binaries, and use lists:flatten
before you finished with it, to remove all the nested lists.)
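
A rough sketch of that situation, assuming a made-up module deep_chars: a
deep list whose integers go above 255 is not an iolist, but it can be
flattened with lists:flatten/1 or encoded with unicode:characters_to_binary/1,
which accepts deep lists and produces UTF-8 by default.

```erlang
-module(deep_chars).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% Deep list with code points above 255:
    %% 16#20AC is the euro sign, 16#1D11E is the musical G clef.
    Deep = [$a, [16#20AC, [16#1D11E]], "b"],
    %% Flattening removes the nesting but keeps the code points as integers.
    Flat = lists:flatten(Deep),
    %% Encoding the deep list directly gives a UTF-8 binary.
    Utf8 = unicode:characters_to_binary(Deep),
    {Flat, byte_size(Utf8), Utf8 =:= unicode:characters_to_binary(Flat)}.
```

Here the flat list is [97, 16#20AC, 16#1D11E, 98] and the UTF-8 binary is
9 bytes long (1 + 3 + 4 + 1).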

Binaries can be of any Unicode encoding.
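
For example, the bit syntax can build the same code point in each encoding.
This is only a sketch; the module name utf_binaries is an assumption, and
utf16/utf32 default to big-endian here.

```erlang
-module(utf_binaries).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    Euro = 16#20AC,
    Utf8  = <<Euro/utf8>>,   % three bytes
    Utf16 = <<Euro/utf16>>,  % two bytes, big-endian by default
    Utf32 = <<Euro/utf32>>,  % four bytes, big-endian by default
    %% Decoding needs to be told the encoding; the binary does not carry it.
    [Euro] = unicode:characters_to_list(Utf16, utf16),
    {Utf8, Utf16, Utf32}.
```

utf_binaries:demo() should return
{<<226,130,172>>, <<32,172>>, <<0,0,32,172>>}.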

Also, there does seem to be a needed distinction between char() and byte(),
as they are not the same at all, but the documentation is wrong, since at
the moment iolists can in fact only contain byte(), not char().
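
To make the distinction concrete: byte() covers 0..255, while char() as a
Unicode character covers 0..16#10FFFF. A small sketch with a made-up module
byte_vs_char and helper classify/1:

```erlang
-module(byte_vs_char).
-export([classify/1]).

%% Hypothetical helper, not part of OTP: reports the narrowest of the two
%% types that an integer fits into.
-spec classify(integer()) -> byte | char | neither.
classify(I) when is_integer(I), I >= 0, I =< 255 -> byte;
classify(I) when is_integer(I), I >= 0, I =< 16#10FFFF -> char;
classify(_) -> neither.
```

So classify($a) is byte, classify(16#20AC) is char, and classify(16#110000)
is neither. Every byte() is also a char(), but not the other way round.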

Is the suggested direction to repair the docs so that they specify that only
integers 0..255 (byte()) are allowed in iolists, rather than allowing iolists
to contain any string, as they did before the introduction of Unicode, in the
days of Latin-1 etc.?


I think that goes against most people's (even Erlang implementers' :-) )
opinion of what an iolist is (that being a list of any valid string or
binary), but maybe (to raise a totally different problem) it would prevent
the possibility of an iolist having mixed Unicode encodings and still being
"valid" (even though I guess this is still possible, as binaries can in fact
hold other UTF representations).
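
A small sketch of that last point, with a made-up module mixed_iolist:
iolist_to_binary/1 only concatenates bytes and does not check encodings, so
an iolist mixing UTF-8 and UTF-16 binaries is accepted, yet the result no
longer decodes as UTF-8.

```erlang
-module(mixed_iolist).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    Euro = 16#20AC,
    %% Same character, two encodings, one iolist: accepted without complaint.
    Mixed = iolist_to_binary([<<Euro/utf8>>, <<Euro/utf16>>]),
    %% Decoding as UTF-8 (the default) fails part-way through, so the
    %% result is an error tuple rather than a list.
    Decoded = unicode:characters_to_list(Mixed),
    {byte_size(Mixed), is_list(Decoded)}.
```

mixed_iolist:demo() should return {5, false}.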



On 2 May 2011 11:13, Kostis Sagonas <kostis@REDACTED> wrote:

> Raimo Niskanen wrote:
>
>>
>> This became messy when char() was re-defined from latin-1 character
>> to unicode character. That affected string() that affected iolist()
>> and the latter was incorrect.
>>
>> We must clean up the mess.
>>
>
> Right.  The sooner it happens the better it is.
>
>> ... Either by completing the notion of char()
>> being unicode and hence rewriting iolist() to contain byte() and binary(),
>> or by reverting to char() being latin-1 char and using unicode:char()
>> and unicode:string() where that is correct...
>>
>
> Please, by all means do the former.  The latter will only cause havoc
> everywhere.  For starters, I do not see any need in having two different
> basic types (byte() and char()) denoting (pretty much) the same thing. The
> only thing this does is cause unnecessary confusion to newcomers (and
> apparently to some old-timers too).  Second, if you choose the latter you
> will eventually have to change lots of type inference code, because I
> promise you I will not do this, and believe me you don't want to go there...
> (The Vietnam jungle is probably a friendlier place ;) )
>
> Cheers,
> Kostis