[erlang-questions] EEP 10

Paul Fisher pfisher@REDACTED
Thu May 15 15:40:30 CEST 2008


On Thu, 2008-05-15 at 10:55 +0200, Raimo Niskanen wrote:
> EEP 10: Representing Unicode characters in Erlang
> has been recognized by the EEP editor(s).
> 
> http://www.erlang.org/eeps/eep-0010.html

Bravo!  Some comments:

1) "Formatting functions" section
In general, why the choice of ~ts as the Unicode string format specifier?
All the others are a single character and ~u is available, so ...


2) "Formatting functions" section
Is this really what was intended?

"9> io:format(Terminal,"~s",["smörgås"]).

- would convert the string "smörgås" (Swedish word for sandwich) to
UTF-8 before sending it to the terminal, ..."

I would have expected this to send the literal 0..255 Latin-1 characters
to the terminal rather than converting them to UTF-8, behaving exactly as
the file driver does. Conversely, if ~s does not behave in this way, how
would you get the raw Latin-1 characters to the terminal?
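
To make the byte-level difference concrete, here is a minimal sketch. It
assumes the unicode module that ships with later Erlang/OTP releases (it
was not available when this was written); the module name eep10_bytes and
its functions are made up purely for illustration. The word is spelled
with explicit code points so the result does not depend on the encoding
of the source file.

```erlang
-module(eep10_bytes).
-export([latin1_bytes/1, utf8_bytes/1, demo/0]).

%% Hypothetical helper: encode a list of code points in 0..255 as
%% Latin-1, which uses exactly one byte per character.
latin1_bytes(Chars) ->
    unicode:characters_to_binary(Chars, unicode, latin1).

%% Hypothetical helper: encode the same code points as UTF-8.
%% Code points above 127 become two bytes each.
utf8_bytes(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).

%% "smörgås" written as code points: 246 is o-umlaut, 229 is a-ring.
demo() ->
    Word = [$s, $m, 246, $r, $g, 229, $s],
    {latin1_bytes(Word), utf8_bytes(Word)}.
```

The Latin-1 binary is 7 bytes long and the UTF-8 binary is 9, which is
why it matters which of the two ~s actually puts on the wire.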

Is it the intent to have the terminal driver simply deal with utf-8,
converting (possibly) back to latin1 if the locale is not set to utf-8?
The section goes on to talk about io:read and terminal device driver,
saying "input should always be expected to be in UTF-8", which does seem
to indicate that this was the thinking.


3) I vote to support UTF-16 in the binary support; it might as well be
complete from the start.  The only issue is whether things like reading
files would automatically deal with the byte-order mark used in
(some, most, or all) UTF-16 documents.  Just something else to consider.
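
As a rough sketch of what dealing with the byte-order mark could look
like on the caller's side, assuming the unicode module from later
Erlang/OTP releases: the module eep10_bom and its decode/1 function are
hypothetical, and whether file reading should do this automatically is
exactly the open question.

```erlang
-module(eep10_bom).
-export([decode/1]).

%% Hypothetical helper: detect a byte-order mark, strip it, and decode
%% the rest of the binary into a list of code points.
%% When no BOM is present, unicode:bom_to_encoding/1 reports {latin1, 0},
%% so the data is then treated as Latin-1 with nothing stripped.
decode(Bin) ->
    {Encoding, BomLen} = unicode:bom_to_encoding(Bin),
    <<_Bom:BomLen/binary, Rest/binary>> = Bin,
    unicode:characters_to_list(Rest, Encoding).
```

For example, eep10_bom:decode(<<16#FF, 16#FE, $h, 0, $i, 0>>) sees the
little-endian UTF-16 mark and returns "hi".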


-- 
paul



