[erlang-questions] EEP 10
Paul Fisher
pfisher@REDACTED
Thu May 15 15:40:30 CEST 2008
On Thu, 2008-05-15 at 10:55 +0200, Raimo Niskanen wrote:
> EEP 10: Representing Unicode characters in Erlang
> has been recognized by the EEP editor(s).
>
> http://www.erlang.org/eeps/eep-0010.html
Bravo! Some comments:
1) "Formatting functions" section
In general, why was ~ts chosen as the unicode string format specifier?
All the others are a single character and ~u is available, so ...
2) "Formatting functions" section
Is this really what was intended?
"9> io:format(Terminal,"~s",["smörgås"]).
- would convert the string "smörgås" (Swedish word for sandwich) to
UTF-8 before sending it to the terminal, ..."
I would have expected this to send the literal 0..255 latin1 characters
to the terminal rather than converting to utf-8, behaving exactly as the
file driver does. Conversely, if ~s does not behave in this way, how
would you get the direct latin1 characters to the terminal?
Is it the intent to have the terminal driver simply deal with utf-8,
converting (possibly) back to latin1 if the locale is not set to utf-8?
The section goes on to talk about io:read and terminal device driver,
saying "input should always be expected to be in UTF-8", which does seem
to indicate that this was the thinking.
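To make the difference in point 2 concrete, here is a minimal sketch of
what converting a latin1 string to utf-8 does at the byte level. The
module name latin1_demo and the function to_utf8/1 are hypothetical,
made up for this illustration, and are not part of the EEP or of OTP;
the encoder only handles code points 0..255.

```erlang
-module(latin1_demo).
-export([to_utf8/1]).

%% Hypothetical helper: encode a list of latin1 code points (0..255)
%% as a utf-8 binary. Code points below 128 stay as one byte; the
%% rest become two bytes of the form 110xxxxx 10xxxxxx.
to_utf8(Chars) ->
    << <<(encode(C))/binary>> || C <- Chars >>.

encode(C) when C >= 0, C < 128 ->
    <<C>>;
encode(C) when C >= 128, C =< 255 ->
    <<2#110:3, (C bsr 6):5, 2#10:2, (C band 63):6>>.

%% "smörgås" as latin1 code points is
%%   [115,109,246,114,103,229,115]
%% and to_utf8/1 on that list is equal to
%%   <<115,109,195,182,114,103,195,165,115>>
```

Sending the original list as raw bytes puts 7 bytes on the wire (246 for
ö, 229 for å), while the converted form is 9 bytes, so a latin1 terminal
would show two characters for each of ö and å.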
3) I vote to support utf-16 in the binary support; it might as well be
complete from the start. The only issue is whether things like reading
files would automatically deal with the byte-order mark used in
some (if not most or all) utf-16 documents. Just something else to
consider.
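For the byte-order-mark question, a rough sketch of the kind of check a
file reader would need, assuming it only has to tell the two utf-16
byte orders apart. The module bom_demo and strip_utf16_bom/1 are
hypothetical names for illustration only, and what to do when no mark
is present is left open.

```erlang
-module(bom_demo).
-export([strip_utf16_bom/1]).

%% Hypothetical helper: look for a utf-16 byte-order mark (U+FEFF) at
%% the start of a binary. Returns {ByteOrder, Rest} with the mark
%% removed, or {unknown, Bin} when no mark is found.
strip_utf16_bom(<<16#FE, 16#FF, Rest/binary>>) ->
    {big, Rest};
strip_utf16_bom(<<16#FF, 16#FE, Rest/binary>>) ->
    {little, Rest};
strip_utf16_bom(Bin) when is_binary(Bin) ->
    {unknown, Bin}.
```

A real reader would also have to decide what to assume when there is no
mark, and the bytes FF FE also begin a utf-32 little-endian mark
(FF FE 00 00), so this check alone cannot rule that case out.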
--
paul