[erlang-questions] : EEP 10

Raimo Niskanen raimo+erlang-questions@REDACTED
Fri May 16 09:47:32 CEST 2008


On Thu, May 15, 2008 at 08:40:30AM -0500, Paul Fisher wrote:
> On Thu, 2008-05-15 at 10:55 +0200, Raimo Niskanen wrote:
> > EEP 10: Representing Unicode characters in Erlang
> > has been recognized by the EEP editor(s).
> > 
> > http://www.erlang.org/eeps/eep-0010.html
> 
> Bravo!  Some comments:
> 
> 1) "Formatting functions" section
> In general, why the choice of ~ts for unicode string format specifier?
> All others are single character and ~u is available, so ...

Because ~u is used for io:fread for unsigned int, and we prefer
to have the same "meaning" for both fread and fwrite.

Furthermore the t modifier may be needed for fread ~a and ~c
and fwrite ~p and ~P.
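
As a minimal sketch of the difference the t modifier makes on the fwrite side, assuming the io_lib:format/2 behaviour of current Erlang/OTP releases (which postdate this thread): the module name ts_sketch and the helper render/1 are hypothetical and exist only for this illustration.

```erlang
-module(ts_sketch).
-export([render/1]).

%% Hypothetical helper: format the same code-point list with ~ts and
%% with plain ~s, returning both results as flat lists.
%% Plain ~s only accepts characters in 0..255, so a larger code point
%% raises badarg, which is caught and returned as the atom badarg.
render(Chars) ->
    WithT = lists:flatten(io_lib:format("~ts", [Chars])),
    Plain = try lists:flatten(io_lib:format("~s", [Chars]))
            catch error:badarg -> badarg
            end,
    {WithT, Plain}.
```

With a Latin-1 range input such as [$s,$m,246] both directives give the same list; with a code point above 255, such as [1046], only ~ts succeeds.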

> 
> 
> 2) "Formatting functions" section
> Is this really what was intended?
> 
> "9> io:format(Terminal,"~s",["smörgås"]).
> 
> - would convert the string "smörgås" (Swedish word for sandwich) to
> UTF-8 before sending it to the terminal, ..."
> 
> I would have expected this to send the literal 0..255 latin1 characters
> to the terminal rather than converting to utf-8, behaving exactly as
> the file driver. Conversely, if ~s does not behave in this way, how would
> you get the direct latin1 characters to the terminal?
> 
> Is it the intent to have the terminal driver simply deal with utf-8,
> converting (possibly) back to latin1 if the locale is not set to utf-8?
> The section goes on to talk about io:read and terminal device driver,
> saying "input should always be expected to be in UTF-8", which does seem
> to indicate that this was the thinking.

Yes. A terminal driver will speak UTF-8 with Erlang. Period.
The terminal driver is responsible for conversion to Latin-1.
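
The conversion in question can be sketched as follows, assuming the unicode module of current Erlang/OTP releases (it postdates this thread). The module latin1_sketch and its two functions are hypothetical names for illustration, not part of any driver.

```erlang
-module(latin1_sketch).
-export([to_utf8/1, to_latin1/1]).

%% Latin-1 code points (0..255) to a UTF-8 encoded binary.
%% This is the direction Erlang -> terminal driver.
to_utf8(Latin1Chars) ->
    unicode:characters_to_binary(Latin1Chars, latin1, utf8).

%% UTF-8 binary back to a Latin-1 binary, as a driver on a Latin-1
%% terminal would need to do. If a character does not fit in Latin-1,
%% unicode:characters_to_binary/3 returns an error tuple instead of
%% a binary; this sketch passes that through unchanged.
to_latin1(Utf8Bin) ->
    unicode:characters_to_binary(Utf8Bin, utf8, latin1).
```

For "smörgås" given as the code points [$s,$m,246,$r,$g,229,$s], to_utf8/1 yields 9 bytes (ö and å take two bytes each), and to_latin1/1 turns them back into the original 7 bytes.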

> 
> 
> 3) I vote to support utf-16 in the binary support; might as well be
> complete from the start.  The only issue is whether things like reading
> files would automatically deal with the byte-order mark used in
> (some/most/all) utf-16 docs.  Just something else to consider.

Yes. UTF-8 and UTF-16. No time plan yet. We will have to think about
the byte-order mark.
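
One possible way to handle the byte-order mark, sketched under the assumption that unicode:bom_to_encoding/1 from current Erlang/OTP releases is available (it postdates this thread). The module bom_sketch, decode/1 and pick/1 are hypothetical, and treating BOM-less data as UTF-8 is a choice made only for this sketch.

```erlang
-module(bom_sketch).
-export([decode/1]).

%% Hypothetical helper: detect a leading byte-order mark, skip it,
%% and decode the rest of the binary to a list of code points.
decode(Bin) ->
    {Encoding, BomLen} = unicode:bom_to_encoding(Bin),
    <<_Bom:BomLen/binary, Rest/binary>> = Bin,
    unicode:characters_to_list(Rest, pick(Encoding)).

%% bom_to_encoding/1 reports {latin1, 0} when no BOM is present.
%% Assumption of this sketch: such data is treated as UTF-8.
pick(latin1) -> utf8;
pick(Encoding) -> Encoding.
```

The same decode/1 call then handles big-endian UTF-16, little-endian UTF-16 and UTF-8 input, as long as the UTF-16 data actually starts with a BOM.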

> 
> 
> -- 
> paul
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


