[erlang-questions] unicode: what about printing terms?

Vlad Dumitrescu vladdu55@REDACTED
Mon Oct 22 17:01:54 CEST 2012


Hi!

On Mon, Oct 22, 2012 at 4:42 PM, Patrik Nyblom <pan@REDACTED> wrote:
> On 10/22/2012 03:28 PM, Vlad Dumitrescu wrote:
>> How is printing going to work when atoms/variables can be unicode?
>
> Well, when reading a binary, you need to know if it's UTF-8 or latin1, but
> you know that a "string" (a list interpreted as text) or an atom in R18
> always contains "Unicode" (latin1 codepoints are a subset of Unicode
> codepoints). The io module translates things to the Erlang Unicode
> representation if needed and sends them to the io_server. The io_server in
> turn decides how to output this: either in UTF-8 if it's a Unicode-capable
> terminal (or a Werl window, where the driver for the window then converts it
> further to 16-bit calls... *shudder*) or in any encoding set for a file. If
> the file is restricted to latin1, Unicode characters > 255 cannot be output
> (exception error:no_translation); if it's an eight-bit terminal they will be
> output as \x{...}. ~ts is needed solely to say how to interpret the
> *input* data; the io_server is responsible for translating it for the output
> device.
>
> Maybe the two documents in the stdlib user's guide:
> http://www.erlang.org/doc/apps/stdlib/users_guide.html
> can help clear up the things I seem to be unable to explain properly.

I used http://www.erlang.org/doc/man/io.html#fwrite-1 as a reference, and
there ~s and ~ts are documented as options for output... I think the
problem is that we're talking about slightly different things :-)
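
A minimal sketch of the "input side" reading of ~s vs ~ts, assuming a
reasonably recent OTP. The module and function names are made up for
illustration; io_lib:format/2 is used so that no terminal or file encoding
gets involved.

```erlang
-module(ts_demo).
-export([run/0]).

%% Format the same UTF-8 binary with ~s and with ~ts and compare.
run() ->
    Bin = <<229/utf8>>,   % U+00E5 encoded as UTF-8, i.e. bytes 195,165
    %% ~s: each byte of the binary is taken as one latin1 character.
    AsBytes = lists:flatten(io_lib:format("~s", [Bin])),
    %% ~ts: the binary is decoded as UTF-8 into Unicode codepoints.
    AsChars = lists:flatten(io_lib:format("~ts", [Bin])),
    %% A list holding a codepoint > 255 is not valid input for plain ~s.
    Wide = try io_lib:format("~s", [[1024]]) of
               _ -> accepted
           catch
               error:_ -> rejected
           end,
    {AsBytes, AsChars, Wide}.
```

Calling ts_demo:run() should show two different character lists for the
same binary, which is the sense in which the "t" modifier describes the
argument rather than the device.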

So does that mean that for files the encoding is defined when opening them,
and for the console it is whatever the environment sets it to (and
good luck if there's a mismatch with the sent data)? When debugging a
live telecom node one often has to go through several gateways, and
not all of them have new OS versions with UTF-8 support; I hope that
they just pass the data as-is and don't mangle it.
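
For the file half of that, a small sketch assuming a recent OTP: the
encoding is fixed by the options given to file:open/2, and io:put_chars/2
then either transcodes or fails. The module name and the scratch file name
are hypothetical.

```erlang
-module(enc_demo).
-export([write_with/1]).

%% Open a scratch file with the given encoding (latin1 or utf8), try to
%% write the single codepoint U+0400, and return what happened together
%% with the bytes that actually ended up in the file.
write_with(Encoding) ->
    Path = "enc_demo_scratch.txt",   % hypothetical scratch file name
    {ok, F} = file:open(Path, [write, {encoding, Encoding}]),
    Result = try io:put_chars(F, [1024]) of
                 ok -> written
             catch
                 error:Reason -> {failed, Reason}
             end,
    ok = file:close(F),
    {ok, Bytes} = file:read_file(Path),
    ok = file:delete(Path),
    {Result, Bytes}.
```

enc_demo:write_with(utf8) should report the two UTF-8 bytes 208,128, while
enc_demo:write_with(latin1) is expected to fail along the lines of the
no_translation error mentioned above and leave the file empty.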

And when encoding terms to external format, how will atom names be
encoded? We must be able to read them from external programs too (Java
nodes, C nodes, etc.) and from older versions of Erlang.

regards,
Vlad
