[erlang-questions] Atom Unicode Support

José Valim jose.valim@REDACTED
Mon Feb 1 09:44:16 CET 2016

Thank you Björn!

Should we expect list_to_binary to work?

  list_to_binary(io_lib:format("~s", [Atom]))

list_to_binary expects a list of bytes, not a list of codepoints, so
I would expect those cases to always raise for a UTF-8 encoded atom. The
same way this raises:

  list_to_binary(io_lib:format("~ts", [<<12494/utf8>>]))

On the other hand, the line below works but the result is in the wrong
encoding:

  list_to_binary(io_lib:format("~s", [<<12494/utf8>>]))

So I would say list_to_binary is behaving as expected and that it should
not change as those "limitations" are there today. Same for port_command,
as it expects iodata. Or am I missing something?
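
A minimal sketch of the cases above, for anyone who wants to try them in
one go. The module and function names are made up for illustration, and
unicode:characters_to_binary/1 is shown only as the usual way to turn
codepoints into UTF-8 bytes, not as a proposed change:

```erlang
-module(utf8_format_demo).
-export([run/0]).

%% Hypothetical demo module; returns {Raised, Bytes, Utf8}.
run() ->
    Bin = <<12494/utf8>>,   % one codepoint, three UTF-8 bytes
    %% "~ts" decodes the binary into the codepoint 12494, which is not
    %% a byte, so list_to_binary/1 fails with badarg.
    Raised = try list_to_binary(io_lib:format("~ts", [Bin])) of
                 _ -> false
             catch
                 error:badarg -> true
             end,
    %% "~s" treats every byte of the binary as a Latin-1 character,
    %% so the intermediate list only holds bytes and the call succeeds.
    Bytes = list_to_binary(io_lib:format("~s", [Bin])),
    %% unicode:characters_to_binary/1 accepts codepoints and encodes
    %% them as UTF-8 instead of raising.
    Utf8 = unicode:characters_to_binary(io_lib:format("~ts", [Bin])),
    {Raised, Bytes, Utf8}.
```

Code that has to handle codepoints above 255 would go through the unicode
module rather than list_to_binary, which is why I think list_to_binary
itself can stay as it is.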

Regarding the raising issue, one possible option is to introduce
atom_to_list(Atom, Encoding), similar to atom_to_binary/2, with the default
encoding of Latin1. If you call atom_to_list(Atom, latin1) with a UTF-8
encoded atom, it will raise. This way the code semantics won't change and
we won't have to worry about UTF-8 atoms creeping in unless explicitly
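
A rough sketch of how such a function could behave, written in plain
Erlang on top of atom_to_binary/2. Nothing here exists in OTP: the module
name and to_list/2 are hypothetical stand-ins for the proposed
atom_to_list(Atom, Encoding):

```erlang
-module(atom_list_sketch).
-export([to_list/2]).

%% Hypothetical stand-in for the proposed atom_to_list(Atom, Encoding).
to_list(Atom, latin1) when is_atom(Atom) ->
    %% atom_to_binary/2 raises badarg when the atom contains a
    %% character above 255, which gives the "raise" behaviour.
    binary_to_list(atom_to_binary(Atom, latin1));
to_list(Atom, utf8) when is_atom(Atom) ->
    %% Decode the UTF-8 bytes back into a list of codepoints.
    unicode:characters_to_list(atom_to_binary(Atom, utf8), utf8).
```

With latin1 as the default, existing callers would keep getting a list of
bytes or an exception, and only code that passes utf8 would ever see
codepoints above 255.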

*José Valim*
Skype: jv.ptec
Founder and Director of R&D

On Mon, Feb 1, 2016 at 8:14 AM, Björn Gustavsson <bjorn@REDACTED> wrote:

> On Sat, Jan 30, 2016 at 9:04 PM, José Valim
> <jose.valim@REDACTED> wrote:
> >
> > With all that said, are there any plans of supporting UTF-8 encoded
> > atoms on
> > Erlang R19? If the feature is desired but not planned, I would love to
> > contribute the compiler and bytecode changes above although I will likely
> > need some guidance. If that is an option, I would love to get in touch.
> >
> It is not planned for OTP 19. IMO, the feature is desired,
> but it is probably too late for OTP 19.
> Extending the BEAM format is necessary but not sufficient.
> It is also necessary to make sure that other code in OTP
> doesn't break. For example:
>   list_to_binary(atom_to_list(Atom))
>   list_to_binary(io_lib:format("~s", [Atom]))
>   erlang:port_command(Port, N, atom_to_list(Atom))
> list_to_atom/1 could also potentially be problematic
> if the code expects an exception for any non-latin1
> characters.
> Other things to be done are updating the documentation
> and specs.
> I think that the community could help us there, both
> in collecting a list of things that must be fixed
> or modified, and also in helping to fix them.
> /Björn
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB