[erlang-questions] Atom Unicode Support

Björn Gustavsson bjorn@REDACTED
Mon Feb 1 12:59:45 CET 2016


On Mon, Feb 1, 2016 at 9:44 AM, José Valim
<jose.valim@REDACTED> wrote:
> So I would say list_to_binary is behaving as expected and that it should not
> change as those "limitations" are there today. Same for port_command, as it
> expects iodata. Or am I missing something?

My point is that we must look for code in
OTP that will break when the change to
atoms is made.

As a hypothetical example, say that we
find the following code in some application:

  Str = atom_to_list(Atom),
  .
  .
  .
  port_command(Port, Cmd, Str)
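To make the risk concrete, here is a minimal sketch. It assumes an OTP release where atoms may contain any Unicode character, which is OTP 20 and later. The module name atom_list_demo is hypothetical. Once an atom contains characters above 255, atom_to_list/1 returns code points rather than bytes. list_to_binary/1, and anything else that expects iodata, then rejects that list with badarg.

```erlang
-module(atom_list_demo).
-export([demo/0]).

%% Assumes OTP 20 or later, where atoms may contain any Unicode
%% character. The atom below is built from two Greek letters.
demo() ->
    Atom = list_to_atom([16#3B1, 16#3B2]),
    Str = atom_to_list(Atom),
    %% Str holds the code points 945 and 946. These are not bytes,
    %% so Str is not valid iodata.
    [945, 946] = Str,
    {'EXIT', {badarg, _}} = (catch list_to_binary(Str)),
    %% The UTF-8 encoding of the same atom is an ordinary binary.
    atom_to_binary(Atom, utf8).
```

Calling atom_list_demo:demo() returns <<206,177,206,178>>, which is the UTF-8 encoding of the two letters.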

We must look at the context to determine
what we should do. Any of several solutions
could apply, for example:

1. If the atoms that can be passed to this
code are always generated internally, we
may know that the resulting list is always
safe to send to the port. In that case we
don't need to update the code.

2. If the origin of the atom is unknown,
and the driver cannot handle UTF-8,
the solution could be to return an error
to the caller if the atom contains
non-latin1 characters.

3. If the driver can handle UTF-8 or can
be modified to handle UTF-8, the solution
could be to use atom_to_binary(Atom, utf8)
instead of atom_to_list/1.
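Options 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not OTP code. The module atom_port_sketch and both function names are hypothetical. The Latin-1 check simply verifies that every code point in the atom's text is at most 255.

```erlang
-module(atom_port_sketch).
-export([latin1_string/1, utf8_binary/1]).

%% Option 2 (hypothetical helper): for a driver that only handles
%% bytes, return an error if the atom has characters outside Latin-1.
latin1_string(Atom) when is_atom(Atom) ->
    Str = atom_to_list(Atom),
    case lists:all(fun(C) -> C =< 255 end, Str) of
        true  -> {ok, Str};
        false -> {error, {not_latin1, Atom}}
    end.

%% Option 3: the driver accepts UTF-8, so send the atom's UTF-8
%% encoding instead of the list from atom_to_list/1.
utf8_binary(Atom) when is_atom(Atom) ->
    atom_to_binary(Atom, utf8).
```

The explicit check gives the caller a descriptive error term. On OTP 20 and later, atom_to_binary(Atom, latin1) also fails with badarg for such atoms, which is an alternative when a plain exception is acceptable.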

Basically, we must look at every atom_to_list/1
call in the OTP code base and determine whether
it is safe or must be modified in some way.

/Björn

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB
