[erlang-questions] Fwd: Breaking backwards compatibility in Release 17.0-rc2

Thu Mar 13 08:26:51 CET 2014

Hi Richard,

Thank you for the patch. We are going to take the parts related to opening
the file, in order to reduce the number of times a file is opened, but we
are not going to introduce the proposed flag to the erlc options.

Early on, before the actual introduction of Unicode/UTF-8 encoding, it was
decided that Unicode using UTF-8 encoding shall be supported through an
encoding instruction in a comment in the beginning of the file (as in
Python), which informs the tool chain about the input file encoding format.
This approach has the benefit of allowing encoding information in all kinds
of source code in Erlang/OTP using the same mechanism; that is, it solves
the issue for Erlang source, yecc, leex, and file:consult alike. The
underlying design philosophy is that the encoding is a property of the
file, and not part of the language.

The default file encoding was ISO-Latin-1 until R16, but will be changed to
UTF-8 in OTP 17. The intention was that source code does not need to be
changed in R16, but adding a comment denoting ISO-Latin-1 encoding ensures
that the code can be compiled with the OTP 17 compiler. Likewise, adding a
comment denoting UTF-8 encoding allows for Unicode characters with code
points > 255 in string and character literals in R16. The same comment will
allow for atoms containing any Unicode code point in OTP 18.
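For illustration, here is a minimal sketch of a source file that carries the encoding comment. The module name `greek` and the function `lambda/0` are made up for this example; the comment line uses the form documented for epp, and it has to appear on one of the first two lines of the file.

```erlang
%% -*- coding: utf-8 -*-
-module(greek).
-export([lambda/0]).

%% With the encoding comment in place, a string literal may contain
%% code points above 255. The string below is the single code point 955.
lambda() -> "λ".
```

A file that is stored as ISO-Latin-1 would start with `%% -*- coding: latin-1 -*-` instead, which is the comment that keeps such a file compiling under the UTF-8 default.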

As pointed out in previous mails, we did not communicate the impact on the
applications clearly enough. Hence, we are now introducing a workaround, in
which the preprocessor processes the file again using ISO-Latin-1 encoding
if it failed to read the file with the default UTF-8 encoding. This
solution, although it may seem awkward, is in line with the original design
philosophy.
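The retry can be pictured roughly as follows. This is only a sketch of the idea, not the code in epp: the module name `enc_fallback` and the function `decode/1` are hypothetical, and the real preprocessor also emits the deprecation warning, which is left out here.

```erlang
-module(enc_fallback).
-export([decode/1]).

%% Try to decode the bytes as UTF-8 first. If that fails, reinterpret
%% them as ISO-Latin-1, where every byte maps to the code point with
%% the same value.
-spec decode(binary()) -> {utf8 | latin1, [non_neg_integer()]}.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {utf8, Chars};
        _ErrorOrIncomplete ->
            %% {error, _, _} or {incomplete, _, _}: not valid UTF-8.
            {latin1, binary_to_list(Bin)}
    end.
```

For example, `enc_fallback:decode(<<16#C3, 16#A5>>)` takes the UTF-8 branch, while `enc_fallback:decode(<<16#E5>>)` is not valid UTF-8 and falls back to the ISO-Latin-1 branch.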

We anticipate that there will only be a small number of files for which
this automatic workaround needs to kick in. ASCII is valid UTF-8; thus,
only files containing ISO-Latin-1 characters with codes 16#80 (128) and
above are impacted.
The workaround is going to be removed in a future release and the
corresponding deprecation warning will then be turned into an error.
Whether that will happen in OTP 18 or later is for further discussion.
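To find the affected files ahead of time, something along these lines could be run over a code base. It is a sketch under assumptions: the module name `enc_check` and both function names are invented for this example, and it treats a file as affected when it has no encoding comment and its bytes are not valid UTF-8.

```erlang
-module(enc_check).
-export([needs_comment/1, invalid_utf8/1]).

%% True when the file declares no encoding and is not valid UTF-8,
%% that is, when it would depend on the ISO-Latin-1 retry.
-spec needs_comment(file:filename()) -> boolean().
needs_comment(Path) ->
    case epp:read_encoding(Path) of
        none ->
            {ok, Bin} = file:read_file(Path),
            invalid_utf8(Bin);
        _Declared ->
            %% utf8 or latin1 was declared in an encoding comment.
            false
    end.

%% True when the binary cannot be decoded as UTF-8. Pure ASCII always
%% passes, because every ASCII byte is a valid one-byte UTF-8 sequence.
-spec invalid_utf8(binary()) -> boolean().
invalid_utf8(Bin) ->
    not is_binary(unicode:characters_to_binary(Bin, utf8, utf8)).
```

Files for which `needs_comment/1` returns `true` are the ones that would need either an encoding comment or a conversion to UTF-8 before the workaround is removed.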

Andreas Schumacher, Erlang/OTP, Ericsson AB

From: Richard Carlsson <carlsson.richard@REDACTED>
Subject: Re: [erlang-questions] Fwd: FW: Breaking backwards
compatibility in Release 17.0-rc2
Date: 7 Mar 2014 08:01:13 GMT+1
To: Andreas Schumacher <andreas@REDACTED>, "erlang-questions@REDACTED"
<erlang-questions@REDACTED>

On 2014-03-06 02:15, Andreas Schumacher wrote:

In OTP 17.0-rc{1,2}, a file that is encoded in latin-1 and contains
non-UTF-8/non-ASCII-7 characters, causes a compiler error similar to the
following:

  tst.erl:1: cannot parse file, giving up
  tst.erl:1: no module definition
  tst.erl:1: cannot translate from UTF-8

In OTP 17.0, if a file is encoded in latin-1 and contains
non-UTF-8/non-ASCII characters, but does not declare the encoding with a
magic encoding comment at the beginning of the file, epp (the Erlang
code pre-processor) issues a deprecation warning, and processes the file
again, assuming latin-1 encoding.

In a future major version, preferably in OTP 18, the deprecation warning
will be turned into an error again. That is, only UTF-8 encoded files,
and files that declare the source code encoding at the beginning of the
source code file, will be accepted.

Still not good enough. I want to be able to move up to R18 when that
time comes without having to modify files all over our codebase. (And
retrying is an ugly workaround anyway.) The following patch has allowed
me to compile all of our sources under R17 by simply adding
'+{default_encoding,latin1}' to the erlc options in our Makefiles:

https://github.com/erlang/otp/pull/276

    /Richard