<div dir="ltr">Hi Richard,<div><br></div><div><div>Thank you for the patch. We are going to take the parts related to opening the file, in order to reduce the number of times a file is opened, but we are not going to introduce the proposed flag to the erlc options.</div>
<div><br></div><div>Early on, before Unicode/UTF-8 support was actually introduced, it was decided that UTF-8 encoded source would be supported through an encoding declaration in a comment at the beginning of the file (as in Python), which informs the tool chain about the encoding of the input file. This approach has the benefit of allowing encoding information in all kinds of source code in Erlang/OTP using the same mechanism; that is, it solves the issue for Erlang source, yecc, leex, and file:consult alike. The underlying design philosophy is that the encoding is a property of the file, and not part of the language. </div>
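<div><br></div><div>As a minimal illustration, an encoding comment in an Erlang source file looks like the sketch below. The module name tst is only a placeholder. The comment has to appear within the first two lines of the file; latin-1 is the corresponding value for ISO-Latin-1 files.</div>
<div><pre>%% -*- coding: utf-8 -*-
-module(tst).
-export([hello/0]).

%% The literal is plain ASCII, so this file reads the same under either encoding.
hello() -&gt; "hello".
</pre></div>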
<div><br></div><div>The default file encoding was ISO-Latin-1 up to and including R16, and changes to UTF-8 in OTP 17. The intention was that source code would not need to be changed for R16, while adding a comment denoting ISO-Latin-1 encoding ensures that the code can also be compiled with the OTP 17 compiler. Likewise, adding a comment denoting UTF-8 encoding allows for Unicode characters with code points > 255 in string and character literals in R16. The same comment will allow for atoms containing any Unicode code point in OTP 18. </div>
<div><br></div><div>As pointed out in previous mails, we did not communicate the impact on the applications clearly enough. Hence, we are now introducing a workaround, in which the preprocessor processes the file again using ISO-Latin-1 encoding if it failed to read the file with the default UTF-8 encoding. This solution, although it may seem awkward, is in line with the original design philosophy. </div>
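<div><br></div><div>The retry can be pictured roughly as follows. This is only a sketch of the idea and not the actual epp code; the module name enc_fallback and the function decode/1 are made up for illustration. It tries UTF-8 first and falls back to ISO-Latin-1 if the bytes cannot be decoded, tagging the result so that the caller knows when a deprecation warning would be due.</div>
<div><pre>-module(enc_fallback).
-export([decode/1]).

%% Hypothetical helper: decode source bytes as UTF-8 first and fall back
%% to ISO-Latin-1 if that fails. Returns {Encoding, Chars}.
decode(Bin) when is_binary(Bin) -&gt;
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) -&gt;
            {utf8, Chars};
        _ErrorOrIncomplete -&gt;
            %% Every byte is a valid ISO-Latin-1 character, so this cannot fail.
            {latin1, unicode:characters_to_list(Bin, latin1)}
    end.
</pre></div>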
<div><br></div><div>We anticipate that there will be only a small number of files for which this automatic workaround needs to kick in. ASCII is valid UTF-8, so only files containing ISO-Latin-1 characters 16#80 (128) and above are impacted. The workaround is going to be removed in a future release, and the corresponding deprecation warning will then be turned into an error. Whether that will happen in OTP 18 or later is for further discussion. </div>
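<div><br></div><div>To find the files that would rely on the workaround, a check along the following lines might help. It is a sketch under assumptions: the module name enc_scan and the function needs_comment/1 are hypothetical, and it treats a file as affected when it has no encoding comment (as reported by epp:read_encoding/1) and its bytes are not valid UTF-8.</div>
<div><pre>-module(enc_scan).
-export([needs_comment/1]).

%% Hypothetical helper: true if File declares no encoding and its bytes
%% are not valid UTF-8, that is, it depends on the ISO-Latin-1 retry.
needs_comment(File) -&gt;
    case epp:read_encoding(File) of
        none -&gt;
            {ok, Bin} = file:read_file(File),
            not is_list(unicode:characters_to_list(Bin, utf8));
        _Declared -&gt;
            false
    end.
</pre></div>
<div>For example, [F || F &lt;- filelib:wildcard("src/*.erl"), enc_scan:needs_comment(F)] would list the candidates, where the src/*.erl pattern is just an assumed project layout.</div>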
</div><div><br></div><div>Andreas Schumacher, Erlang/OTP, Ericsson AB</div><div><br><div class="gmail_quote"><div style="word-wrap:break-word"><div>
<blockquote type="cite">
<div style="margin:0px">
<span style="font-family:Helvetica;color:rgb(0,0,0)"><b>From: </b></span><span style="font-family:Helvetica">Richard Carlsson <<a href="mailto:carlsson.richard@gmail.com" target="_blank">carlsson.richard@gmail.com</a>><br>
</span></div>
<div style="margin:0px">
<span style="font-family:Helvetica;color:rgb(0,0,0)"><b>Subject: </b>
</span><span style="font-family:Helvetica"><b>Re: [erlang-questions] Fwd: FW: Breaking backwards compatibility in Release 17.0-rc2</b><br>
</span></div>
<div style="margin:0px">
<span style="font-family:Helvetica;color:rgb(0,0,0)"><b>Date: </b></span><span style="font-family:Helvetica">7 Mar 2014 08:01:13 GMT+1<br>
</span></div>
<div style="margin:0px">
<span style="font-family:Helvetica;color:rgb(0,0,0)"><b>To: </b></span><span style="font-family:Helvetica">Andreas Schumacher <<a href="mailto:andreas@erlang.org" target="_blank">andreas@erlang.org</a>>, "<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>"
<<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>><br>
</span></div>
<br>
<div>On 2014-03-06 02:15 , Andreas Schumacher wrote:<br>
<blockquote type="cite">In OTP 17.0-rc{1,2}, a file that is encoded in latin-1 and contains<br>
non-UTF-8/non-ASCII-7 characters, causes a compiler error similar to the<br>
following:<br>
<br>
tst.erl:1: cannot parse file, giving up<br>
tst.erl:1: no module definition<br>
tst.erl:1: cannot translate from UTF-8<br>
<br>
In OTP 17.0, if a file is encoded in latin-1 and contains<br>
non-UTF-8/non-ASCII characters, but does not declare the encoding with a<br>
magic encoding comment at the beginning of the file, epp (the Erlang<br>
code pre-processor) issues a deprecation warning, and processes the file<br>
again, assuming latin-1 encoding.<br>
<br>
In a future major version, preferably in OTP 18, the deprecation warning<br>
will be turned into an error again. That is, only UTF-8 encoded files,<br>
and files that declare the source code encoding at the beginning of the<br>
source code file, will be accepted.<br>
</blockquote>
<br>
Still not good enough. I want to be able to move up to R18 when that <br>
time comes without having to modify files all over our codebase. (And <br>
retrying is an ugly workaround anyway.) The following patch has allowed <br>
me to compile all of our sources under R17 by simply adding <br>
'+{default_encoding,latin1}' to the erlc options in our Makefiles:<br>
<br>
<a href="https://github.com/erlang/otp/pull/276" target="_blank">https://github.com/erlang/otp/pull/276</a><br>
<br>
/Richard<br>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div><br></div></div>