[erlang-questions] utf8 in source files

Mon Nov 8 17:31:31 CET 2010

On Sun, Nov 7, 2010 at 9:13 PM, Vlad Dumitrescu <vladdu55@REDACTED> wrote:
> Hi!
>
> Lately I've been getting more and more requests (some even with foul
> language) asking why erlide (the Erlang IDE for Eclipse) doesn't support
> UTF-8 encoding of Erlang source files. My answer has always been to
> point to the Erlang docs, where it is clearly stated that source files
> are still to be Latin-1 encoded, regardless of the recent
> Unicode-supporting libraries. The documentation
> (http://www.erlang.org/doc/apps/stdlib/unicode_usage.html) says:
> "Also the source code is (for now) still expected to be written using
> the ISO-latin-1 character set, why Unicode characters beyond that
> range cannot be entered in string literals."
>
> The trouble is that the scanner actually accepts UTF-8 encoding in
> literal strings and comments, possibly even in quoted atoms. See the
> attached file for an example; it compiles fine even on my Swedish
> system (the output from running it is garbled for me, but I assume it
> would look fine on a properly set up terminal). So users from countries
> where Latin-1 is useless but UTF-8 is not may use the latter instead
> without running into any problems.
>
> Eclipse requires the encoding of a file to be specified and, of course,
> I can't set that to anything other than the official Latin-1. But
> people who used UTF-8 before, when it worked, now get frustrated because
> it's no longer accepted when they use erlide. It's easy to blame
> erlide for that and ask for a fix, because it looks like it works
> outside of Eclipse...
>
> The simple solution would be to update the docs above to say something
> like "Please note that UTF-8 encoded files can be accepted by the
> compiler and may work as expected in some environments, but this usage
> is not recommended and not supported." Also, comments should be
> mentioned alongside literal strings. Would this be acceptable?
>
> The complex solution would be to make sources encoding-aware (maybe as
> suggested here http://www.erlang.org/cgi-bin/ezmlm-cgi?4:msp:46892),
> but I know this is by no means a simple task. Does the OTP team have
> anything scheduled in this area? R15, R16?

The OTP team does not have anything scheduled regarding support for
UTF-8 in sources yet, but I agree that we ought to support it.

The question is "only" how it should work to be as useful as possible
while remaining reasonably backward compatible.
I think this is a perfect fit for an EEP. Maybe the link
http://www.erlang.org/cgi-bin/ezmlm-cgi?4:msp:46892
is a good start. Any volunteers?

/Kenneth, Erlang/OTP, Ericsson
>
> best regards,
> Vlad
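
The behaviour described in the quoted message (UTF-8 bytes passing through a Latin-1 scanner without errors) can be modelled with a minimal sketch. It is written in Python rather than Erlang and does not run the Erlang scanner. It assumes that the scanner maps every byte to exactly one Latin-1 character and that output writes each character back as a single byte. The sample string and the function names `scan_as_latin1` and `emit_bytes` are hypothetical, chosen only for illustration.

```python
def scan_as_latin1(file_bytes: bytes) -> str:
    """Model a Latin-1 scanner: every byte becomes exactly one character.

    Assumption: this stands in for the scanner; it is not the real one.
    Latin-1 defines all 256 byte values, so decoding can never fail.
    """
    return file_bytes.decode("latin-1")


def emit_bytes(chars: str) -> bytes:
    """Model output that writes each character back as a single byte."""
    return chars.encode("latin-1")


# Hypothetical string literal, typed in an editor that saves as UTF-8.
typed = "smörgåsbord"

# The bytes that end up in the source file: two bytes each for ö and å.
file_bytes = typed.encode("utf-8")

# What a Latin-1 scanner sees: one character per byte.
scanned = scan_as_latin1(file_bytes)

# Writing the characters back out byte for byte restores the UTF-8 data,
# which a UTF-8 terminal would then display as the original text.
round_trip = emit_bytes(scanned).decode("utf-8")

print(file_bytes)                 # b'sm\xc3\xb6rg\xc3\xa5sbord'
print(len(typed), len(scanned))   # 11 13
print(round_trip == typed)        # True
```

Under these assumptions nothing is rejected at compile time, but the string holds 13 characters where the author typed 11, so anything that counts or slices characters operates on bytes. How the output looks then depends entirely on the encoding the terminal expects.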