utf8 in source files

Vlad Dumitrescu vladdu55@REDACTED
Sun Nov 7 21:13:09 CET 2010


Hi!

Lately I've been getting more and more requests (some even with foul
language) asking why erlide (the Erlang IDE for Eclipse) doesn't support
utf8 encoding of Erlang source files. My answer has always been to point
to the Erlang docs, where it is clearly stated that source files
are still to be Latin1 encoded, regardless of the recent
unicode-supporting libraries. The documentation
(http://www.erlang.org/doc/apps/stdlib/unicode_usage.html) says:
"Also the source code is (for now) still expected to be written using
the ISO-latin-1 character set, why Unicode characters beyond that
range cannot be entered in string literals."

The trouble is that the scanner actually accepts utf8 encoding in
literal strings and comments, possibly even in quoted atoms. See the
attached file for an example: it compiles fine even on my Swedish
system (the output from running it is garbled for me; I assume it
would look fine on a properly set up terminal). So users from countries
where Latin1 is useless but utf8 is not may use the latter instead
without running into any problems.
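
To make the effect concrete, here is a minimal sketch. It is written in Python rather than Erlang, purely to show what happens at the byte level, and it is not a description of how the Erlang scanner is implemented. It assumes that a Latin1-only reader treats every byte as one character, and the literal "åäö" is a made-up example, not the contents of the attached file.

```python
# Hypothetical literal a user might type; written with escapes so this
# snippet itself does not depend on the encoding of the file it lives in.
literal = "\u00e5\u00e4\u00f6"  # the three characters a-ring, a-umlaut, o-umlaut

# Bytes stored on disk when the source file is saved as utf8.
utf8_bytes = literal.encode("utf-8")

# Assumption: a Latin1 reader maps each byte to exactly one character,
# so every two-byte utf8 sequence becomes two Latin1 characters.
as_latin1 = utf8_bytes.decode("latin-1")

# Every byte value is a valid Latin1 character, so nothing is rejected;
# the string is just longer than the author intended.
seen_by_reader = [ord(c) for c in as_latin1]

# Writing those characters back out as one byte each restores the original
# bytes, which is why a utf8 terminal could still show the intended text.
round_trip = as_latin1.encode("latin-1").decode("utf-8")

print(len(literal), len(as_latin1))
print(seen_by_reader)
print(round_trip == literal)
```

Under that assumption the text survives as long as every tool in the chain passes the bytes through untouched, but anything that counts or displays characters as Latin1 sees six characters instead of three.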

Eclipse requires the encoding of a file to be specified and, of course,
I can't set that to anything other than the official Latin1. But
people who used utf8 before, and found that it worked, now get
frustrated because it's no longer accepted when they use erlide. It's
easy to blame erlide for that and ask for a fix, because it looks like
it works outside of Eclipse...

The simple solution would be to update the docs above to say something
like "Please note that UTF-8 encoded files can be accepted by the
compiler and may work as expected in some environments, but this usage
is not recommended and not supported." Also, comments should be
mentioned alongside literal strings. Would this be acceptable?

The complex solution would be to make sources encoding-aware (maybe as
suggested here http://www.erlang.org/cgi-bin/ezmlm-cgi?4:msp:46892),
but I know this is by no means a simple task. Does the OTP team have
anything scheduled in this area? R15, R16?

best regards,
Vlad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.erl
Type: application/octet-stream
Size: 306 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20101107/07781f80/attachment.obj>

