[erlang-questions] UTF8 and EDoc

Tomas Abrahamsson tomas.abrahamsson@REDACTED
Mon Oct 5 21:38:52 CEST 2009


> Ngoc Dao wrote:
>> When I use EDoc library in Erlang R13B02-1 to create document with
>> Japanese characters in the doc comments, there is error:

Richard Carlsson wrote:
> Yes, this is a known problem. The short answer is that the input
> encoding for Erlang source code is defined to be Latin-1. [...]
> What would be needed is something like a \u-escaping preprocessing
> stage, as specified for Java. But then, the tools must also know
> about \u escape sequences and turn them back into the proper code
> point in UTF-8 or whatever.

An option could be to adopt the way it is done in Python:
it (re)uses the editor's encoding declaration. If it finds the text
   -*- coding: utf-8 -*-  or  vim: set fileencoding=utf-8 :
on the first or second line of the source file, then it sets
the encoding for the entire source file accordingly. (It also
understands Unicode byte-order marks at the beginning
of the file, which apparently makes life easier in editors
on Windows.)

See http://www.python.org/peps/pep-0263.html for details.
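
To make the idea concrete, here is a rough sketch (in Python, just
for illustration) of what such a lookup could look like for Erlang
source files. It is only an assumption of how a tool might do it,
not how erlc or edoc work today. The function name
detect_source_encoding and the Latin-1 default are made up for this
example, and the pattern is modelled on the one in PEP 263 with '%'
in place of '#'. Only the UTF-8 byte-order mark is handled.

```python
import codecs
import re

# Modelled on the PEP 263 pattern, with Erlang's comment character '%'
# swapped in for Python's '#'. This is an assumption for illustration.
CODING_RE = re.compile(rb"^[ \t\f]*%.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def detect_source_encoding(data: bytes, default: str = "latin-1") -> str:
    """Return the encoding declared on the first or second line of `data`.

    Falls back to `default` when no declaration is found.
    """
    # A UTF-8 byte-order mark at the start of the file wins outright.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only the first two lines are inspected, as in the Python scheme.
    for line in data.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default
```

With this sketch, a file starting with "%% -*- coding: utf-8 -*-"
or "%% vim: set fileencoding=utf-8 :" is reported as utf-8, and a
file with neither comment is reported as latin-1.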

An advantage with this scheme seems to be that it fits nicely
with editors. They already know how to handle this.

It would probably require the Erlang compiler, edoc, and other tools
to be modified to know about source file encodings, though.

I suppose that with the \u-escaping, existing tools would continue
to work without modification, but it would be more work for the
programmer to type the text in as \u-sequences, unless editors
already know how to do such a transformation on the fly?
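
For reference, the transformation itself is small. The sketch below
(Python again, with the made-up names to_u_escapes and
from_u_escapes) assumes Java-style escapes, i.e. one \uXXXX per
UTF-16 code unit, and deliberately ignores the question of escaped
backslashes, which a real preprocessing stage would have to handle.

```python
import re

_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def to_u_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Java-style escapes operate on UTF-16 code units, so characters
        # outside the BMP become two escapes (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def from_u_escapes(text: str) -> str:
    """Turn \\uXXXX escapes back into the characters they stand for."""
    # Note: this sketch does not treat an escaped backslash specially.
    units = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Recombine surrogate pairs into single code points.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")
```

So a tool or an editor hook could apply to_u_escapes when saving and
from_u_escapes when loading or when generating documentation.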

If no such encoding declaration is found, Python assumes ASCII,
but Erlang could maybe assume Latin-1. If Python finds non-ASCII
characters in a file with no encoding declaration, then it spits
out an error like this (wrapped for readability):

  prompt# python /tmp/x.py
    File "/tmp/x.py", line 3
  SyntaxError: Non-ASCII character '\xe5' in file /tmp/x.py on line 3,
  but no encoding declared; see http://www.python.org/peps/pep-0263.html
  for details
  prompt# cat /tmp/
  #! /usr/bin/env python

  print 'åäö'
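
The two fallback policies could be sketched roughly as follows
(Python for illustration; decode_undeclared_source is a made-up
name, and the error text only imitates the Python message above).
With fallback "ascii" the function rejects any byte >= 0x80, as
Python does; with "latin-1" every byte is accepted, which is what
an Erlang tool would presumably want for existing source files.

```python
def decode_undeclared_source(data: bytes, fallback: str = "latin-1") -> str:
    """Decode source bytes that carry no encoding declaration."""
    if fallback == "ascii":
        # Python-like policy: refuse non-ASCII bytes and report the line.
        for lineno, line in enumerate(data.splitlines(), start=1):
            for byte in line:
                if byte >= 0x80:
                    raise SyntaxError(
                        "Non-ASCII character '\\x%02x' on line %d, "
                        "but no encoding declared" % (byte, lineno)
                    )
    # Latin-1 maps every byte value to a character, so this cannot fail.
    return data.decode(fallback)
```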

BRs
Tomas

