[erlang-questions] UTF8 and EDoc

Ngoc Dao ngocdaothanh@REDACTED
Mon Oct 5 11:31:49 CEST 2009


Here is my fix to make EDoc work with Japanese (tested on R13B02-1,
with the .erl and overview.edoc files saved in UTF-8). I think it will
also work for other languages:

1. In edoc_lib:write_file/4

Change

file:open(File, [write])

to

file:open(File, [write, {encoding, utf8}])

This is better than my previous dirty hack.
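
As a rough illustration of what the encoding option in step 1 changes, here is a minimal standalone sketch. The module and function names are hypothetical and are not part of EDoc. It assumes the text is a list of Unicode code points and that it is written through the io module, which is what translates the code points to UTF-8 on the way out.

```erlang
-module(utf8_write_demo).   % hypothetical module name, for illustration only
-export([write_utf8/2]).

%% Write a list of Unicode code points to File as UTF-8.
write_utf8(File, Chars) ->
    case file:open(File, [write, {encoding, utf8}]) of
        {ok, FD} ->
            %% With an encoding set on the file, output goes through
            %% the io module, which encodes the code points as UTF-8.
            ok = io:put_chars(FD, Chars),
            file:close(FD);
        {error, _Reason} = Error ->
            Error
    end.
```

For example, utf8_write_demo:write_utf8("demo.txt", [12354]) should leave a three-byte file holding the UTF-8 encoding of U+3042.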

2. In edoc_tags:parse_tags/5

Change

case dict:fetch(Name, How) of
    text ->
        parse_tags(Ts, How, Env, Where, [T | Ts1]);

to

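case dict:fetch(Name, How) of
    text ->
        %% Re-decode the tag text: treat its bytes as UTF-8 and turn
        %% them into a list of Unicode code points.
        Data = unicode:characters_to_list(list_to_binary(T#tag.data)),
        T2 = T#tag{data = Data},
        parse_tags(Ts, How, Env, Where, [T2 | Ts1]);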
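
The conversion in step 2 can be tried outside EDoc with a small sketch like the one below. The module and function names are hypothetical. It assumes the tag data arrives as a list of UTF-8 bytes, one list element per byte. The fallback for invalid UTF-8 is an addition for illustration and is not part of the patch above.

```erlang
-module(utf8_tag_demo).   % hypothetical module name, for illustration only
-export([decode/1]).

%% Turn a list of UTF-8 bytes into a list of Unicode code points.
%% If the bytes are not valid UTF-8, return them unchanged
%% (assumption: keeping the raw bytes is an acceptable fallback).
decode(Bytes) when is_list(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes)) of
        Chars when is_list(Chars) -> Chars;
        _ErrorOrIncomplete -> Bytes
    end.
```

For example, utf8_tag_demo:decode([227,129,130]) gives [12354], the single code point U+3042, instead of three separate bytes.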

Regards,
Ngoc


On Wed, Sep 30, 2009 at 6:37 PM, Richard Carlsson
<carlsson.richard@REDACTED> wrote:
> Ngoc Dao wrote:
>> When I use the EDoc library in Erlang R13B02-1 to generate
>> documentation with Japanese characters in the doc comments, I get
>> an error:
>
> Yes, this is a known problem. The short answer is that the input
> encoding for Erlang source code is defined to be Latin-1. That is,
> if you put things like Japanese or Russian characters in the
> source files, you are breaking the rules to begin with. (If it's only
> in comments, and using UTF-8, it will not prevent the compiler from
> skipping the comments and compiling the program, but you can't
> expect anything else to work.)
>
> What would be needed is something like a \u-escaping preprocessing
> stage, as specified for Java. But then, the tools must also know
> about \u escape sequences and turn them back into the proper code
> point in UTF-8 or whatever.
>
>    /Richard
>

