[erlang-questions] A specification or a bug of io:get_line ?

Kawatake Chiharu <>
Fri Apr 24 08:31:05 CEST 2009


Hi,

I have just started learning Erlang on my own and came across this problem. I
searched the past questions, but I was not able to find anything related to
it, so please let me ask about it here. I am using R13B on Windows, but the
same thing also occurred on Linux.

I wrote some code that reads a file and prints its contents, as follows.

----------

-module(rw).

-export([for_each_line/4, print/3, main/1]).

for_each_line(Filename, [_, {encoding, Encoding}]=Mode, F, Args) ->
    case file:open(Filename, Mode) of
        {ok, Device} ->
            F(Device, Encoding, Args);
        {error, Reason} ->
            erlang:error(Reason)
    end.

print(Device, Encoding, Args) ->
    case io:get_line(Device, "") of
        {error, Reason} ->
            %% Close the file first: erlang:error/1 never returns.
            file:close(Device),
            erlang:error(Reason);
        eof ->
            file:close(Device);
        Data ->
            io:format("~s~n", [unicode:characters_to_binary(Data, Encoding)]),
            print(Device, Encoding, Args)
    end.

main(Args) ->
    [Filename] = Args,
    for_each_line(Filename, [read, {encoding, utf8}], fun rw:print/3, []).

------------

I made some files encoded in UTF-8 and tried to print their contents. This
worked fine to some extent.
However, when I tried to read files that contained Japanese characters, I
got the following error:

------------

$ erl -noshell -s rw main test.txt
{"init terminating in do_boot",{collect_line,[{rw,print,3},{init,start_it,1},{init,start_em,1}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

------------

After looking into this for a while, I found that the error seems to occur
when a line starts with three consecutive ASCII characters followed by some
Japanese characters.
For example,

..ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,

which starts with two dots followed by Japanese characters, was OK. But

...ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,

where one more dot "." was added at the head of the line, caused the error.
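
To make the two cases easier to reproduce, here is a minimal sketch that is
not part of the program above. The module name repro and the functions
sample_lines/0, write_sample/1 and read_lines/1 are hypothetical names chosen
only for illustration. It writes the two example lines to a UTF-8 file and
reads them back with io:get_line/2, opening the file the same way rw:main/1
does. Whether it triggers the collect_line error will depend on the release;
where the problem does not occur, read_lines/1 should simply return both
lines.

```erlang
-module(repro).

-export([sample_lines/0, write_sample/1, read_lines/1]).

%% The two example lines from this post, as lists of Unicode code points.
%% The first starts with two dots, the second with three.
sample_lines() ->
    H = [16#307B, 16#3052, 16#307B, 16#3052],  % the four hiragana "hogehoge"
    Body = lists:flatten([H, ",", H, ",", H, ",", H, ",", H, ",,,", H, ",,",
                          H, ",", H, ",", H, ",", H, ",", H, ",,,"]),
    [".." ++ Body, "..." ++ Body].

%% Write the sample lines to Filename, encoded as UTF-8, one per line.
write_sample(Filename) ->
    Lines = [[L, $\n] || L <- sample_lines()],
    file:write_file(Filename, unicode:characters_to_binary(Lines)).

%% Read Filename line by line with io:get_line/2 and return the lines.
%% The file is closed even if reading raises an error.
read_lines(Filename) ->
    {ok, Device} = file:open(Filename, [read, {encoding, utf8}]),
    try
        collect(Device, [])
    after
        file:close(Device)
    end.

collect(Device, Acc) ->
    case io:get_line(Device, "") of
        eof ->
            lists:reverse(Acc);
        {error, Reason} ->
            erlang:error(Reason);
        Line ->
            collect(Device, [Line | Acc])
    end.
```

After compiling, calling repro:write_sample("sample.txt") and then
repro:read_lines("sample.txt") from the shell exercises the same
io:get_line/2 call without any printing, which should help separate a
reading problem from an output problem.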

If the problem is in my code, which is more likely, please advise me.
Please also let me know if anyone has come across something similar.

I will post the dump file later if necessary.

Best regards.

Chiharu