[erlang-questions] A specification or a bug of io:get_line ?

Fri Apr 24 12:09:42 CEST 2009

Hi!

First, a small reflection about your program: I'm not sure what the 
io:format in print/3 is supposed to do. It outputs the UTF-8 bytes "as is", 
which works for a UTF-8 terminal in oldshell (and default latin1) mode but 
produces strange output in other combinations (like in werl). It is better 
to set the standard output to unicode mode and print with 
io:format("~ts~n",[Data]) instead, which works as long as you have a 
Unicode-aware terminal (like werl on Windows and most modern terminals on 
Linux).
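
As a rough sketch of that suggestion (the module name rw_sketch and the 
helper print_lines/1 are made up for illustration, and this is untested 
against your exact setup), it could look something like this. It assumes 
the file is opened in list mode with {encoding, utf8}, so io:get_line/2 
returns a list of Unicode code points that ~ts can print directly:

```erlang
-module(rw_sketch).   % hypothetical module name, not part of the original program
-export([main/1]).

main([Filename]) ->
    %% Put standard output in unicode mode so ~ts can emit non-latin1 characters.
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    {ok, Device} = file:open(Filename, [read, {encoding, utf8}]),
    try
        print_lines(Device)
    after
        %% Close the file whether printing succeeded or raised an error.
        file:close(Device)
    end.

%% Hypothetical helper: read lines until eof and print each one.
print_lines(Device) ->
    case io:get_line(Device, "") of
        eof ->
            ok;
        {error, Reason} ->
            erlang:error(Reason);
        Line ->
            %% Line is a list of code points and still ends with its own
            %% newline, so ~n adds a blank line, as in the original program.
            io:format("~ts~n", [Line]),
            print_lines(Device)
    end.
```

With this there is no need to call unicode:characters_to_binary/2 before 
printing. Note that on a kernel without the patch mentioned below, 
io:get_line can still fail on the problematic lines.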

However, the real problem is a bug that I let through in the 
file_io_server in kernel, which causes the error...

I've attached a patch for the source tree. I hope you can apply it and 
rebuild kernel. We will of course fix this in the next service release.

Cheers,
/Patrik, OTP

On Fri, 24 Apr 2009, Kawatake Chiharu wrote:

> Hi,
>
> I have just started learning Erlang by myself and came across this. I
> searched the past questions, but I was not able to find anything that
> might be related to this, so please let me ask about it. I am using R13B
> on Windows, but this also occurred on Linux.
>
> I wrote fragments of code that read a file and print its content like the
> following.
>
> ----------
>
> -module(rw).
>
> -export([for_each_line/4, print/3, main/1]).
>
> for_each_line(Filename, [_, {encoding, Encoding}]=Mode, F, Args) ->
>    case file:open(Filename, Mode) of
>        {ok, Device} ->
>            F(Device, Encoding, Args);
>        {error, Reason} ->
>            erlang:error(Reason)
>    end.
>
> print(Device, Encoding, Args) ->
>    case io:get_line(Device, "") of
>        {error, Reason} ->
>            erlang:error(Reason),
>            file:close(Device);
>        eof ->
>            file:close(Device);
>        Data ->
>            io:format("~s~n", [unicode:characters_to_binary(Data, Encoding)]),
>            print(Device, Encoding, Args)
>    end.
>
> main(Args) ->
>    [Filename] = Args,
>    for_each_line(Filename, [read, {encoding, utf8}], fun rw:print/3, []).
>
> ------------
>
> I made files encoded in UTF-8 and tried to print their content. This worked
> fine to some extent.
> However, when I tried to read files that contained Japanese characters, I
> got the following error:
>
> ------------
>
> $ erl -noshell -s rw main test.txt
> {"init terminating in do_boot",{collect_line,[{rw,print,3},{init,start_it,1},{init,start_em,1}]}}
>
> Crash dump was written to: erl_crash.dump
> init terminating in do_boot ()
>
> ------------
>
> After looking into this for a while, I found that the error seems to occur
> when a line starts with three consecutive ASCII characters followed by some
> Japanese characters.
> For example,
>
> ..ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,
>
> which starts with two dots followed by Japanese characters, was ok. But
>
> ...ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,
>
> where a dot "." was added at the head of the line, got the error.
>
> If my code has a problem, which is the more likely explanation, please
> advise me.
> Please also let me know if anyone has come across something similar to
> this.
>
> I will post the dump file later if necessary.
>
> Best regards.
>
> Chiharu
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode_get_line.diff
Type: text/x-patch
Size: 773 bytes
Desc: Patch for OTP-7974
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20090424/b020892c/attachment.bin>