[erlang-questions] A specification or a bug of io:get_line ?

pan+eq@REDACTED pan+eq@REDACTED
Thu Apr 30 11:38:42 CEST 2009


Hi!

It seems the last patch was not enough... There were several errors in the 
handling of files with unicode encodings.

Attached is a new patch for the same source file which passes more 
thorough testing. Back out the previous patch before applying this one 
from the top of the source tree.

Big thanks to Chiharu for helping in tracking down the problems!

Cheers,
/Patrik

On Fri, 24 Apr 2009, pan+eq@REDACTED wrote:

> Hi!
>
> First, a small reflection about your program: I'm not sure what your 
> io:format in print/3 is supposed to do. It will output UTF-8 "as is", which 
> will work for a UTF-8 terminal in oldshell (and default latin1) mode, but 
> will output strange things on other combinations (like in werl). It is 
> better to set the standard output to unicode mode and print with 
> io:format("~ts~n",[Data]) instead, which will work as long as you have a 
> Unicode-aware terminal (like werl on Windows and most modern terminals on 
> Linux).
>
> However, the real problem is that I've let through a bug in the 
> file_io_server in kernel, causing the error...
>
> I've attached a patch for the source tree, hope you can apply that and 
> rebuild kernel. We will of course fix it in the next service release.
>
> Cheers,
> /Patrik, OTP
>
> On Fri, 24 Apr 2009, Kawatake Chiharu wrote:
>
>> Hi,
>> 
>> I have just started learning Erlang by myself and came across this. I
>> searched the past questions, but I was not able to find anything
>> that might be related to this, so please let me ask about it. I am using
>> R13B on Windows, but this also occurred on Linux.
>> 
>> I wrote fragments of code that read a file and print its content like the
>> following.
>> 
>> ----------
>> 
>> -module(rw).
>> 
>> -export([for_each_line/4, print/3, main/1]).
>> 
>> for_each_line(Filename, [_, {encoding, Encoding}]=Mode, F, Args) ->
>>    case file:open(Filename, Mode) of
>>        {ok, Device} ->
>>            F(Device, Encoding, Args);
>>        {error, Reason} ->
>>            erlang:error(Reason)
>>    end.
>> 
>> print(Device, Encoding, Args) ->
>>    case io:get_line(Device, "") of
>>        {error, Reason} ->
>>            erlang:error(Reason),
>>            file:close(Device);
>>        eof ->
>>            file:close(Device);
>>        Data ->
>>            io:format("~s~n", [unicode:characters_to_binary(Data, Encoding)]),
>>            print(Device, Encoding, Args)
>>    end.
>> 
>> main(Args) ->
>>    [Filename] = Args,
>>    for_each_line(Filename, [read, {encoding, utf8}], fun rw:print/3, []).
>> 
>> ------------
>> 
>> I made files encoded in UTF-8 and tried to print their contents. This worked
>> fine to some extent.
>> However, when I tried to read files that contained Japanese characters, I
>> got the following error:
>> 
>> ------------
>> 
>> $ erl -noshell -s rw main test.txt
>> {"init terminating in
>> do_boot",{collect_line,[{rw,print,3},{init,start_it,1},{init,start_em,1}]}}
>> 
>> Crash dump was written to: erl_crash.dump
>> init terminating in do_boot ()
>> 
>> ------------
>> 
>> After looking into this for a while, I found that the error seemed to occur
>> when a line started with three consecutive ASCII characters followed by
>> some Japanese characters.
>> For example,
>> 
>> ..ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,
>> 
>> which starts with two dots followed by Japanese characters, was ok. But
>> 
>> ...ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,ほげほげ,,ほげほげ,ほげほげ,ほげほげ,ほげほげ,ほげほげ,,,
>> 
>> where one more dot "." was added at the head of the line, produced the error.
>> 
>> If my code has a problem, which is the more likely explanation, please
>> advise me.
>> Please let me know if anyone has come across something similar to this.
>> 
>> I will post the dump file later if necessary.
>> 
>> Best regards.
>> 
>> Chiharu
>
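
The advice quoted above (open the file with an encoding, put standard
output in unicode mode, and print with ~ts) can be sketched roughly as
follows. This is only an illustration under assumptions: the module name
rw_sketch and the functions lines/1 and print_file/1 are hypothetical and
are not part of the original program or of the attached patch. It also
assumes a kernel in which the file_io_server bug discussed in this thread
is fixed.

```erlang
-module(rw_sketch).
-export([lines/1, print_file/1]).

%% Hypothetical helper: read all lines of a UTF-8 encoded file.
%% Each line comes back as a list of Unicode code points,
%% including its trailing newline if the file has one.
lines(Filename) ->
    {ok, Device} = file:open(Filename, [read, {encoding, utf8}]),
    try
        collect(Device, [])
    after
        %% Always close the device, even if reading fails.
        file:close(Device)
    end.

collect(Device, Acc) ->
    case io:get_line(Device, "") of
        eof             -> lists:reverse(Acc);
        {error, Reason} -> erlang:error(Reason);
        Line            -> collect(Device, [Line | Acc])
    end.

%% Hypothetical helper: print the file on a Unicode-aware terminal.
print_file(Filename) ->
    %% Put standard output in unicode mode, as suggested in the reply.
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    %% ~ts prints Unicode data; no ~n is added because each line
    %% already ends with the newline read from the file.
    lists:foreach(fun(Line) -> io:format("~ts", [Line]) end,
                  lines(Filename)).
```

From the Erlang shell this could be tried with
rw_sketch:print_file("test.txt"). Keeping the lines as code point lists
and letting ~ts handle the output avoids converting to a UTF-8 binary
by hand before printing.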
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode_get_line3.diff
Type: text/x-patch
Size: 4578 bytes
Desc: New version of patch for OTP-7974
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20090430/5a06f99d/attachment.bin>

